In the previous posting I said creating a new storage mechanism for TiddlyWeb was a simple matter of creating a new module that supported an interface. It occurs to me that I can kill a few birds by writing about the process of creating such a module. One of the birds is explaining some of the facets of TiddlyWeb’s architecture. Another is explaining why, despite Damien Katz getting all up in people’s business and calling bullshit, REST is teh awesome and I want it.
What we’ll do here, just to be a bit perverse, is create a store for TiddlyWeb that uses another TiddlyWeb as the place where resources are stored. In the process we should see why TiddlyWeb’s API being “RESTful” is useful and also expose a few bugs that need to be fixed to have better behavior.
I started this a while ago and called it tiddlywebweb, so we’ll carry on with that.
In TiddlyWeb a store is a module with a known name, containing a class with the name Store that is a subclass of StorageInterface. The interface is defined in tiddlyweb/stores/__init__.py. The default store for TiddlyWeb is called text and is in tiddlyweb/stores/text.py. text uses the filesystem and easy to read (by humans) text files for storing data. It’s not terribly speedy, but it is easy to grok.
In usual use a particular implementation of a StorageInterface is not directly instantiated. Instead the calling code creates a Store object of the class defined in tiddlyweb/store.py, choosing the type of store by name:
from tiddlyweb.store import Store
store = Store('text')
Methods are then called on the store object to access data:
from tiddlyweb.tiddler import Tiddler
tiddler = Tiddler('MyTiddler', bag='foo')
store.get(tiddler)
print tiddler.text
(The unique id of a tiddler is its title and the name of the bag in which it lives, so we need both in order to be able to retrieve a tiddler from the store.)
When a Store object is being created the system first looks in the tiddlyweb.stores package for a module with the given name. If it finds that, it is imported and the Store class within is used. If that import does not happen, then the system tries to import the name directly (searching sys.path). Our new tiddlywebweb code is located in tiddlywebweb/tiddlywebstore.py so we could load it like this:
from tiddlyweb.store import Store
store = Store('tiddlywebweb.tiddlywebstore')
Except that in most cases we would not. Each TiddlyWeb instance is assumed to have just one store (for now) so the storage system can be defined in the system configuration. The default system configuration lives in tiddlyweb/config.py and can be overridden by a Python file called tiddlywebconfig.py kept in the working directory of the TiddlyWeb server. The config dict in tiddlywebconfig.py is merged over the top of the config dict in tiddlyweb/config.py.
The server_store key in that dict holds a two-item list: the name of the store that is to be used, and a dict of any configuration information that it needs (path information, username and password handling, etc.). For the default text store it looks like this:
'server_store': ['text', {'store_root': 'store'}],
text is the name of the store. store_root is the path to the directory where data is to be stored.
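Putting the last few paragraphs together, here is a minimal sketch of how a store name might get from configuration to an imported module. It is not TiddlyWeb's own code: merge_config, candidate_module_names and import_first are hypothetical helpers, the merge is assumed to be shallow, the /tmp/demo_store path is made up, and the sketch is written for Python 3 rather than the Python 2 used elsewhere in this post.

```python
import importlib

# Stand-in for the defaults in tiddlyweb/config.py (only the key discussed here).
DEFAULT_CONFIG = {
    'server_store': ['text', {'store_root': 'store'}],
}


def merge_config(defaults, overrides):
    """Return a new dict with overrides laid over defaults (assumed shallow)."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def candidate_module_names(store_name, package='tiddlyweb.stores'):
    """Names to try, in the order described: inside the package, then bare."""
    return ['%s.%s' % (package, store_name), store_name]


def import_first(names):
    """Import and return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of %r could be imported' % (names,))


# A hypothetical tiddlywebconfig.py that only moves the text store's root.
local_config = {'server_store': ['text', {'store_root': '/tmp/demo_store'}]}
config = merge_config(DEFAULT_CONFIG, local_config)
store_name, store_settings = config['server_store']
print(candidate_module_names(store_name), store_settings)
```

With TiddlyWeb installed, passing the result of candidate_module_names to import_first would try the packaged store first and only then fall back to whatever sys.path offers under the bare name.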
To use tiddlywebweb as the store for our server A, we need the base URL of the other TiddlyWeb server (server B). To start out, our config entry will look a bit like this (assuming the other server is running on localhost:8000):
config = {
    'server_store': ['tiddlywebweb.tiddlywebstore',
                     {'server_base': 'http://localhost:8000'}],
}
That will go in the tiddlywebconfig.py of server A (more detail on that in a bit). Server B needs no particular modification (for the purposes of this exercise we’re not going to worry about access control, maybe next time), it just needs to be running.
Okay, so now we know how to use a different store, but what does a store do? As said before a Store needs to implement (some of) the StorageInterface. This is a collection of methods that get, put, delete and list a variety of entities used by TiddlyWeb. Here’s the complete list:
recipe_get(self, recipe):
recipe_put(self, recipe):
bag_get(self, bag):
bag_put(self, bag):
tiddler_delete(self, tiddler):
tiddler_get(self, tiddler):
tiddler_put(self, tiddler):
user_get(self, user):
user_put(self, user):
list_recipes(self):
list_bags(self):
list_tiddler_revisions(self, tiddler):
tiddler_written(self, tiddler):
search(self, search_query):
Some caveats:
- Deleting recipes and bags is planned but not yet supported.
- Support for users is optional (for example the GoogleAppEngine version of TiddlyWeb uses Google Users, so doesn’t need to store them).
- Support for listing tiddler revisions is optional. If a store doesn’t support revisions, then a tiddler is always revision one.
- tiddler_written is by default a pass. It is called by tiddler_put. If overridden it can be used to update an index if one is used by search.
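To make that last caveat concrete, here is a small sketch of a store that overrides tiddler_written to keep a word index for search. Everything in it is a stand-in: SketchStorageInterface, SketchTiddler and InMemoryStore are hypothetical names, not classes from TiddlyWeb, and it is written for Python 3.

```python
class SketchTiddler(object):
    """Hypothetical stand-in for a tiddler: title, bag and text only."""

    def __init__(self, title, bag, text=''):
        self.title = title
        self.bag = bag
        self.text = text


class SketchStorageInterface(object):
    """Hypothetical stand-in for the interface; only the hook is shown."""

    def tiddler_written(self, tiddler):
        # Default behavior: do nothing.
        pass


class InMemoryStore(SketchStorageInterface):
    """Toy store that keeps tiddlers in a dict and indexes their words."""

    def __init__(self):
        self.tiddlers = {}
        self.index = {}  # word -> set of (bag, title) pairs

    def tiddler_put(self, tiddler):
        self.tiddlers[(tiddler.bag, tiddler.title)] = tiddler.text
        # The hook is called from tiddler_put, as described above.
        self.tiddler_written(tiddler)

    def tiddler_written(self, tiddler):
        # Overridden: record every word of the text in the index.
        for word in tiddler.text.lower().split():
            self.index.setdefault(word, set()).add((tiddler.bag, tiddler.title))

    def search(self, search_query):
        # Look up a single word; a real search would be more capable.
        return sorted(self.index.get(search_query.lower(), set()))


store = InMemoryStore()
store.tiddler_put(SketchTiddler('MyTiddler', 'foo', 'hello world'))
print(store.search('hello'))
```

A store with no index can simply leave tiddler_written alone and inherit the do-nothing default.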
Alright, let’s think about this a minute. If our server A is using server B for storage, then when a user asks server A “GET /recipes/foo” what needs to happen is that server A asks server B “GET /recipes/foo”. And when a user tells server A “PUT /recipes/foo”, server A needs to tell server B “PUT /recipes/foo”. Similar things for a bag or a tiddler or collections thereof. So what we need to do is proxy web requests to server A through to server B, right?
Wrong. In order for TiddlyWeb to be able to support multiple storage types it has to have fairly disciplined separation of concerns amongst all the bits of code. When a user asks server A for a recipe the request is processed something like this:
1. The selector map dispatches to the right piece of code (tiddlyweb.web.recipe:get) based on urls.map.
2. That piece of code uses the name provided by the URL to instantiate an empty Recipe object, recipe.
3. The configured store is then asked store.get(recipe) to populate the recipe.
4. Based on content negotiation, that recipe is then serialized to a particular string and sent out in the response.
Step 2 is required in order for step 3 to be most flexible. And Step 3 has no clue what the configured store is, just that it calls store.get().
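A toy version of that flow may help show why the handler never needs to know which store is configured. This is a sketch under assumptions, not TiddlyWeb's handler: Recipe and DictStore are stand-ins, the serializer table and its text/plain output are made up, and the code is Python 3.

```python
import json


class Recipe(object):
    """Stand-in for a recipe: a name plus a list of [bag, filter] pairs."""

    def __init__(self, name):
        self.name = name
        self.recipe = []


class DictStore(object):
    """Stand-in store backed by a dict; any object with get() would do."""

    def __init__(self, data):
        self._data = data

    def get(self, recipe):
        # Populate the empty object that the handler passed in.
        recipe.recipe = self._data[recipe.name]


def to_json(recipe):
    return json.dumps({'recipe': recipe.recipe})


def to_text(recipe):
    # Made-up plain text form: one bag name per line.
    return '\n'.join(bag for bag, _filter in recipe.recipe)


SERIALIZERS = {'application/json': to_json, 'text/plain': to_text}


def get_recipe(store, recipe_name, accept='application/json'):
    recipe = Recipe(recipe_name)        # empty object named from the URL
    store.get(recipe)                   # the store fills it in
    return SERIALIZERS[accept](recipe)  # serialization chosen by negotiation


store = DictStore({'foo': [['bagone', ''], ['bagtwo', '']]})
print(get_recipe(store, 'foo'))
print(get_recipe(store, 'foo', accept='text/plain'))
```

Swapping DictStore for anything else with a get() method leaves get_recipe untouched, which is the separation the store interface depends on.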
So proxying is right out. Which means our tiddlywebweb store needs to be an HTTP client that constructs proper URLs and makes requests to a remote server. We’ll only pass content that is application/json to be simple (and because it is the most robust of the default TiddlyWeb serializations). And to keep things simple we’ll just get the bare bones in. Later we’ll add the fancy.
We know about recipe_get. Let’s describe what it needs to do:
- Accept a recipe.
- Use its name to construct a URL.
- Make a GET request to that URL with an Accept header of application/json.
- Transform a successful response from JSON into a proper Recipe object.
- Get the object back up the stack.
recipe_put does something quite similar:
- Accept a recipe.
- Transform the recipe into a JSON structure.
- Use the name of the recipe to construct a URL.
- Make a PUT request to that URL with a Content-Type header of application/json and a body of the JSON.
- Promulgate success up the stack.
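The request-building half of both lists can be sketched with nothing but the Python 3 standard library. This is an illustration rather than the tiddlywebweb code: the helper names are hypothetical, the /recipes/ path comes from the examples above, and the payload in the demo is a placeholder, since the exact keys of TiddlyWeb's JSON form of a recipe are not shown in this post. Nothing is sent over the network here.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Server B, assumed to be running on localhost:8000 as in the config entry.
SERVER_BASE = 'http://localhost:8000'


def recipe_url(name, server_base=SERVER_BASE):
    """Build the URL for a recipe, percent-encoding its name."""
    return '%s/recipes/%s' % (server_base, quote(name, safe=''))


def build_recipe_get(name):
    """Prepare (but do not send) a GET that asks for JSON."""
    return Request(recipe_url(name),
                   headers={'Accept': 'application/json'},
                   method='GET')


def build_recipe_put(name, recipe_data):
    """Prepare (but do not send) a PUT carrying the recipe as JSON."""
    body = json.dumps(recipe_data).encode('utf-8')
    return Request(recipe_url(name),
                   data=body,
                   headers={'Content-Type': 'application/json'},
                   method='PUT')


get_request = build_recipe_get('my recipe')
put_request = build_recipe_put('my recipe', {'recipe': [['bagone', '']]})
print(get_request.get_method(), get_request.full_url)
print(put_request.get_method(), put_request.full_url)
```

Passing either request to urllib.request.urlopen would actually send it, at which point the response body still has to be turned back into an object (for GET) or the status checked (for PUT).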
The rest of the methods are in some very fundamental ways basically the same (this is why mnot is basically right for many cases) so we’ll not bother to go into too much detail.
In tiddlywebweb.tiddlywebstore recipe_get looks like this:
def recipe_get(self, recipe):
    url = self.recipe_url % urllib.quote(recipe.name)
    self.doit(url, recipe, self._any_get, NoRecipeError)
Wait, what? That gets the first step (URL construction) but what’s going on with the rest of it? Well, like I said, all the methods are basically the same so we can abstract that out. doit() looks like this:
def doit(self, url, object, method, exception):
    try:
        method(url, object)
    except TiddlyWebWebError, e:
        raise exception, e
Which says “try to do method method, if it works, we’re golden, otherwise raise the exception named by exception”. doit() wraps either _any_get() or _any_put(). Here’s _any_get():
def _any_get(self, url, target_object):
    response, content = self._request('GET', url)
    if self._is_success(response):
        self.serializer.object = target_object
        self.serializer.from_string(content)
    else:
        raise TiddlyWebWebError, '%s: %s' % (response['status'], content)
- Make the request.
- If it was good, transform the JSON response to the object form.
- If it was bad, throw an error.
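To watch the whole chain run without a server B, here is a self-contained Python 3 sketch of the same pattern. It is an approximation: _request is faked with canned responses, _is_success is assumed to mean any 2xx status (the real helper is not shown above), the target object is a plain dict instead of a Recipe with a serializer, and NoRecipeError is redefined locally.

```python
import json


class TiddlyWebWebError(Exception):
    """Raised when the remote server answers with a non-success status."""


class NoRecipeError(Exception):
    """Local stand-in for the error the calling code expects for recipes."""


class SketchStore(object):
    def __init__(self, canned):
        # canned maps URL -> (status, body) and plays the part of server B.
        self.canned = canned

    def _request(self, method, url):
        # Fake transport: look the URL up instead of making an HTTP call.
        status, content = self.canned.get(url, ('404', 'not found'))
        return {'status': status}, content

    def _is_success(self, response):
        # Assumption: any 2xx status counts as success.
        return response['status'].startswith('2')

    def _any_get(self, url, target_object):
        response, content = self._request('GET', url)
        if self._is_success(response):
            # Stand-in for the serializer: copy the JSON into the target.
            target_object.update(json.loads(content))
        else:
            raise TiddlyWebWebError('%s: %s' % (response['status'], content))

    def doit(self, url, target_object, method, exception):
        # Translate the transport-level error into the one the caller expects.
        try:
            method(url, target_object)
        except TiddlyWebWebError as exc:
            raise exception(str(exc))

    def recipe_get(self, recipe):
        url = '/recipes/%s' % recipe['name']
        self.doit(url, recipe, self._any_get, NoRecipeError)


store = SketchStore({'/recipes/foo': ('200', '{"desc": "demo"}')})
recipe = {'name': 'foo'}
store.recipe_get(recipe)
print(recipe)
```

Asking this sketch for a recipe that is not in the canned data raises NoRecipeError carrying the 404 status, so the calling code sees the same exception type it would get from any other store.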
So I’ve done that. You can see the store module and some extra bits for making it go.
This covers the basics:
- You can list tiddlers, bags, recipes.
- GET representations of each.
- PUT some tiddlers.
So the basic concept of having the content hosted on a remote server is doable. But there are problems:
- Listing a bag or recipe with a lot of tiddlers in it results in a lot of requests to server B.
- We’ve got no delete yet.
- There’s no auth handling.
- It’s kind of slow.
Some other time, sooner rather than later if there is interest, I’ll improve the code to show how some solid use of good HTTP principles will make the code better and make the system operate more efficiently.