Peer Pressure

Stuff to look at about looking at stuff. From Chris Dent. What?

Nov 5th, Thu

Indexed TiddlyWeb Filters

One of the core features of TiddlyWeb is its ability to use filters to constrain the tiddlers that are selected from any collection of tiddlers (bag, recipe, search results, etc.). In the early design discussions that led to the creation of TiddlyWeb, filters were conceived as the mechanism a recipe would use to choose only some tiddlers from a bag. Bags are containers for tiddlers that have been grouped together for some reason. Recipes are lists of bags that lead to the creation of some useful set of tiddlers. When using TiddlyWeb and TiddlyWiki together, a recipe can create a particular application or vertical of TiddlyWiki. In that context one would use filters to select some tiddlers tagged “systemConfig” from one bag, others tagged “faq” from another, and others with modifier “cdent” from another.

When TiddlyWeb is used generally as a data store, filters are just as useful. When requesting tiddlers from any bag you can select and sort by attributes, and limit the number of tiddlers. Application developers can also make new filters as plugins (see mselect for an example).

This is all quite grand and useful, but recent explorations by Mike Mahemoff while developing Scrumptious have revealed some (fairly expected) problems. Imagine a bag called “comments” containing some 10,000 or more tiddlers which are comments on URLs. Now imagine you’d like to get those tiddlers which have the field ‘url’ set to http://cdent.tumblr.com/.

The naive way to do this is to look at each one of those tiddlers, one at a time, and say “Hello tiddler, have you got your url field set to http://cdent.tumblr.com/? Oh you do? Well then I’ll have you, thanks!” This is time consuming and resource intensive.
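
As a rough illustration of that one-at-a-time conversation, here is a small sketch. It is not TiddlyWeb's actual code: the Tiddler class, select_by_field and the sample comments are stand-ins made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Tiddler:
    """Stand-in for a tiddler: a title plus arbitrary fields."""
    title: str
    fields: dict = field(default_factory=dict)


def select_by_field(tiddlers, name, value):
    """Look at every tiddler in turn and yield the ones that match."""
    for tiddler in tiddlers:
        if tiddler.fields.get(name) == value:
            yield tiddler


# A tiny pretend "comments" bag.
comments = [
    Tiddler('c1', {'url': 'http://cdent.tumblr.com/'}),
    Tiddler('c2', {'url': 'http://example.com/'}),
    Tiddler('c3', {'url': 'http://cdent.tumblr.com/'}),
]

matches = list(select_by_field(comments, 'url', 'http://cdent.tumblr.com/'))
print([tiddler.title for tiddler in matches])
```

Every tiddler is visited even when only a few match, so the cost grows with the size of the bag.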

It is also the way TiddlyWeb does filters. It’s like this for a few different reasons:

  • The original design imagined many bags, not few bags with large numbers of tiddlers.
  • It preserves the strict separation between the filter system and the storage system, meaning that the storage system can be simple and very adaptable: any filter can work with any store.
  • It makes the filter system fairly transparent: There’s no magic going on; a filter works by looking at tiddlers and making a decision.
  • It makes the filter system easy to extend: The contract between a filter and the rest of the system is “look at some tiddlers, return some tiddlers”. What the filter does when looking is arbitrary.

The 0.9.74 release of TiddlyWeb includes support for querying an index when doing select style filters. The support only kicks in under special circumstances (explained below), but when it does it can speed some filters up immensely. In a test (using profile/list_tiddlers.py) of 10,000 tiddlers, a filter that took 13.96 seconds without an index took 0.30 seconds with an index, roughly 46 times faster.

That’s great news. Here’s the bad news: In at least this initial implementation the prerequisites for the indexing system to be used (and be useful) are quite complex. Here’s the list:

  • The filter being performed must be a select filter (sort and limit filters do not use the index).
  • If there are multiple filters being performed on the collection of tiddlers, the select filter must be first in the stack and only that one filter will use the index.
  • The collection of tiddlers must be what’s called a “natural” bag. That is, the thing being filtered is a bag that exists in the store and the entire contents of the bag is what’s desired to be filtered.
  • That bag should be skinny, meaning that when it was loaded from the store its tiddler contents were not determined. If you are processing recipes or working from URLs this is handled for you by the code. It’s only a concern if you are writing your own handlers.
  • tiddlyweb.config['indexer'] is set to a string which is the name of a module which provides an index_query(environ, **kwargs) callable that returns tiddlers which have been loaded from the store. tiddlywebplugins.whoosher has been updated to provide this and the next item.
  • Something must provide an index for index_query to query. That index needs to be kept up to date as tiddlers are changed. tiddlywebplugins.whoosher has this functionality. The sql and mappingsql stores have the guts to make the functionality possible, but have not yet been extended with an index_query method.
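
To make the last two items a bit more concrete, here is a toy sketch of a module with an index_query(environ, **kwargs) callable backed by an index that is updated as tiddlers change. Only the index_query name and signature come from the description above. STORE, INDEX, index_tiddler and the way keyword arguments map onto fields are assumptions made for this example, and it is not how tiddlywebplugins.whoosher works.

```python
from collections import defaultdict

# Stand-in "store": title -> tiddler, where a tiddler is a plain dict of fields.
STORE = {}
# Stand-in index: (field name, field value) -> set of titles.
INDEX = defaultdict(set)


def index_tiddler(title, fields):
    """Save a tiddler and keep the index up to date with it."""
    old = STORE.get(title)
    if old is not None:
        # Drop index entries that pointed at the previous version.
        for name, value in old.items():
            INDEX[(name, value)].discard(title)
    STORE[title] = dict(fields)
    for name, value in fields.items():
        INDEX[(name, value)].add(title)


def index_query(environ, **kwargs):
    """Yield stored tiddlers whose fields match every keyword argument."""
    title_sets = [INDEX.get((name, value), set())
                  for name, value in kwargs.items()]
    if not title_sets:
        return
    for title in sorted(set.intersection(*title_sets)):
        yield STORE[title]


index_tiddler('c1', {'bag': 'comments', 'url': 'http://cdent.tumblr.com/'})
index_tiddler('c2', {'bag': 'comments', 'url': 'http://example.com/'})
found = list(index_query({}, bag='comments', url='http://cdent.tumblr.com/'))
print(found)
```

The lookup touches only the index entries for the requested values rather than every tiddler in the bag, which is where the speedup comes from.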

The TiddlyWeb at http://tiddlyweb.peermore.com/ has been updated to use the filter indexes. The relevant changes to the tiddlywebconfig.py are:

  • Add tiddlywebplugins.whoosher to twanager_plugins.
  • Add tiddlywebplugins.whoosher to system_plugins.
  • Set indexer to tiddlywebplugins.whoosher.

Then twanager wreindex is run to build the initial Whoosh index.
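
Put together, the three settings listed above might look something like this in tiddlywebconfig.py. This is a sketch with every other setting omitted, not a copy of the configuration on that server.

```python
# tiddlywebconfig.py (sketch; all other settings omitted)
config = {
    # Lets twanager see the plugin's commands, such as wreindex.
    'twanager_plugins': ['tiddlywebplugins.whoosher'],
    # Loads the plugin in the running server.
    'system_plugins': ['tiddlywebplugins.whoosher'],
    # Names the module that provides index_query.
    'indexer': 'tiddlywebplugins.whoosher',
}
```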

You can get some sense of the effects of the index by comparing the following two URLs (this is not an exact test, but gives the sense of things):

The docs bag only has a couple hundred tiddlers in it and memcached is involved, so the effect is not huge, but if you imagine bags orders of magnitude bigger…

Astute observers will note that what’s going on here is not particularly innovative: It’s simply the addition of an index to a query system. One can imagine future improvements à la SQL query optimization, wherein the order of the filters is adjusted to allow the most effective use of the index, and the index is used for more queries than just those against “natural” bags. Constant evolution, constantly building on the shoulders of that which has come before.

For more details, a browse of the code will be instructive. tiddlyweb.control:filter_tiddlers_from_bag is a good entry point.

Oct 27th, Tue

TiddlyWeb Value Proposition

A common question associated with TiddlyWeb is “What can I do with it?” or “Why would I want to use it instead of Django or something?” The answer is not simple.

Because TiddlyWeb has tools for storing data and presenting that data in various forms over HTTP, it has some of the core features of a web application framework. However, since it (intentionally) is also missing some of the core features of a web application framework, a fair piece of cognitive dissonance can happen when the tool is approached as a web application framework.

So it seems best to stay away from saying that TiddlyWeb is a web application framework, so as to manage expectations. The question remains, “What is it?”

One pathway to answering that question is to create tools with it and see what the commonalities are between those tools. A derivative of this approach is to create a killer app with TiddlyWeb: a something which demonstrates all the salient features and how to use them, both as a user and as a developer.

This approach seems to falter: The concepts in TiddlyWeb are sufficiently strange (I won’t say unique, because they aren’t that; they are common concepts with their own spin) that in order to create a tool that uses them well you first need to either understand them in the abstract, or see some examples. Catch-22, especially if you are an examples-oriented person.

FND and I tend to converse on this topic about daily, trying to come up with different strategies for answering it. Today I suggested that instead of creating a new application, from scratch, that is cool and interesting because it uses TiddlyWeb, it might be better to find some application that could be made better through the addition or inclusion of TiddlyWeb.

It turns out this process has already started and by analyzing that process we might understand TiddlyWeb better.

The process has started with TiddlyWiki. TiddlyWiki is made better for a particular set of use cases by taking advantage of TiddlyWeb features. The fundamental experience of TiddlyWiki does not change: you’ve still got a wiki in an HTML file, with these things called tiddlers, some of which are special plugins that change behaviors.

What TiddlyWeb adds when used with TiddlyWiki is:

  • Centralization of content storage, storing at the level of the tiddler, with revisions, protection against editing conflicts and access control to read, write, create and delete; accessible from multiple locations on the network.
  • A (hopefully) straightforward system of managing access control of tiddlers and making other kinds of groupings of tiddlers through the use of bags. Bags are collections of uniquely named Tiddlers.
  • A system for composing functional collections of tiddlers selected from multiple bags using recipes. Recipes are lists of bags paired with filters. Filters are rules which limit the tiddlers pulled from a bag when processing the recipe.
  • Tools for viewing and manipulating tiddlers in different forms than those used in the wiki (e.g. JSON, Atom).

These four things together provide a multi-user environment for TiddlyWiki that allows multiple custom views of the same or similar content, depending on some piece of context. Different recipes can provide a different view on the same stuff, providing a different look and feel, different security handling, etc.
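
The bag, recipe and filter relationship can be sketched in a few lines. This is a toy model, not TiddlyWeb's implementation: run_recipe, the dict-shaped tiddlers and the sample bags are made up, and the rule that a later bag wins when two tiddlers share a title is an assumption of the sketch.

```python
def run_recipe(recipe, bags):
    """Build a collection of tiddlers from (bag name, filter) pairs."""
    selected = {}
    for bag_name, keep in recipe:
        for tiddler in bags[bag_name]:
            if keep(tiddler):
                # Assumption: a later bag wins when titles collide.
                selected[tiddler['title']] = tiddler
    return list(selected.values())


bags = {
    'system': [
        {'title': 'Plugin', 'tags': ['systemConfig'], 'modifier': 'other'},
        {'title': 'Scratch', 'tags': [], 'modifier': 'other'},
    ],
    'docs': [
        {'title': 'Install', 'tags': ['faq'], 'modifier': 'other'},
    ],
    'personal': [
        {'title': 'Install', 'tags': [], 'modifier': 'cdent'},
        {'title': 'Notes', 'tags': [], 'modifier': 'other'},
    ],
}

recipe = [
    ('system', lambda t: 'systemConfig' in t['tags']),
    ('docs', lambda t: 'faq' in t['tags']),
    ('personal', lambda t: t['modifier'] == 'cdent'),
]

result = run_recipe(recipe, bags)
print([t['title'] for t in result])
```

A different recipe over the same three bags would produce a different collection without touching the stored tiddlers, which is the point of the "multiple custom views" idea.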

TiddlyWeb has many other features, but the above four are probably the key aspects which define its special sauce. When I evaluate a nascent TiddlyWeb tool for its tiddly-fitness the evaluation is done based on the extent to which the system takes advantage of and understands the above features.

These ideas should allow us to create some mental heuristics to use when approaching a TiddlyWeb project. If we have an existing project or use case that we intuit will be helped by using TiddlyWeb we can ask ourselves: 1) What aspects of this project will benefit from bags, recipes and filters? 2) How can we take advantage of different representations?

If, for some reason, we are planning to make a project from scratch, and have decided to use TiddlyWeb prior to really evaluating our use cases, then we can ask ourselves: 1) How can we best structure our data and UI to use recipes, bags and filters to provide flexible user experiences? 2) What representations can we present to allow other people to provide flexible user experiences?

Comments (View)
Oct
18th
Sun
permalink

Python Namespace Packages for TiddlyWeb

In Friday’s TiddlyWeb Dev/Deploy Workshop posting I said that mature plugins need to be packaged so they can be indexed by PyPI and installed via easy_install and pip. At base this is relatively straightforward: put some stuff in a directory, make a setup.py, register the package, package up a source distribution, tell PyPI about it.

This is okay if you have a good name for your package, but what about packages, like plugins for TiddlyWeb, which have names which are only meaningful in their use context? You can’t just use the name of the module, otherwise you end up with namespace collisions, trouble finding stuff, and associated beasts of chaos.

Thankfully distutils, setuptools, pip, etc. conspire to support a notion called namespace_packages which can solve this issue. Unfortunately using the feature is not exceptionally well documented. I found getting started a bit frustrating, until I sort of cracked the nut. Here’s some info for reference.

First some prerequisites:

  1. You need setuptools.
  2. You need a username and password on PyPI.
  3. Some understanding of how to make a python package distribution. Here’s one tutorial: Use setup.py to Deploy Your Python App with Style.
  4. Some understanding of Python packages.

That tutorial includes a section about using namespace_packages under the heading “Multiple Distributions, One (Virtual) Package”. That “Virtual” is key: if you wish to use namespace packages there must be no real distribution which occupies the namespace. In the example of tiddlywebplugins, you can have tiddlywebplugins.static and tiddlywebplugins.utils distributions which are members of the tiddlywebplugins namespace, but you must not have a tiddlywebplugins distribution. If you do, the packages which are supposed to occupy the virtual namespace will not be found. The upshot of this is that if you already have a package out there using a name that you want to use as a namespace, you will need to rename the existing package (this is why the old tiddlywebplugins package is now tiddlywebplugins.utils).

Packaging a Plugin

So say you have a TiddlyWeb plugin called foobar.py sitting in a directory somewhere. You’ve determined that it is a happy little plugin and the world would benefit if it could be installed easily. You’ve heard of the tiddlywebplugins namespace and you’d like to join the party. Here’s what you do.

  • In that directory make a tiddlywebplugins directory.
  • Edit tiddlywebplugins/__init__.py to include just this line: __import__("pkg_resources").declare_namespace(__name__)
  • Move foobar.py into the tiddlywebplugins directory.
  • Create a setup.py (in the original directory) that includes at least:

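     from setuptools import setup, find_packages

     setup(
         name = 'tiddlywebplugins.foobar',
         version = '0.1',
         description = 'A TiddlyWeb plugin for foobaring the fritz.',
         # Declare the shared namespace; tiddlywebplugins/__init__.py
         # holds only the declare_namespace line shown above.
         namespace_packages = ['tiddlywebplugins'],
         # Pick up the tiddlywebplugins directory so foobar.py is included.
         packages = find_packages(),
         install_requires = ['setuptools', 'tiddlyweb'],
         )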
  • Do not import from the tiddlywebplugins package in setup.py. This will make installs struggle or fail later.
  • Learn enough about distribution to register the package and upload it. The links above point to enough documentation to figure that part out. If you can’t be bothered to read that documentation then you shouldn’t be distributing packages. We wouldn’t want you AOLing all over PyPI.
  • When you want to use your foobar plugin in a TiddlyWeb instance or application refer to it as tiddlywebplugins.foobar.
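
For instance, an instance's tiddlywebconfig.py might refer to the plugin like this. It is a sketch that assumes the plugin is meant to be loaded by the server through system_plugins; a plugin that only adds twanager commands would be listed under twanager_plugins instead.

```python
# tiddlywebconfig.py (sketch; all other settings omitted)
config = {
    # The plugin is named by its full dotted path inside the namespace.
    'system_plugins': ['tiddlywebplugins.foobar'],
}
```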

Oct 16th, Fri

TiddlyWeb Dev/Deploy Workshop

Yesterday I gathered with the Osmosoft gang to discuss developing and deploying applications that are built on top of TiddlyWeb. Since I mostly develop TiddlyWeb itself and not things built upon it, I tend to be fairly distanced from the issues so getting together and having a chat and messing about with some code was a good thing.

Mike has written up his summary of the things to do next. I took some notes as well, which I’ll attempt to summarize here.

To get a sense of the issues we went around the room and people reported on how they’ve been managing development and deployment and what’s working well and what’s not. There were some but not many common threads.

One common issue is that most of the people have a tendency to do a code-a-bit, reload-a-web-page cycle of development and test. When using twanager server this does not work well, as the server needs to be restarted with each code change. Restarting CherryPy’s wsgiserver is less easy than some desire because it can be some time before the main thread will come round to hearing an interrupt signal. The initial solution to this problem, the reloader plugin, helps but gets a bit weird when syntax errors exist in the watched Python files (which happens a lot if you have an esc:w trigger finger): The entire server will crash out and you have to start the process again by hand. This mostly makes sense from a logical standpoint but is not that helpful.

One proposed solution to this problem is to host TiddlyWeb under a simple CGI setting. This will reload and recompile the code with every request. This can be slow for production settings but plenty fast for development. The TiddlyWeb source package includes an index.cgi script for this. Ben Gillies has written some docs on how to use it.

Of course to use the CGI option you have to have some kind of web server around. Many systems come with Apache already on them, or make it easy to install. If you prefer not to run Apache (like me), another option is Spawning, a fast multi-process, multi-threaded Python web server. There’s a TiddlyWeb “factory” for Spawning called spawner.

Paul wisely pointed out that much of this reload fiddling about can be avoided by writing and testing Python modules that perform most of the activity for your application, minus the web handling, and then using these modules in small layers of Python code that integrate with TiddlyWeb.
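
A minimal sketch of that split, with made-up names throughout: comments_per_url holds the application logic and can be tested with no server running, while handler is a thin, plain WSGI layer around it. load_comments and the 'example.comments' key are placeholders for whatever actually reads tiddlers from the store.

```python
import json


def comments_per_url(comments):
    """Pure logic: count comments for each URL. No web code involved."""
    counts = {}
    for comment in comments:
        counts[comment['url']] = counts.get(comment['url'], 0) + 1
    return counts


def load_comments(environ):
    """Placeholder for whatever reads tiddlers from the store."""
    return environ.get('example.comments', [])


def handler(environ, start_response):
    """Thin WSGI layer: fetch data, call the tested logic, respond."""
    counts = comments_per_url(load_comments(environ))
    body = json.dumps(counts).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [body]
```

The logic can be exercised from a test file or the interactive prompt, for example comments_per_url([{'url': 'a'}, {'url': 'a'}]), so the server only needs restarting when the thin layer itself changes.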

There was a fair amount of resistance in the group to formal and rigorous testing, which I think is extremely unfortunate, but Simon rightly points out that lots of people are like that, and TiddlyWeb is going to limit its audience a great deal if it requires developers to be totally test driven.

Another point of difficulty and contention is lack of familiarity with Python, Unix and TCP fundamentals. I’m not sure how to respond to that sort of thing other than to say that if you want to develop networked applications in Python, on Unix, then it’s probably pretty good to be familiar with such stuff.

By far the biggest issue with development is managing the inclusion of necessary stuff in the development environment. This includes auxiliary plugins that are required for the system, TiddlyWiki content and plugins, and static content that the web app needs. Mike’s blog post has most of the plans in this area. The gist is two pieces of development:

  • Improving the existing devtext store so it more effectively (and correctly) supports manipulation of the store and inclusion of TiddlyWiki content during development. We’ll be talking about the details of this in IRC this afternoon.
  • Creating a suite of shell-based tools that provide Make-like functionality for establishing and maintaining a development environment and easing deployment. The long term plan is to migrate these tools to Python as time and knowledge allow.

For myself the things that have become important are:

  • Making existing plugins more visible to developers so they know what tools are available.
  • Packaging so-called mature plugins as Python packages that are installable via easy_install or pip.
  • Enhancing documentation of configuration options so they are more visible.
  • Publishing the quick hacks and tricks I do to make my own development life a bit more zipless.

Aug 6th, Thu

Recent TiddlyWeb Plugins of Note

Though the historical roots of TiddlyWeb are as a store for quine-like systems (I should write more about this), because it has a very flexible plugin system it also manages to be something of an unintentional web-app framework. Many plugins have been developed over the past several months. Many are experiments to demonstrate possibilities. Others are practical tools. Here’s a bit of info on those I’ve recently put together. Ben Gillies, Jon Robson, Mike Mahemoff, and FND have all been doing interesting work as well.

Just this morning I made markdown.py, which is in the class of plugins known as wikitext renderers. These provide a method for transforming the text of a tiddler stored in TiddlyWeb into some other textual form, usually HTML. The default renderer is called ‘raw’ and returns its input. TiddlyWebWiki uses the wikklytextrender plugin as its default renderer. It takes TiddlyWiki formatted text and returns HTML. The markdown renderer renders Markdown syntax to HTML. It’s possible to run multiple wikitext renderers in the same TiddlyWeb instance; which one is used is determined by the value of tiddler.type. Here’s the entire content of the markdown plugin:

import markdown2

def render(tiddler, environ):
    """
    Render text in the provided tiddler to HTML.
    """
    return markdown2.markdown(tiddler.text)
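
How several renderers might coexist, with tiddler.type choosing between them, can be sketched roughly as follows. This is not TiddlyWeb's dispatch code: RENDERERS, render_pre and the 'text/x-example' type are invented for the illustration, and only the render(tiddler, environ) signature and the behavior of the raw renderer come from the description above.

```python
import html
from types import SimpleNamespace


def render_raw(tiddler, environ):
    """Like the 'raw' renderer described above: return the text unchanged."""
    return tiddler.text


def render_pre(tiddler, environ):
    """Made-up renderer: escape the text and wrap it in <pre>."""
    return '<pre>%s</pre>' % html.escape(tiddler.text)


# Hypothetical mapping from tiddler.type to a render callable.
RENDERERS = {'text/x-example': render_pre}


def render(tiddler, environ):
    """Pick a renderer from tiddler.type, falling back to raw."""
    renderer = RENDERERS.get(tiddler.type, render_raw)
    return renderer(tiddler, environ)


typed = SimpleNamespace(text='a < b', type='text/x-example')
untyped = SimpleNamespace(text='a < b', type=None)
print(render(typed, {}))
print(render(untyped, {}))
```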

ramstore.py is a StorageInterface implementation for TiddlyWeb that persists the main entities of TiddlyWeb (Tiddlers, Bags, Recipes, Users) into RAM in the same process. Persist is not really the right term because as soon as the current process goes away, so do the tiddlers. This plugin mostly exists for testing (it allows you to effectively mock a store without actually being a mock) and to demonstrate the bare minimum of what a store needs to be able to do to fully support the StorageInterface. Longer term, however, I can see it being useful as part of a layered caching solution (one thread reads or writes RAM and returns control to the web request handling layer, another thread wakes up when RAM is written and takes what is written and sends it downstream to more persistent layers).

The default distribution of TiddlyWeb provides no exposure of User entities over HTTP. userbag.py and users demonstrate two (incomplete) ways of exposing them. The former uses a combination of diststore and a simple StorageInterface implementation which presents the users on a system as tiddlers in a bag called ‘users’. The latter adds /users and /users/{usersign} routes with handlers that query the store. At the moment both are read only.

Maybe I’ll write another one of these.
