Multiple sites, one Python: Pagoda import tricks
posted by brian at 06:57 PM
One of our early goals when designing Pagoda was to allow a single Pagoda instance to support multiple sites. This was due to the way memory works for web servers running on Python and TurboGears. How exactly this adds up depends on your threading and web server configuration (mod_python), but traditionally hosting multiple sites means running at least one Python instance per site, each costing 10-20 MB. The more modules each instance loads, the higher the memory usage, and since Pagoda sites will likely use a bunch of modules, that adds up. The most limiting factor in many hosting services is the amount of memory your account is allowed to consume.
Obviously if each Pagoda site is large and running custom code, it might be a good idea to run each in its own Python instance, so one site can't bring down all the others. But the common case, we think, is a bunch of moderately sized sites using just the built-in page management tools. So we devised some ways to allow multiple sites to run from one TurboGears project...
The first and simplest plan involved a database model, where pages and other table rows point to whichever site they belong to. You probably already know why this is a bad idea. First of all, every single table in the database needed to have a site_id column, since nothing would be shared between sites. Unique things like usernames would need their constraints modified to only be unique per-site. That got old pretty fast. Secondly was security. How could we ensure that every piece of code touching the database, even the eventual third-party plugins, would use the correct site in their queries so as not to mess with the others? And finally, having each site's contents in one massive database would not be very convenient if the site owners wanted backups of their portion of the database.
So we started looking at multi-database solutions, and quickly realized we were pretty much on our own for what we wanted to do. We don't just want some models in one database, and other models in a difference database; we want the same models in every database. Every site needs a pages table, for example. Since we're mapping tables with SQLAlchemy, and each mapper is bound to metadata, an engine, and a session, it seems that we'd need to run the table and mapper definitions once per site; each time, the engine would point to the appropriate site's. And now the big trick: how do we do this without modifying any model code, so that plugin writers don't have to learn any silly new details, and without doing a bunch of extra work every time a controller needs to use a model? If our controllers import pagoda.models.pages, how will it know to get the Page class bound to the current site's engine, and not another site's?
We looked to CherryPy for inspiration. In a TurboGears controller, importing cherrypy.request and cherrypy.response will make the current thread's request and response objects available. How do these objects magically belong to the appropriate thread? They simply use a class called ThreadLocalProxy. As the name suggests, cherrypy.request and cherrypy.response are proxy objects that determine the current thread and point object access to the correct request and response instances. Similarly, we want something like SiteLocalProxy, which will make model classes available that are magically bound to the correct site's engine.
Using ThreadLocalProxy as inspiration, we made a clever little object called site. When anything is imported from pagoda.site, it will rebind turbogears.database.metadata and turbogears.database.session after updating sqlalchemy.dburi in the config to point to the current site's. Then the requested module is imported and cached for next time (so the models aren't reinitialized every time). No model code was changed at all! The only necessary modification was importing from pagoda.site.models instead of pagoda.models in our controllers.
Our first implementation looked very much like ThreadLocalProxy, but it made our import statements look funny since site wasn't a real module. So we started investigating the imp, ihooks, and imputils modules, eventually leading us to PEP 302. With help from Importing (to reduce the amount of code necessary), we now have a special pseudo-module called site, and Pagoda modules imported from that will take the current request's site into account instead of just being imported once for the entire process.
Before writing up this entry, I came across Alchemyware. At first it looked promising for what we want to do, but as far as I can tell it requires modifying the way you write models and reinstantiating them on every request. Also, I don't understand how the mapped class can be "shared by everyone" if it's being mapped to multiple databases.
Anyway, after cleaning up our proof-of-concept I'll share the code behind our import trickery in case anyone is trying to do something similar, but mostly just because such tricks are interesting.
In case you forgot, we missed the end-of-March deadline we set for our demo, due in part to being burned out after PyCon. We're shooting for the end of April now.