Entries in "Pagoda"
September Projects
I'm in the situation Ian Bicking was in not long ago—I'm really tired of this blog design and software and it's making me not want to post any of the entries I have pending. This blog will soon redirect to something better.
Pagoda should have a Developer Preview in October. Check out my presentation from the September meeting of the Cleveland Python interest group.
Remember how Ian and I spent months thinking up hundreds of names for our company? We are now incorporated as Unstoppable Rocket—one of the first names that was suggested.
If you've been following the geopy list, you've heard about the new release coming out. It should make things much more flexible and extendable, and fix all the issues from the past year or so. geopy 0.99 will be out this week.
The geopy update is also getting me back into the campus crime map and my Case geocoder service, which is going to be really smart. Updates there soon.
I started a new project called Revisionist, which is like Pagoda's revision model except generalized and using SQLAlchemy 0.4. I'm hoping other people will be interested in using and improving such a project. With the right helpers it should make revisioning complex models really easy.
If anyone has any neat suggestions for what Gary or I should talk about at the October Clepy meeting, let me know.
Content types in Pagoda, Part 1: The Model
Any content management system will inevitably have to think about having different content types. Common content types include pages, attachments, calendars, events, and blog articles. Why make the distinction between different things that all appear as "pages" to the user? Because, of course, different content types must support different features and respond to different actions. For example, an event content type must have a date in order to show up on a calendar, and a calendar content type might support an iCalendar feed of its events.
Likewise, content types will all share similar features and actions. They all have a URL and a title. And if we take the common CMS approach of making a site as a hierarchy of objects, they all have a parent object and child objects. While sites might not be inherently hierarchical (URLs are just identifiers!), it's quite natural to create them this way—for example, if we move a page, we'd expect its entire tree of child pages to move with it.
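To make the hierarchy point concrete, here is a minimal sketch in plain Python. It is not Pagoda code: the Node class and its fields are hypothetical. Each object's URL is derived from its chain of parents, so moving one page implicitly moves its whole subtree.

```python
class Node:
    """Hypothetical content object: a URL slug, a title, a parent, and children."""

    def __init__(self, slug, title, parent=None):
        self.slug = slug
        self.title = title
        self.parent = None
        self.children = []
        if parent is not None:
            self.move_to(parent)

    def move_to(self, new_parent):
        # Detach from the old parent (if any), then attach to the new one.
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        new_parent.children.append(self)

    @property
    def url(self):
        # The URL is computed from the parent chain, never stored per page.
        if self.parent is None:
            return ("/" + self.slug) if self.slug else ""
        return self.parent.url + "/" + self.slug


root = Node("", "Home")
about = Node("about", "About", parent=root)
events = Node("events", "Events", parent=root)
party = Node("party", "Party", parent=about)
photos = Node("photos", "Party photos", parent=party)

before = photos.url      # "/about/party/photos"
party.move_to(events)    # move one page...
after = photos.url       # ...and its child follows: "/events/party/photos"
```

Only the moved node is touched; its descendants pick up the new location the next time their URL is computed.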
One of the first things a web developer does when starting a project is to model its content types. Read any MVC (or, ahem, MTV) web framework tutorial and there will be a Wiki model, or a Blog model, or a TodoList model—all content types. In this article I'll be talking about what it currently looks like to model a content type in Pagoda. Since Pagoda is based on TurboGears, our goal is to make building your app alongside Pagoda no different than building your app with TurboGears, and so far I think we've done a pretty good job. (And by the way, since we're using SQLAlchemy, this part of Pagoda is TG2 future-proof.)
So, if it's supposed to be the same as just using TurboGears, why do I have to show you anything? The answer is that while you don't have to design your model with Pagoda in mind (existing apps will coexist just fine), doing so will make your model easily localizable and revisionable! That's a pretty big benefit in the world of content management. We'll be able to restore old content records from any point in their history, and make changes to locale-independent fields for all translations at once.
So, on to the code. I'll be modeling a simple Event content type. First, here's how you might do it with some plain old TurboGears and SQLAlchemy.
from sqlalchemy import *
from sqlalchemy.ext.assignmapper import assign_mapper
from turbogears.database import metadata, session
from datetime import datetime
event_table = Table('event', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('title', Unicode(200), nullable=False),
    Column('url_slug', String(75), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
class Event(object):
    def move_to_date(self, new_date):
        # Measure the shift before start_date is overwritten;
        # otherwise the delta would always be zero.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta
assign_mapper(session.context, Event, event_table)
So, a pretty standard model with minimal event features. (One thing might not be obvious: the url_slug field is the short Latin-1 name of the event that we'll show in the URL.) Using the mapped Event class looks like this:
# Make an event.
bday_party = Event(start_date=datetime(2007, 10, 30, 19, 30),
                   title="Brian turns twenty-three!", url_slug="brian_turns_23",
                   description="The party will take place in my underwater hideout.",
                   show_in_calendar=True)
# End at midnight!
bday_party.end_date = datetime(2007, 10, 31)
# Write it!
session.flush()
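To see what move_to_date is meant to do with concrete dates, here is a standalone sketch that runs without TurboGears or a database. PlainEvent is a hypothetical, unmapped stand-in for Event; the point it illustrates is that the shift has to be measured while start_date still holds the old value.

```python
from datetime import datetime


class PlainEvent:
    """Unmapped stand-in for Event, used only to show the date arithmetic."""

    def __init__(self, start_date, end_date=None):
        self.start_date = start_date
        self.end_date = end_date

    def move_to_date(self, new_date):
        # Measure the shift first, while start_date still holds the old value.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            # Shift the end by the same amount so the duration is preserved.
            self.end_date += time_delta


party = PlainEvent(datetime(2007, 10, 30, 19, 30), datetime(2007, 10, 31))
party.move_to_date(datetime(2007, 11, 2, 19, 30))
# The party now ends at midnight on November 3 and still lasts 4.5 hours.
```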
Now, how would it change with support for translations, revisions, and having parent and child objects? The first step is to split the table up into locale-dependent and locale-independent tables. If you read about our localizable revision model you'll see that this is how we support independently revisioned translations while avoiding data duplication. Here's what the two tables look like that will replace event_table:
event_generic_table = Table('event_generic', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('url_slug', String(75), nullable=False),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default="")
)
Since we want to support translations for the fields in event_localized_table, let's also add a locale field in order to tell the translations apart. locale will be a short identifier like "en-US", "fr", or "ja".
event_localized_table = Table('event_localized', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
The next step is to point both tables at Pagoda's revision table in order to support revisions. Since each event record points to a unique revision record, our primary key is now redundant, and can be changed to the revision's ID:
from pagoda.models import Revision
event_generic_table = Table('event_generic', metadata,
    # Positional arguments (the ForeignKey) must come before keyword arguments.
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('url_slug', String(75), nullable=False),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
A column type of None here will cause SQLAlchemy to use the column type of the foreign key—almost always what you want. There's one more change to make. Since Pagoda helps manage your site's content hierarchy, it already has a table to hold the URL of every object on the site. So we can get rid of the url_slug field—Pagoda will include its own when we tell it about this content type. Our final tables:
from pagoda.models import Revision
event_generic_table = Table('event_generic', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
Just a few more small changes! Since we have two different tables, and are now adding some more tables (like Revision) into the mix, we need to join them somehow for SQLAlchemy to map against the resulting join. Pagoda has a function called revisioned_table that will perform the necessary joins. Just tell it about your two tables and give it an alias:
from pagoda.models import Revision, revisioned_table
...
event_table = revisioned_table('event', event_generic_table, event_localized_table)
event_table is now a Selectable according to SQLAlchemy. Let's map against it! Pagoda uses a mapper extension to help with querying and modifying revisioned records. You can add pagoda.models.RevisionableMapperExtension to the mapper yourself, or you can use our helper called revision_mapper to do it. revision_mapper is a small wrapper around assign_mapper that makes sure the mapper extension is there, and gives the mapped class methods some more helpful docstrings.
from pagoda.models import Revision, revisioned_table, revision_mapper
...
revision_mapper(session.context, Event, event_table)
Last change! Since Event is now revisioned, it would be nice to have some helpful methods for dealing with revisions, like querying for the latest published revision or creating a new revision based on a previous revision. Pagoda has a base class for your mapped class that will give it a few such methods. Just subclass your mapped class from Revision:
class Event(Revision):
    def move_to_date(self, new_date):
        # Measure the shift before start_date is overwritten.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta
And that's all it takes to support revisions. Event works just like before, except it now has some more methods and fields. A url column came from Pagoda's Node table, content_id and content_type came from Pagoda's Content table, and Revision's columns came along too. Note that no columns were added to either Event table—these additional fields came from joins. Using it looks much the same as before:
bday_party = Event(start_date=datetime(2007, 10, 30, 19, 30),
                   title="Brian turns twenty-three!", url="brian_turns_23",
                   description="The party will take place in my underwater hideout.",
                   show_in_calendar=True, locale='en', content_type='event',
                   revision_author="brian")
revised_bday_party = bday_party.new_revision(title="Brian gets older")
revised_bday_party.publish()
session.flush()
# revised_bday_party is now "active" - the latest published revision
calendar_events = Event.select_active_by(show_in_calendar=True)
from datetime import datetime, timedelta
yesterday = datetime.today() - timedelta(days=1)
events_as_they_were_yesterday = Event.filter_snapshot(
    yesterday
).select_by(show_in_calendar=True)
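To give a feel for what a snapshot query has to decide, here is a plain-Python sketch. It assumes that the snapshot of a piece of content at a given time is its most recent published revision made at or before that time. That reading is an assumption, and the Rev tuple and snapshot function are hypothetical rather than part of Pagoda's API.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical revision record: which content it belongs to, when it was
# made, whether it was published, and the title at that revision.
Rev = namedtuple("Rev", "content_id timestamp published title")

history = [
    Rev(1, datetime(2007, 10, 1), True, "Brian turns twenty-three!"),
    Rev(1, datetime(2007, 10, 5), False, "Brian turns 23 (draft)"),
    Rev(1, datetime(2007, 10, 9), True, "Brian gets older"),
]


def snapshot(revisions, when):
    """Return {content_id: latest published Rev made at or before `when`}."""
    result = {}
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        if rev.published and rev.timestamp <= when:
            result[rev.content_id] = rev  # later revisions overwrite earlier ones
    return result


as_of_oct_6 = snapshot(history, datetime(2007, 10, 6))
as_of_oct_10 = snapshot(history, datetime(2007, 10, 10))
```

The unpublished draft from October 5 never shows up in a snapshot, which is the behavior an approval workflow would want.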
Here's the final code. It's just a couple more lines than the original model at the beginning of this article:
from sqlalchemy import *
from sqlalchemy.ext.assignmapper import assign_mapper
from turbogears.database import metadata, session
from datetime import datetime
from pagoda.models import Revision, revisioned_table, revision_mapper
event_generic_table = Table('event_generic', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
           primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
event_table = revisioned_table('event', event_generic_table, event_localized_table)
class Event(Revision):
    def move_to_date(self, new_date):
        # Measure the shift before start_date is overwritten.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta
revision_mapper(session.context, Event, event_table)
So, hopefully those changes to the original Event weren't too jarring. Sure, we could make many of those changes automatically, but we're trying to avoid magic in favor of small helpers, each extending the model The SQLAlchemy Way. If you think all this is too much work, let us know! We want this to be fun to hack on for everyone, not just us.
Next time I'll talk about content type controllers.
Polymorphic, multilingual, revisioned content!
Let's talk about modeling revisioned, localized content in SQL! We're talking database engine agnostic here, people. This should work in SQLite, MySQL, PostgreSQL, and Bob's ValueSQL.
But first, if you're easily bored like me, here's a pretty visual aid I made for you describing the model we came up with. It's supposed to visualize how independently revisioned parts of an object can be combined into a single revision history. The rest of the post talks about how we ended up with this model. Click on it!
Content can mean any set of fields that applies to one type of object. For example, a page might have a title and text and some flags like whether or not to show up in the site's menu and search results.
Supporting revisions is useful for a couple of reasons. First, it allows users to undo their actions in case something goes horribly wrong. Second, you might want a workflow with an approval process, so that new changes can sit in a pending state while older versions are still active.
A simple way to model this might look like the following. You can easily imagine what the SQL looks like.
- revision
  - revision_id: A surrogate primary key. In an alternate schema it could be used as a composite key with content_id, in which case it could be interpreted as a revision number.
  - content_id: Some number identifying this content that won't change across revisions.
  - author: The user who made this change.
  - timestamp: The time of the change.
  - comment: The author's description of the change.
  - status: The workflow status (could be anything, depending on your workflow needs).
- page
  - revision_id: A surrogate primary key that points to the revision table.
  - content_id: A foreign key to the revision table.
  - title: The title text.
  - text: The main page content.
  - show_in_menu: A flag indicating whether or not to show up in the site's menu.
  - show_in_search: A flag indicating whether or not to show up in the site's search results.
Pretty easy, right? revision_id and content_id are foreign keys to the revision table, which will store information like the author, timestamp, comment, and workflow status for revisions of any type. content_id could also only exist in the revision table and be retrieved with a join, but I'm including it in the page table to make the examples easier.
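For readers who would rather not imagine it, here is one way the SQL might look, run through Python's built-in sqlite3 module. Only the table and column names come from the description above; the column types, defaults, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    revision_id INTEGER PRIMARY KEY,
    content_id  INTEGER NOT NULL,
    author      TEXT NOT NULL,
    timestamp   TEXT NOT NULL,
    comment     TEXT NOT NULL DEFAULT '',
    status      TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE page (
    revision_id    INTEGER PRIMARY KEY REFERENCES revision (revision_id),
    content_id     INTEGER NOT NULL,
    title          TEXT NOT NULL,
    text           TEXT NOT NULL DEFAULT '',
    show_in_menu   INTEGER NOT NULL DEFAULT 1,
    show_in_search INTEGER NOT NULL DEFAULT 1
);
""")

# Two revisions of the same page (content_id 1): the original and an edit.
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "alice", "2007-05-01 10:00", "First draft", "published"),
        (2, 1, "bob", "2007-05-02 09:30", "Fix the title", "published"),
    ],
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Wellcome", "Hello there.", 1, 1),
        (2, 1, "Welcome", "Hello there.", 1, 1),
    ],
)

# The newest version of a page is its row with the highest revision_id.
latest = conn.execute(
    """
    SELECT page.title, revision.author
    FROM page JOIN revision ON revision.revision_id = page.revision_id
    WHERE page.content_id = ?
    ORDER BY page.revision_id DESC
    LIMIT 1
    """,
    (1,),
).fetchone()
```

Undoing a change then amounts to reading (or re-publishing) an earlier row instead of the newest one; nothing is ever overwritten.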
This is simple enough if you don't need to support content translations. If you want a software solution that people will take seriously, however, localization shouldn't just be possible—it should be easy. And sadly, it's not the easiest thing to model correctly. A Drupal developer has a good summary of why this is so hard. We need some way to have, for each content_id, multiple translations of the localizable fields—in this case, title and text.
A bad solution would be to add a column for every language-field pair:
- page
  - ...
  - en_title
  - en_text
  - jp_title
  - jp_text
  - es_title
  - es_text
Like I said, that's obviously no good. Not only do we not want to add columns for every new translation, but we also don't want a new record with all of these columns when only one of the translations is updated. In other words, we want the translations to be independently revisioned, which better reflects how sites are normally translated. Another solution might be to have locale-independent title and text fields whose values are identifiers for strings in a translation table shared among all content types. Unfortunately this makes it harder to create and update page records, and also forces us to give up on any fields that need a unique schema, such as a maximum content length.
A better solution would be to just add a locale field:
- page
  - ...
  - locale: The locale of the translated fields in this row. For example, "en_US".
So we might have some entries that look like this:
| revision_id | content_id | locale | title | text | show_in_menu | show_in_search |
|---|---|---|---|---|---|---|
| 1 | 1 | en_US | Warning! | The kitchen is on fire! | true | true |
| 2 | 1 | es | ¡Aviso! | ¡La cocina se arde! | true | true |
| 3 | 2 | en | Bathroom status... | Safe! | false | false |
There are two pages here. One has two translations in U.S. English and Spanish. The other has just an English translation. They all have a different revision_id—remember, this is just a link to another table that stores information about the change.
You might be able to see why this isn't ideal. What happens when we want a new revision that changes show_in_menu or show_in_search, two non-locale-specific fields? We'd have to insert a new record with the updates for every translation. That's a lot of work, and could be a lot of duplication!
To minimize duplication of data, we can split up the locale-dependent and locale-independent fields into two tables for each content type. Our tables would look like this:
- page_generic
  - revision_id
  - content_id
  - show_in_menu
  - show_in_search
- page_localized
  - revision_id
  - content_id
  - locale
  - title
  - text
So, to get whole page revisions now we can just line up the revision_id from each table, and if an equivalent revision_id doesn't exist, just use the previous one that does exist from the table where it's missing. For instance, if the revision history looks like this:
| revision_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| page_generic | 1 | 2 | | | 5 | | 7 |
| page_localized | 1 | | 3 | 4 | 5 | 6 | |
The revisions would be the pairs (1, 1), (2, 1), (2, 3), (2, 4), (5, 5), (5, 6), (7, 6). But we need to take locale into account...
| revision_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| page_generic | 1 | 2 | | | 5 | | 7 |
| page_localized | 1 | | 3 | 4 | 5 | 6 | |
| locale | en | | en | jp | jp | en | |
There are more pairs now, since each page_generic record needs to be paired with not just the previous page_localized record, but the previous page_localized record in each locale. The pairs are (1, 1), (2, 1), (2, 3), (2, 4), (5, 3), (5, 5), (5, 6), (7, 5), (7, 6).
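The pairing rule can be sketched in a few lines of plain Python. This only illustrates the rule described above; it is not how Pagoda computes revisions (that happens in SQL), and the function name is made up.

```python
def revision_pairs(generic_ids, localized):
    """Return (generic_id, localized_id) pairs in revision order.

    generic_ids: revision ids present in page_generic.
    localized:   {revision_id: locale} for the rows in page_localized.
    """
    generic_ids = set(generic_ids)
    pairs = []
    latest_by_locale = {}    # locale -> newest localized revision so far
    current_generic = None   # newest generic revision so far
    for revision in sorted(generic_ids | set(localized)):
        if revision in localized:
            latest_by_locale[localized[revision]] = revision
        if revision in generic_ids:
            current_generic = revision
            # A generic change yields a new revision of every translation.
            pairs.extend((revision, loc) for loc in sorted(latest_by_locale.values()))
        elif current_generic is not None:
            # A localized change only affects its own translation.
            pairs.append((current_generic, revision))
    return pairs


locales = {1: "en", 3: "en", 4: "jp", 5: "jp", 6: "en"}
with_locales = revision_pairs([1, 2, 5, 7], locales)
# Treating every row as the same locale gives the simpler, locale-blind pairing.
single_locale = revision_pairs([1, 2, 5, 7], dict.fromkeys(locales, "en"))
```

With the locales from the table, this produces the nine pairs listed above; with a single locale it falls back to the seven pairs from the earlier, locale-blind example.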
There's one more aspect to cover in this model. What about the fields that all content types will need, and what if they, too, should be independently revisioned? For example, every content type might have a url field and it would be nice if we could undo that, too. But moving a page isn't really changing the page, it's more like changing the site, and besides, it would be nice to have the url for every object, regardless of content type, in one place. Luckily this looks very similar to what we have now, only this time the table is shared among content types. If we call this shared table node, the revision history might look like this:
| revision_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| node | 1 | | 3 | | | | 7 | 8 |
| page_generic | 1 | 2 | | | 5 | | 7 | |
| page_localized | 1 | | 3 | 4 | 5 | 6 | | |
| locale | en | | en | jp | jp | en | | |
The revision triplets would then be (1, 1, 1), (1, 2, 1), (3, 2, 3), (3, 2, 4), (3, 5, 3), (3, 5, 5), (3, 5, 6), (7, 7, 5), (7, 7, 6), (8, 7, 5), (8, 7, 6).
For the curious, the SQL to select all these triplets is complex but certainly not complicated:
SELECT revision.revision_id,
       revision.content_id,
       revision.author,
       revision.timestamp,
       revision.comment,
       node.revision_id as node_revision,
       node.url,
       page_generic.revision_id as generic_revision,
       page_generic.show_in_menu,
       page_generic.show_in_search,
       page_localized.revision_id as localized_revision,
       page_localized.title,
       page_localized.text
FROM revision, node, page_generic, page_localized
WHERE node.revision_id = (
    SELECT max(node.revision_id)
    FROM node
    WHERE node.revision_id <= revision.revision_id
    AND node.content_id = revision.content_id
) AND page_generic.revision_id = (
    SELECT max(page_generic.revision_id)
    FROM page_generic
    WHERE page_generic.revision_id <= revision.revision_id
    AND page_generic.content_id = revision.content_id
) AND page_localized.revision_id IN (
    SELECT max(page_localized.revision_id)
    FROM page_localized
    WHERE page_localized.revision_id <= revision.revision_id
    AND page_localized.content_id = revision.content_id
    GROUP BY page_localized.locale
) AND (
    revision.revision_id = node.revision_id OR
    revision.revision_id = page_generic.revision_id OR
    revision.revision_id = page_localized.revision_id
)
GROUP BY node.revision_id,
         page_generic.revision_id,
         page_localized.revision_id
Notice that those subqueries are correlated: each one refers to the revision table in the outer select. Very important!
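To check the query against the example history, here is a small experiment using Python's built-in sqlite3 module. The schemas are cut down to the columns the WHERE clause needs, the subqueries use table aliases for readability, and the outer GROUP BY is swapped for an ORDER BY so the output order is predictable. Those are all simplifications made for this sketch, not the query as Pagoda runs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (revision_id INTEGER PRIMARY KEY, content_id INTEGER);
CREATE TABLE node (revision_id INTEGER PRIMARY KEY, content_id INTEGER, url TEXT);
CREATE TABLE page_generic (revision_id INTEGER PRIMARY KEY, content_id INTEGER);
CREATE TABLE page_localized (
    revision_id INTEGER PRIMARY KEY, content_id INTEGER, locale TEXT
);
""")

# One piece of content (content_id 1) with the history from the table above.
conn.executemany("INSERT INTO revision VALUES (?, 1)", [(n,) for n in range(1, 9)])
conn.executemany("INSERT INTO node VALUES (?, 1, '/page')", [(1,), (3,), (7,), (8,)])
conn.executemany("INSERT INTO page_generic VALUES (?, 1)", [(1,), (2,), (5,), (7,)])
conn.executemany(
    "INSERT INTO page_localized VALUES (?, 1, ?)",
    [(1, "en"), (3, "en"), (4, "jp"), (5, "jp"), (6, "en")],
)

triplets = conn.execute("""
SELECT node.revision_id, page_generic.revision_id, page_localized.revision_id
FROM revision, node, page_generic, page_localized
WHERE node.revision_id = (
        SELECT max(n.revision_id) FROM node AS n
        WHERE n.revision_id <= revision.revision_id
          AND n.content_id = revision.content_id
    ) AND page_generic.revision_id = (
        SELECT max(g.revision_id) FROM page_generic AS g
        WHERE g.revision_id <= revision.revision_id
          AND g.content_id = revision.content_id
    ) AND page_localized.revision_id IN (
        SELECT max(l.revision_id) FROM page_localized AS l
        WHERE l.revision_id <= revision.revision_id
          AND l.content_id = revision.content_id
        GROUP BY l.locale
    ) AND (
        revision.revision_id = node.revision_id OR
        revision.revision_id = page_generic.revision_id OR
        revision.revision_id = page_localized.revision_id
    )
ORDER BY 1, 2, 3
""").fetchall()
```

For this data the result is the same eleven triplets listed earlier. The final OR clause is what keeps each combination from being reported again at later revisions: a row only survives if the outer revision is the one that introduced one of its three parts.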
We went through many designs of our revision model before ending up with this one, and so far it seems to alleviate all the problems our previous designs faced. I hope I've shown you that there are a lot of things to consider when using multilingual revisioned records, and I hope you believe me when I say that this solution is serving us pretty well. SQLAlchemy handles it beautifully, although we've run across many chances for SQLAlchemy to improve along the way, and even submitted a patch.
Since one of our goals in Pagoda is to allow people to easily create content types using this model, we have a few simple helpers that will hide many of the details. Next post: eat the sandwich!
Project and apartment updates
Some interesting bits from the past few weeks...
Next Tuesday I'll be having lunch with Mike Cermak, webmaster for the Greater Cleveland Regional Transit Authority. In my previous entry I mentioned my RTA Schedule project which has been gaining popularity. There were only a few routes listed on there when I posted it, and the list has been growing as people have been using the route adder. Mike wants to work together to come up with ideas and improvements that will encourage projects like mine—a very cool response, and beneficial to RTA users as well. I'm looking forward to it!
Remember those wacky import tricks I posted about to get multiple database engines working nicely in Pagoda? After coming up with that, Ian dug around to figure out what changes would be necessary to not have to do that. He narrowed it down to one single line of code in TurboGears! In turbogears.database:
def create_session():
    "Creates a session with the appropriate engine"
    return sqlalchemy.create_session(bind_to=get_engine())
That bind_to argument is totally unnecessary when using DynamicMetaData! Changing that to just use SQLAlchemy's create_session without arguments makes multiple database engines possible without any black magic. Unfortunately, we didn't notice TurboGears 1.0.2 about to be released and didn't start any discussion about changing this in time. For now we use this little monkeypatch:
session_context = turbogears.database.session.context
session_context.registry.createfunc = sqlalchemy.create_session
So I think it works more like Alchemyware now, except we don't have to write models any differently and the engines are cached. The metadata is simply pointed to the appropriate engine in each thread.
Speaking of Pagoda, we're still at least a couple weeks away from a beta release. We're currently writing glue for all the little bits and pieces we've created over the past couple months. We've satisfied many of our original goals and learned more about (and sometimes changed) others. I'll share more about these satisfied and modified goals later.
Pagoda's third contributor, Chris, moved back home to start hunting for jobs in the California area. Good luck, Chris! Chris is a fine electrical engineer and programmer and you should hire him. This was his plan since starting to help with Pagoda, so it doesn't really affect our development schedule.
After receiving practically no feedback from the release of dmath, there has been a small surge of interest recently, with a couple contributions, so there will likely be a new release. I put up a new egg of the old version on the Cheese Shop after learning that the Python 2.5 version was busted.
geopy continues to receive patches. The most-requested improvement, replacing the print chatter with logging, was recently contributed by Amos Latteier. I'll get 0.94 out this weekend with that and other improvements.
Since Chris moved out, our friend Greg moved in with me and Sara. Greg went to school for art and likes to paint and draw, and might even prove his cooking talents at culinary school next semester. I'll be helping him make a website for his comics, which are very funny, but I can't decide if it's because I know Greg and imagine him coming up with them, which itself makes me laugh. You'll be the judge soon enough...
There are two more new, smaller residents of our apartment as well... one's a 14-inch Oscar cichlid and the other's a 15-inch Plecostomus. They're friendly and big! Now I have fantasies about getting them a bigger aquarium with all manner of luxuries. I picked them up from someone who's graduating and they came with their 45-gallon home and necessities for free! I'll post some pictures of these guys soon.
Multiple sites, one Python: Pagoda import tricks
One of our early goals when designing Pagoda was to allow a single Pagoda instance to support multiple sites. The motivation is memory use on web servers running Python and TurboGears. The exact cost depends on your threading and web server configuration (mod_python, for example), but traditionally hosting multiple sites means running at least one Python instance per site, each costing 10-20 MB. The more modules each instance loads, the higher the memory usage, and since Pagoda sites will likely use a bunch of modules, that adds up. The most limiting factor in many hosting services is the amount of memory your account is allowed to consume.
Obviously if each Pagoda site is large and running custom code, it might be a good idea to run each in its own Python instance, so one site can't bring down all the others. But the common case, we think, is a bunch of moderately sized sites using just the built-in page management tools. So we devised some ways to allow multiple sites to run from one TurboGears project...
The first and simplest plan involved a database model, where pages and other table rows point to whichever site they belong to. You probably already know why this is a bad idea. First of all, every single table in the database needed to have a site_id column, since nothing would be shared between sites. Unique things like usernames would need their constraints modified to only be unique per-site. That got old pretty fast. The second problem was security. How could we ensure that every piece of code touching the database, even the eventual third-party plugins, would use the correct site in their queries so as not to mess with the others? And finally, having each site's contents in one massive database would not be very convenient if the site owners wanted backups of their portion of the database.
So we started looking at multi-database solutions, and quickly realized we were pretty much on our own for what we wanted to do. We don't just want some models in one database, and other models in a different database; we want the same models in every database. Every site needs a pages table, for example. Since we're mapping tables with SQLAlchemy, and each mapper is bound to metadata, an engine, and a session, it seems that we'd need to run the table and mapper definitions once per site; each time, the engine would point to the appropriate site's database. And now the big trick: how do we do this without modifying any model code, so that plugin writers don't have to learn any silly new details, and without doing a bunch of extra work every time a controller needs to use a model? If our controllers import pagoda.models.pages, how will they know to get the Page class bound to the current site's engine, and not another site's?
We looked to CherryPy for inspiration. In a TurboGears controller, importing cherrypy.request and cherrypy.response will make the current thread's request and response objects available. How do these objects magically belong to the appropriate thread? They simply use a class called ThreadLocalProxy. As the name suggests, cherrypy.request and cherrypy.response are proxy objects that determine the current thread and point object access to the correct request and response instances. Similarly, we want something like SiteLocalProxy, which will make model classes available that are magically bound to the correct site's engine.
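The proxy idea itself is small enough to sketch. The following is a simplified illustration built on threading.local and is not CherryPy's actual implementation; the Request class and the handle function are made up for the demonstration.

```python
import threading


class ThreadLocalProxy:
    """Forward attribute access to whatever object the current thread registered."""

    def __init__(self, storage, key):
        # Bypass our own __setattr__ so these live on the proxy itself.
        object.__setattr__(self, "_storage", storage)
        object.__setattr__(self, "_key", key)

    def _target(self):
        return getattr(self._storage, self._key)

    def __getattr__(self, name):
        # Only called for names the proxy does not define itself.
        return getattr(self._target(), name)

    def __setattr__(self, name, value):
        setattr(self._target(), name, value)


class Request:
    def __init__(self, path):
        self.path = path


_local = threading.local()
request = ThreadLocalProxy(_local, "request")  # one importable, module-level name
barrier = threading.Barrier(2)
seen = {}


def handle(path):
    _local.request = Request(path)  # the server would do this for each request
    barrier.wait()                  # make sure both threads have registered
    seen[path] = request.path       # application code just uses `request`


threads = [threading.Thread(target=handle, args=(p,)) for p in ("/a", "/b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both threads read through the same `request` name, yet each sees only the object it registered.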
Using ThreadLocalProxy as inspiration, we made a clever little object called site. When anything is imported from pagoda.site, it will rebind turbogears.database.metadata and turbogears.database.session after updating sqlalchemy.dburi in the config to point to the current site's database. Then the requested module is imported and cached for next time (so the models aren't reinitialized every time). No model code was changed at all! The only necessary modification was importing from pagoda.site.models instead of pagoda.models in our controllers.
Our first implementation looked very much like ThreadLocalProxy, but it made our import statements look funny since site wasn't a real module. So we started investigating the imp, ihooks, and imputil modules, eventually leading us to PEP 302. With help from Importing (to reduce the amount of code necessary), we now have a special pseudo-module called site, and Pagoda modules imported from that will take the current request's site into account instead of just being imported once for the entire process.
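A full PEP 302 hook is more than fits here, but the effect of a pseudo-module can be sketched with a simpler, swapped-in technique: registering a custom module object in sys.modules whose attribute lookup depends on the current site. Everything below (site_demo, set_current_site, build_models) is hypothetical and only stands in for the real Pagoda code.

```python
import sys
import threading
import types

_current = threading.local()
_cache = {}        # (site, name) -> object built for that site
build_calls = []   # records how often models are really built


def set_current_site(site):
    """Would be called at the start of each request in a real server."""
    _current.site = site


def build_models(site):
    # Stand-in for rebinding the metadata and importing the model module.
    build_calls.append(site)
    return types.SimpleNamespace(dburi="sqlite:///%s.db" % site)


class SiteModule(types.ModuleType):
    def __getattr__(self, name):
        # Only reached for names the module does not define itself.
        if name != "models":
            raise AttributeError(name)
        key = (_current.site, name)
        if key not in _cache:
            _cache[key] = build_models(_current.site)
        return _cache[key]


sys.modules["site_demo"] = SiteModule("site_demo")

set_current_site("alpha")
from site_demo import models as alpha_models
set_current_site("beta")
from site_demo import models as beta_models
set_current_site("alpha")
from site_demo import models as alpha_again
```

Each `from site_demo import models` resolves against whichever site is current for the thread, and the per-site cache means the expensive setup runs once per site rather than once per import.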
Before writing up this entry, I came across Alchemyware. At first it looked promising for what we want to do, but as far as I can tell it requires modifying the way you write models and reinstantiating them on every request. Also, I don't understand how the mapped class can be "shared by everyone" if it's being mapped to multiple databases.
Anyway, after cleaning up our proof-of-concept I'll share the code behind our import trickery in case anyone is trying to do something similar, but mostly just because such tricks are interesting.
In case you forgot, we missed the end-of-March deadline we set for our demo, due in part to being burned out after PyCon. We're shooting for the end of April now.
Pagoda CMS Notes

If you were at PyCon 2007 or read Gary's blog or read the TurboGears mailing list, you may have seen Pagoda CMS mentioned. Pagoda is an open source content management system I've been working on with Chris and Ian. It's built on TurboGears and is focused on being simple yet extensible. We put up an introductory screencast at pagodacms.org that we hurriedly made the night before PyCon.
We've tried a bunch of content management systems, both open source and commercial, and developed for small shops, big corporations, government organizations, and of course Case itself. There are features that are consistently implemented poorly, hard to understand, or simply missing. Pagoda is a result of the observations we've made of how content management systems are really used in a production setting.
These are just a few of the notes and design goals we've been using along the way.
Don't overengineer it
Somewhere along the line someone decided that if you're going to make a content management system, you have to build everything on top of a dozen layers of abstraction. Some pretend that there's no difference between static page content (like a blog entry) and dynamically generated content (like a news feed). Some pretend that building complex workflows that are exactly suited to the way your organization is structured is a common need (we've found that people already have real-life workflows and rarely do they want this duplicated in a CMS).
Experience has shown us that such complexity is rarely needed. We don't try to fit every feature into a "plugin" structure or an "actions" framework. We've streamlined the features based on our experience, and hopefully kept it fun to hack on (when you do need something extra) by avoiding meaningless abstractions.
Do one thing really, really well
A lot of content management systems try to do everything involved in running a web site. Database management, email management, form design, you name it. We don't want a content management system that takes over every aspect of making a web site. We've made conscious decisions to leave a lot of features out. In addition to the above (which can all be found in Zope + Plone, for example), we've spent a lot of time deciding how far certain features should reach and what should be left up to the webmaster.
One example is theme switching. When you're first building a web site, being able to download prepackaged themes might be nice. But for production sites, this simply does not happen. Imagine the Cleveland Museum of Natural History or a university department downloading new themes and swapping them out. Not gonna happen. Instead, theme support ends up being limiting, because prepackaged themes require predetermined markup. As a result, most Plone sites look the same and are structured the same way. They have the little tree on the left and those tiny tabs and a logo above that. And then you're scared to modify too much CSS because there's a bunch already dedicated to making those tabs pixel-perfect. We don't have a default theme or even default markup. Markup and design are meant for programmers and web designers; let's not pretend otherwise.
Use simple terminology
As Jeffrey Veen mentioned in Making A Better CMS, stop it with the jargon already! "Mambots", "archetypes", "portlets", and I'll admit it, I'm not even a fan of the term "widgets". We've tried to use understandable terminology throughout Pagoda and not extend failed analogies.
One example where we created a feature and spent some effort on choosing a name is Placeholders. This is a feature that we've actually needed on production sites but haven't found in other content management systems. The idea is that there is text that appears on multiple pages within the content, and it would be nice to only have to change it in one place so we don't have to hunt down every page in the future. Phone numbers, store hours, admission prices, and press contact information are some examples. These aren't template variables because they have nothing to do with templates (to the user) and aren't arbitrary Python objects, and they're not code snippets because they have nothing to do with code. They're simply content placeholders. Here's the mockup we used while implementing this feature:

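To make the Placeholders idea concrete, here is a minimal sketch of how shared text could be substituted into page content. The `{{name}}` syntax, the `fill_placeholders` function, and the sample values are all assumptions made up for illustration; this is not Pagoda's actual syntax or code.

```python
import re

# Hypothetical placeholder values; a real CMS would store these in its database.
PLACEHOLDERS = {
    "phone": "555-0100",
    "hours": "Mon-Fri 9am-5pm",
}

# The {{name}} marker is an assumption for this sketch, not Pagoda's real syntax.
_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(content, values):
    """Replace each {{name}} marker with its value; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        # Fall back to the original marker so a typo stays visible to the editor.
        return values.get(name, match.group(0))
    return _PATTERN.sub(substitute, content)


page = "Call us at {{phone}}. We are open {{ hours }}."
print(fill_placeholders(page, PLACEHOLDERS))
```

Changing the phone number then means editing one stored value instead of hunting down every page that mentions it.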
Borrow features that work
We've had a lot of inspiration along the way and used it to solve real problems. For example, if you need to have a downloadable file on your web site, a lot of content management systems will force you to ask "where do I put this?" and once you've decided on a place, require you to find your way there in the filesystem. We decided on pages having Attachments. Most downloads are associated with a particular page, so just upload them to that page and that will determine their location. We used 37signals' Campfire for inspiration, where people can upload files to the room they're in and they appear as attachments.
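As a rough sketch of "the page determines the location", an attachment's URL could simply be derived from the URL of the page it was uploaded to. The `attachment_url` function and the URL layout below are assumptions for illustration, not how Pagoda necessarily stores or serves files.

```python
import posixpath


def attachment_url(page_url, filename):
    """Derive an attachment's URL from the page it was uploaded to (assumed layout)."""
    # Keep only the base name so an uploaded file name cannot escape the page's path.
    safe_name = posixpath.basename(filename)
    return page_url.rstrip("/") + "/" + safe_name


print(attachment_url("/visit/admission/", "prices.pdf"))
```

With a scheme like this the uploader never has to decide where a file lives; moving the page would move its attachments along with it.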
Reduce the number of clicks
We're lucky enough to have started developing after AJAX became popular. The "Web 2.0" buzzword might be annoying, but this is really something we can use to make content management quicker and easier. Navigation and messing around with page options won't require dozens of clicks and page reloads anymore. Instead of having to retrofit our software to take advantage of AJAX, we can design with it in mind.
Build a content management system, not a new framework
Similar to doing one thing really well, we're not building a web framework. That's what TurboGears is for. People can still use their existing TurboGears controllers, models, and templates. We're using SQLAlchemy for Pagoda's models and Genshi for the templates. To install Pagoda for your existing TurboGears project, you'll just have to subclass PagodaController instead of the default RootController, so Pagoda can dispatch requests to the appropriate page.
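The subclassing idea can be sketched without any framework at all. Everything below is a stand-in written for illustration: this `PagodaController`, the `PAGES` table, and the tiny `dispatch` helper are hypothetical and are not Pagoda's or TurboGears' real classes. The sketch only shows how existing handlers could keep working while unmatched paths fall through to CMS pages.

```python
# Stand-ins for illustration only; NOT the real Pagoda or TurboGears API.
PAGES = {
    ("about",): "About us",
    ("about", "staff"): "Our staff",
}


class PagodaController(object):
    """Hypothetical base class that serves CMS pages for otherwise unhandled paths."""

    def default(self, *path):
        # Reached when no method on the controller matches the first path segment.
        page = PAGES.get(path)
        if page is None:
            return "404 Not Found"
        return page


class Root(PagodaController):
    """An existing application's root controller, now subclassing PagodaController."""

    def search(self, q=""):
        # An existing handler keeps working exactly as before.
        return "Results for %s" % q


def dispatch(root, url):
    """Toy dispatcher: call a method named after the first segment, else default()."""
    segments = tuple(s for s in url.split("/") if s)
    if segments and not segments[0].startswith("_") and segments[0] != "default":
        handler = getattr(root, segments[0], None)
        if callable(handler):
            return handler(*segments[1:])
    return root.default(*segments)


root = Root()
print(dispatch(root, "/search/pagoda"))
print(dispatch(root, "/about/staff"))
```

The design choice being illustrated is that the CMS only claims the URLs the application doesn't already handle, so adopting it doesn't require rewriting existing controllers.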
So hopefully it sounds like an interesting project. We're still hacking on the core and hope to release a demo before the end of March, when we'll also invite people to help out and find weak spots. We have some mailing lists on Google Groups for discussion: pagoda-talk (general discussion), pagoda-coders (core development), and pagoda-announcements (for releases and other notices). For the first few releases we'll also make announcements on the TurboGears list.
