Entries in "Python"
September Projects
I'm in the situation Ian Bicking was in not long ago—I'm really tired of this blog design and software and it's making me not want to post any of the entries I have pending. This blog will soon redirect to something better.
Pagoda should have a Developer Preview in October. Check out my presentation from the September meeting of the Cleveland Python interest group.
Remember how Ian and I spent months thinking up hundreds of names for our company? We are now incorporated as Unstoppable Rocket—one of the first names that was suggested.
If you've been following the geopy list, you've heard about the new release coming out. It should make things much more flexible and extendable, and fix all the issues from the past year or so. geopy 0.99 will be out this week.
The geopy update is also getting me back into the campus crime map and my Case geocoder service, which is going to be really smart. Updates there soon.
I started a new project called Revisionist, which is like Pagoda's revision model except generalized and using SQLAlchemy 0.4. I'm hoping other people will be interested in using and improving such a project. With the right helpers it should make revisioning complex models really easy.
If anyone has any neat suggestions for what Gary or I should talk about at the October Clepy meeting, let me know.
Content types in Pagoda, Part 1: The Model
Any content management system will inevitably have to think about having different content types. Common content types include pages, attachments, calendars, events, and blog articles. Why make the distinction between different things that all appear as "pages" to the user? Because, of course, different content types must support different features and respond to different actions. For example, an event content type must have a date in order to show up on a calendar, and a calendar content type might support an iCalendar feed of its events.
Likewise, content types will all share similar features and actions. They all have a URL and a title. And if we take the common CMS approach of making a site as a hierarchy of objects, they all have a parent object and child objects. While sites might not be inherently hierarchical (URLs are just identifiers!), it's quite natural to create them this way—for example, if we move a page, we'd expect its entire tree of child pages to move with it.
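To make the "moving a page moves its subtree" point concrete, here is a minimal plain-Python sketch. The Node class and its move_to method are hypothetical names made up for illustration, not Pagoda's API; the only idea being shown is that a URL derived from the parent chain follows the page wherever it goes.

```python
class Node:
    """A page in a site hierarchy; its URL is derived from its ancestors."""

    def __init__(self, slug, parent=None):
        self.slug = slug
        self.parent = None
        self.children = []
        if parent is not None:
            self.move_to(parent)

    def move_to(self, new_parent):
        # Detach from the old parent (if any), then attach to the new one.
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        new_parent.children.append(self)

    @property
    def url(self):
        # Walk up the parent chain; the root contributes the leading slash.
        if self.parent is None:
            return "/" + self.slug if self.slug else "/"
        return self.parent.url.rstrip("/") + "/" + self.slug


root = Node("")
about = Node("about", root)
archive = Node("archive", root)
staff = Node("staff", about)
bios = Node("bios", staff)

# Moving one page is a single pointer change; every descendant URL follows.
about.move_to(archive)
```

Nothing under about had to be touched for bios to end up below archive, which is the behavior people expect from a hierarchical site.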
One of the first things a web developer does when starting a project is to model its content types. Read any MVC (or, ahem, MTV) web framework tutorial and there will be a Wiki model, or a Blog model, or a TodoList model—all content types. In this article I'll be talking about what it currently looks like to model a content type in Pagoda. Since Pagoda is based on TurboGears, our goal is to make building your app alongside Pagoda no different than building your app with TurboGears, and so far I think we've done a pretty good job. (And by the way, since we're using SQLAlchemy, this part of Pagoda is TG2 future-proof.)
So, if it's supposed to be the same as just using TurboGears, why do I have to show you anything? The answer is that while you don't have to design your model with Pagoda in mind (existing apps will coexist just fine), doing so will make your model easily localizable and revisionable! That's a pretty big benefit in the world of content management. We'll be able to restore old content records from any point in their history, and make changes to locale-independent fields for all translations at once.
So, on to the code. I'll be modeling a simple Event content type. First, here's how you might do it with some plain old TurboGears and SQLAlchemy.
from sqlalchemy import *
from sqlalchemy.ext.assignmapper import assign_mapper
from turbogears.database import metadata, session
from datetime import datetime
event_table = Table('event', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('title', Unicode(200), nullable=False),
    Column('url_slug', String(75), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
class Event(object):
    def move_to_date(self, new_date):
        # Measure the shift before overwriting start_date, or it is always zero.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta

assign_mapper(session.context, Event, event_table)
So, a pretty standard model with minimal event features. (One thing might not be obvious: the url_slug field is the short Latin-1 name of the event we'll show in the URL.) Using the mapped Event class looks like this...
# Make an event.
bday_party = Event(start_date=datetime(2007, 10, 30, 19, 30),
    title="Brian turns twenty-three!", url_slug="brian_turns_23",
    description="The party will take place in my underwater hideout.",
    show_in_calendar=True)
# End at midnight!
bday_party.end_date = datetime(2007, 10, 31)
# Write it!
session.flush()
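One detail of move_to_date is worth trying without a database: the shift has to be measured before start_date is overwritten, otherwise the difference is zero and end_date never moves. A minimal sketch, using a plain class as a stand-in for the mapped Event (an assumption made only so it runs on its own):

```python
from datetime import datetime


class Event:
    """Plain-Python stand-in for the mapped Event class (no database)."""

    def __init__(self, start_date, end_date=None):
        self.start_date = start_date
        self.end_date = end_date

    def move_to_date(self, new_date):
        # Measure the shift first, then apply it to both dates.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta


party = Event(datetime(2007, 10, 30, 19, 30), datetime(2007, 10, 31))
party.move_to_date(datetime(2007, 11, 6, 19, 30))  # push it back one week
```

After the move, the party still ends at midnight, just a week later.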
Now, how would it change with support for translations, revisions, and having parent and child objects? The first step is to split the table up into locale-dependent and locale-independent tables. If you read about our localizable revision model you'll see that this is how we support independently revisioned translations while avoiding data duplication. Here's what the two tables look like that will replace event_table:
event_generic_table = Table('event_generic', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('url_slug', String(75), nullable=False),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default="")
)
Since we want to support translations for the fields in event_localized_table, let's also add a locale field in order to tell the translations apart. Each locale value will be a short identifier like "en-US", "fr", or "ja".
event_localized_table = Table('event_localized', metadata,
    Column('event_id', Integer, primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
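To see why the split pays off, here is a rough database-only sketch using the standard library's sqlite3 instead of SQLAlchemy. The trimmed-down columns and the composite (event_id, locale) key are assumptions made so that two translations can share one event in a self-contained example; Pagoda itself keys these tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_generic (
        event_id INTEGER PRIMARY KEY,
        start_date TEXT NOT NULL,
        show_in_calendar INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE event_localized (
        event_id INTEGER NOT NULL REFERENCES event_generic(event_id),
        locale TEXT NOT NULL,
        title TEXT NOT NULL,
        PRIMARY KEY (event_id, locale)
    );
""")
conn.execute(
    "INSERT INTO event_generic (event_id, start_date) VALUES (1, '2007-10-30 19:30')"
)
conn.executemany(
    "INSERT INTO event_localized (event_id, locale, title) VALUES (?, ?, ?)",
    [(1, "en", "Brian turns twenty-three!"), (1, "fr", "Brian a vingt-trois ans !")],
)

# A locale-independent change touches one row, yet every translation sees it.
conn.execute("UPDATE event_generic SET start_date = '2007-11-06 19:30' WHERE event_id = 1")

rows = conn.execute("""
    SELECT l.locale, l.title, g.start_date
    FROM event_generic AS g
    JOIN event_localized AS l ON l.event_id = g.event_id
    ORDER BY l.locale
""").fetchall()
```

Both rows come back with the new start date even though only event_generic was updated, and the titles stay independent per locale.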
The next step is to point both tables at Pagoda's revision table in order to support revisions. Since each event record points to a unique revision record, our primary key is now redundant, and can be changed to the revision's ID:
from pagoda.models import Revision
event_generic_table = Table('event_generic', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('url_slug', String(75), nullable=False),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
A column type of None here will cause SQLAlchemy to use the column type of the foreign key—almost always what you want. There's one more change to make. Since Pagoda helps manage your site's content hierarchy, it already has a table to hold the URL of every object on the site. So we can get rid of the url_slug field—Pagoda will include its own when we tell it about this content type. Our final tables:
from pagoda.models import Revision
event_generic_table = Table('event_generic', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
Just a few more small changes! Since we have two different tables, and are now adding some more tables (like Revision) into the mix, we need to join them somehow for SQLAlchemy to map against the resulting join. Pagoda has a function called revisioned_table that will perform the necessary joins. Just tell it about your two tables and give it an alias:
from pagoda.models import Revision, revisioned_table
...
event_table = revisioned_table('event', event_generic_table, event_localized_table)
event_table is now a Selectable according to SQLAlchemy. Let's map against it! Pagoda uses a mapper extension to help with querying and modifying revisioned records. You can add pagoda.models.RevisionableMapperExtension to the mapper yourself, or you can use our helper called revision_mapper to do it. revision_mapper is a small wrapper around assign_mapper that makes sure the mapper extension is there, and gives the mapped class methods some more helpful docstrings.
from pagoda.models import Revision, revisioned_table, revision_mapper
...
revision_mapper(session.context, Event, event_table)
Last change! Since Event is now revisioned, it would be nice to have some helpful methods for dealing with revisions, like querying for the latest published revision or creating a new revision based on a previous revision. Pagoda has a base class for your mapped class that will give it a few such methods. Just subclass your mapped class from Revision:
class Event(Revision):
    def move_to_date(self, new_date):
        # Measure the shift before overwriting start_date, or it is always zero.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta
And that's all it takes to support revisions. Event works just like before, except it now has some more methods and fields. A url column came from Pagoda's Node table, content_id and content_type came from Pagoda's Content table, and Revision's columns came along too. Note that no columns were added to either Event table—these additional fields came from joins. Using it looks much the same as before:
bday_party = Event(start_date=datetime(2007, 10, 30, 19, 30),
    title="Brian turns twenty-three!", url="brian_turns_23",
    description="The party will take place in my underwater hideout.",
    show_in_calendar=True, locale='en', content_type='event',
    revision_author="brian")
revised_bday_party = bday_party.new_revision(title="Brian gets older")
revised_bday_party.publish()
session.flush()
# revised_bday_party is now "active" - the latest published revision
calendar_events = Event.select_active_by(show_in_calendar=True)
from datetime import datetime, timedelta
yesterday = datetime.today() - timedelta(days=1)
events_as_they_were_yesterday = Event.filter_snapshot(
    yesterday
).select_by(show_in_calendar=True)
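For a feel of what a snapshot query has to work out, here is a plain-Python sketch. It is not how Pagoda implements filter_snapshot; the snapshot function and the dictionary fields (content_id, published) are hypothetical stand-ins. The idea is simply: for each piece of content, pick the latest revision that was already published at the requested time.

```python
from datetime import datetime


def snapshot(revisions, when):
    """Return, per content_id, the latest revision published at or before `when`."""
    latest = {}
    for rev in revisions:
        published = rev["published"]
        if published is None or published > when:
            continue  # unpublished, or published after the snapshot time
        current = latest.get(rev["content_id"])
        if current is None or published > current["published"]:
            latest[rev["content_id"]] = rev
    return latest


revisions = [
    {"content_id": 1, "title": "Brian turns twenty-three!", "published": datetime(2007, 9, 1)},
    {"content_id": 1, "title": "Brian gets older", "published": datetime(2007, 9, 20)},
    {"content_id": 2, "title": "Clepy meeting", "published": None},
]

before = snapshot(revisions, datetime(2007, 9, 10))
after = snapshot(revisions, datetime(2007, 9, 30))
```

The earlier snapshot sees the original title, the later one sees the revised title, and the unpublished draft appears in neither.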
Here's the final code. It's just a couple more lines than the original model at the beginning of this article:
from sqlalchemy import *
from sqlalchemy.ext.assignmapper import assign_mapper
from turbogears.database import metadata, session
from datetime import datetime
from pagoda.models import Revision, revisioned_table, revision_mapper

event_generic_table = Table('event_generic', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('start_date', DateTime, nullable=False, default=datetime.now),
    Column('end_date', DateTime, nullable=True),
    Column('show_in_calendar', Boolean, nullable=False, default=True)
)
event_localized_table = Table('event_localized', metadata,
    Column('revision_id', None, ForeignKey(Revision.c.revision_id),
        primary_key=True),
    Column('title', Unicode(200), nullable=False),
    Column('description', TEXT, nullable=False, default=""),
    Column('locale', String(25), nullable=False)
)
event_table = revisioned_table('event', event_generic_table, event_localized_table)

class Event(Revision):
    def move_to_date(self, new_date):
        # Measure the shift before overwriting start_date, or it is always zero.
        time_delta = new_date - self.start_date
        self.start_date = new_date
        if self.end_date:
            self.end_date += time_delta

revision_mapper(session.context, Event, event_table)
So, hopefully those changes to the original Event weren't too jarring. Sure, we could make many of those changes automatically, but we're trying to avoid magic in favor of small helpers, each extending the model The SQLAlchemy Way. If you think all this is too much work, let us know! We want this to be fun to hack on for everyone, not just us.
Next time I'll talk about content type controllers.
SQLAlchemy Bundle for TextMate
We've been using a lot of SQLAlchemy here in Pagoda-land. Not long after I started using TextMate, I started making all kinds of shortcuts for common SQLAlchemy constructs. Pretty soon models were flying out of our fingertips left and right.
Anyway, now you can download my SQLAlchemy TextMate bundle. Just extract that file and drag the resulting bundle onto TextMate to install it. There are currently 8 Snippets and 2 Templates, a few of which are demonstrated below.
Here's a quick little screencast where I make a few related tables using a Template and some Snippets. As you can see I've still got revisions on the brain. There's no talking, just some music. It's a minute and a half long. You have a minute, right?
Ideas for additions and improvements are always welcome.
Tux Droid Presentation
Tonight at Clepy I gave a presentation about Tux Droid. A few months ago I received an offer to test and keep a free Tux Droid if I promised to do cool stuff with it. A while after responding and talking about Case's sweet hacker club it arrived in the mail! The presentation isn't much without the demonstrations, but the slides are online at exogen.case.edu/tux. When I come up with some nice polished Tux programs you'll hear about them here. Tux is available in Project Club for anyone to play with.
Project and apartment updates
Some interesting bits from the past few weeks...
Next Tuesday I'll be having lunch with Mike Cermak, webmaster for the Greater Cleveland Regional Transit Authority. In my previous entry I mentioned my RTA Schedule project which has been gaining popularity. There were only a few routes listed on there when I posted it, and the list has been growing as people have been using the route adder. Mike wants to work together to come up with ideas and improvements that will encourage projects like mine—a very cool response, and beneficial to RTA users as well. I'm looking forward to it!
Remember those wacky import tricks I posted about to get multiple database engines working nicely in Pagoda? After coming up with that, Ian dug around to figure out what changes would be necessary to not have to do that. He narrowed it down to one single line of code in TurboGears! In turbogears.database:
def create_session():
    "Creates a session with the appropriate engine"
    return sqlalchemy.create_session(bind_to=get_engine())
That bind_to argument is totally unnecessary when using DynamicMetaData! Changing that to just use SQLAlchemy's create_session without arguments makes multiple database engines possible without any black magic. Unfortunately, we didn't notice TurboGears 1.0.2 about to be released and didn't start any discussion about changing this in time. For now we use this little monkeypatch:
import sqlalchemy
import turbogears.database
session_context = turbogears.database.session.context
session_context.registry.createfunc = sqlalchemy.create_session
So I think it works more like Alchemyware now, except we don't have to write models any differently and the engines are cached. The metadata is simply pointed to the appropriate engine in each thread.
Speaking of Pagoda, we're still at least a couple weeks away from a beta release. We're currently writing glue for all the little bits and pieces we've created over the past couple months. We've satisfied many of our original goals and learned more about (and sometimes changed) others. I'll share more about these satisfied and modified goals later.
Pagoda's third contributor, Chris, moved back home to start hunting for jobs in the California area. Good luck, Chris! Chris is a fine electrical engineer and programmer and you should hire him. This was his plan since starting to help with Pagoda, so it doesn't really affect our development schedule.
After receiving practically no feedback from the release of dmath, there has been a small surge of interest recently, with a couple contributions, so there will likely be a new release. I put up a new egg of the old version on the Cheese Shop after learning that the Python 2.5 version was busted.
geopy continues to receive patches; the most-requested improvement, replacing print chatter with logging, was recently contributed by Amos Latteier. I'll get 0.94 out this weekend with that and other improvements.
Since Chris moved out, our friend Greg moved in with me and Sara. Greg went to school for art and likes to paint and draw, and might even prove his cooking talents at culinary school next semester. I'll be helping him make a website for his comics, which are very funny, but I can't decide if it's because I know Greg and imagine him coming up with them, which itself makes me laugh. You'll be the judge soon enough...
There are two more new, smaller residents of our apartment as well... one's a 14-inch Oscar cichlid and the other's a 15-inch Plecostomus. They're friendly and big! Now I have fantasies about getting them a bigger aquarium with all manner of luxuries. I picked them up from someone who's graduating and they came with their 45-gallon home and necessities for free! I'll post some pictures of these guys soon.
Multiple sites, one Python: Pagoda import tricks
One of our early goals when designing Pagoda was to allow a single Pagoda instance to support multiple sites. This was due to the way memory works for web servers running on Python and TurboGears. How exactly this adds up depends on your threading and web server configuration (mod_python), but traditionally hosting multiple sites means running at least one Python instance per site, each costing 10-20 MB. The more modules each instance loads, the higher the memory usage, and since Pagoda sites will likely use a bunch of modules, that adds up. The most limiting factor in many hosting services is the amount of memory your account is allowed to consume.
Obviously if each Pagoda site is large and running custom code, it might be a good idea to run each in its own Python instance, so one site can't bring down all the others. But the common case, we think, is a bunch of moderately sized sites using just the built-in page management tools. So we devised some ways to allow multiple sites to run from one TurboGears project...
The first and simplest plan involved a database model, where pages and other table rows point to whichever site they belong to. You probably already know why this is a bad idea. First of all, every single table in the database needed to have a site_id column, since nothing would be shared between sites. Unique things like usernames would need their constraints modified to only be unique per-site. That got old pretty fast. The second problem was security. How could we ensure that every piece of code touching the database, even the eventual third-party plugins, would use the correct site in their queries so as not to mess with the others? And finally, having each site's contents in one massive database would not be very convenient if the site owners wanted backups of their portion of the database.
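Here is a small sketch of what that rejected site_id design looks like in practice, using the standard library's sqlite3. The site_user table and its columns are made up for illustration. It shows both halves of the problem: the uniqueness constraint has to become per-site, and every query has to remember the site filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE site_user (
        site_id INTEGER NOT NULL,
        username TEXT NOT NULL,
        UNIQUE (site_id, username)
    )
""")
conn.execute("INSERT INTO site_user VALUES (1, 'brian')")
conn.execute("INSERT INTO site_user VALUES (2, 'brian')")  # same name, other site: allowed

try:
    conn.execute("INSERT INTO site_user VALUES (1, 'brian')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Correct query: scoped to one site.
site_1_users = conn.execute(
    "SELECT username FROM site_user WHERE site_id = ?", (1,)
).fetchall()

# One forgotten WHERE clause and a plugin sees every site's rows.
all_users = conn.execute("SELECT username FROM site_user").fetchall()
```

The constraint change is easy enough, but nothing in the schema stops the unscoped query, which is the security worry described above.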
So we started looking at multi-database solutions, and quickly realized we were pretty much on our own for what we wanted to do. We don't just want some models in one database, and other models in a different database; we want the same models in every database. Every site needs a pages table, for example. Since we're mapping tables with SQLAlchemy, and each mapper is bound to metadata, an engine, and a session, it seems that we'd need to run the table and mapper definitions once per site; each time, the engine would point to the appropriate site's database. And now the big trick: how do we do this without modifying any model code, so that plugin writers don't have to learn any silly new details, and without doing a bunch of extra work every time a controller needs to use a model? If our controllers import pagoda.models.pages, how will it know to get the Page class bound to the current site's engine, and not another site's?
We looked to CherryPy for inspiration. In a TurboGears controller, importing cherrypy.request and cherrypy.response will make the current thread's request and response objects available. How do these objects magically belong to the appropriate thread? They simply use a class called ThreadLocalProxy. As the name suggests, cherrypy.request and cherrypy.response are proxy objects that determine the current thread and point object access to the correct request and response instances. Similarly, we want something like SiteLocalProxy, which will make model classes available that are magically bound to the correct site's engine.
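The proxy idea is small enough to sketch with nothing but the standard library. This is an illustration in the spirit of what's described above, not CherryPy's actual code; the class below and the Request stand-in are written from scratch for the example.

```python
import threading


class ThreadLocalProxy:
    """Forward attribute access to whatever object the current thread registered."""

    def __init__(self, storage, name):
        # Write through __dict__ so we don't trigger our own __setattr__.
        self.__dict__["_storage"] = storage
        self.__dict__["_name"] = name

    def _target(self):
        return getattr(self._storage, self._name)

    def __getattr__(self, attr):
        # Only called for attributes the proxy itself doesn't have.
        return getattr(self._target(), attr)

    def __setattr__(self, attr, value):
        setattr(self._target(), attr, value)


class Request:
    def __init__(self, path):
        self.path = path


serving = threading.local()
request = ThreadLocalProxy(serving, "request")  # one shared, module-level name

seen = {}


def handle(path):
    serving.request = Request(path)  # each thread installs its own request
    seen[path] = request.path        # the shared proxy resolves to that thread's object


threads = [threading.Thread(target=handle, args=(p,)) for p in ("/a", "/b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread reads the same global name, yet each one sees only its own request. Swap "current thread" for "current site" and you have the shape of the site-local lookup we were after.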
Using ThreadLocalProxy as inspiration, we made a clever little object called site. When anything is imported from pagoda.site, it will rebind turbogears.database.metadata and turbogears.database.session after updating sqlalchemy.dburi in the config to point to the current site's. Then the requested module is imported and cached for next time (so the models aren't reinitialized every time). No model code was changed at all! The only necessary modification was importing from pagoda.site.models instead of pagoda.models in our controllers.
Our first implementation looked very much like ThreadLocalProxy, but it made our import statements look funny since site wasn't a real module. So we started investigating the imp, ihooks, and imputils modules, eventually leading us to PEP 302. With help from Importing (to reduce the amount of code necessary), we now have a special pseudo-module called site, and Pagoda modules imported from that will take the current request's site into account instead of just being imported once for the entire process.
Before writing up this entry, I came across Alchemyware. At first it looked promising for what we want to do, but as far as I can tell it requires modifying the way you write models and reinstantiating them on every request. Also, I don't understand how the mapped class can be "shared by everyone" if it's being mapped to multiple databases.
Anyway, after cleaning up our proof-of-concept I'll share the code behind our import trickery in case anyone is trying to do something similar, but mostly just because such tricks are interesting.
In case you forgot, we missed the end-of-March deadline we set for our demo, due in part to being burned out after PyCon. We're shooting for the end of April now.
Better Python Editing in Kate
Perhaps this will damage my hacker credibility, but I use KDE's Kate Editor as my development environment. For a while I used KDevelop, but then I realized the only features I was using were Kate's (note to KDevelop developers: having to decide on a location and filename immediately when I hit New File is extremely annoying. Give me an Untitled!)
The included python.xml (the file that tells Kate how to categorize tokens for highlighting) is a bit insufficient for Python programmers: no differentiation of class definitions, function definitions, or decorators. I've added these and some other minor improvements to my own python.xml and packaged it up with a custom color scheme in kate_colors.tar.gz. If you extract this to your home directory, it will replace a few Kate-specific files. Your configuration options will be kept, but I think any custom colors you've set will be replaced (Kate lacks a way to split these up to make sharing easy). If you select "exogen - Dark" as the default schema in Fonts & Colors, your HTML, CSS, JavaScript, and Python files will look pretty:
Of course, you don't have to use the included schema. The enhanced syntax highlighter is the biggest improvement, giving you several more contexts under Text Highlighting Styles.
I shared these changes with Adrian Holovaty at PyCon, who I noticed also uses Kate. I guess he couldn't get used to the high contrast color scheme, but here's hoping the new syntax file is serving him well. ;)
(I also recommend the Tab Bar Extension, the Word Completion Plugin, and learning to use those double arrows in the Filesystem Browser panel.)
Pagoda CMS Notes

If you were at PyCon 2007 or read Gary's blog or read the TurboGears mailing list, you may have seen Pagoda CMS mentioned. Pagoda is an open source content management system I've been working on with Chris and Ian. It's built on TurboGears and is focused on being simple yet extensible. We put up an introductory screencast at pagodacms.org that we hurriedly made the night before PyCon.
We've tried a bunch of content management systems, both open source and commercial, and developed for small shops, big corporations, government organizations, and of course Case itself. There are features that are consistently implemented poorly, hard to understand, or simply missing. Pagoda is a result of the observations we've made of how content management systems are really used in a production setting.
These are just a few of the notes and design goals we've been using along the way.
Don't overengineer it
Somewhere along the line someone decided that if you're going to make a content management system, you have to build everything on top of a dozen layers of abstraction. Some pretend that there's no difference between static page content (like a blog entry) and dynamically generated content (like a news feed). Some pretend that building complex workflows that are exactly suited to the way your organization is structured is a common need (we've found that people already have real-life workflows and rarely do they want this duplicated in a CMS).
Experience has shown us that such complexity is rarely needed. We don't try to fit every feature into a "plugin" structure or an "actions" framework. We've streamlined the features based on our experience, and hopefully kept it fun to hack on (when you do need something extra) by avoiding meaningless abstractions.
Do one thing really, really well
A lot of content management systems try to do everything involved in running a web site. Database management, email management, form design, you name it. We don't want a content management system that takes over every aspect of making a web site. We've made conscious decisions to leave a lot of features out. In addition to the above (which can all be found in Zope + Plone, for example), we've spent a lot of time deciding how far certain features should reach and what should be left up to the webmaster.
One example is theme switching. When you're first building a web site, being able to download prepackaged themes might be nice. But for production sites, this simply does not happen. Imagine the Cleveland Museum of Natural History or a university department downloading new themes and swapping them out. Not gonna happen. Instead this is limiting, because prepackaged themes require predetermined markup. As a result, most Plone sites look the same and are structured the same way. They have the little tree on the left and those tiny tabs and a logo above that. And then you're scared to modify too much CSS because there's a bunch already dedicated to making those tabs pixel-perfect. We don't have a default theme or even default markup. Markup and design are meant for programmers and web designers, let's not pretend otherwise.
Use simple terminology
As Jeffrey Veen mentioned in Making A Better CMS, stop it with the jargon already! "Mambots", "archetypes", "portlets", and I'll admit it, I'm not even a fan of the term "widgets". We've tried to use understandable terminology throughout Pagoda and not extend failed analogies.
One example where we created a feature and spent some effort on choosing a name is Placeholders. This is a feature that we've actually needed on production sites but haven't found in other content management systems. The idea is that there is text that appears on multiple pages within the content, and it would be nice to only have to change it in one place so we don't have to hunt down every page in the future. Phone numbers, store hours, admission prices, and press contact information are some examples. These aren't template variables because they have nothing to do with templates (to the user) and aren't arbitrary Python objects, and they're not code snippets because they have nothing to do with code. They're simply content placeholders. Here's the mockup we used while implementing this feature:

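As a rough illustration of the Placeholders idea (separate from the mockup, and not Pagoda's implementation), here is a tiny sketch. The [[name]] syntax, the render function, and the sample values are all assumptions invented for the example.

```python
import re

PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")  # hypothetical [[name]] syntax


def render(content, placeholders):
    """Replace each [[name]] with its stored value; leave unknown names untouched."""
    def substitute(match):
        return placeholders.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, content)


placeholders = {"phone": "555-0100", "hours": "10am to 5pm"}
pages = [
    "Call us at [[phone]].",
    "Open [[hours]]; questions? Call [[phone]].",
    "Admission is [[admission_price]].",
]

placeholders["phone"] = "555-0199"  # one edit, picked up by every page
rendered = [render(page, placeholders) for page in pages]
```

The phone number changes on both pages that mention it, and the page with an undefined placeholder is left alone rather than broken.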
Borrow features that work
We've had a lot of inspiration along the way and used it to solve real problems. For example, if you need to have a downloadable file on your web site, a lot of content management systems will force you to ask "where do I put this?" and once you've decided on a place, require you to find your way there in the filesystem. We decided on pages having Attachments. Most downloads are associated with a particular page, so just upload them to that page and that will determine their location. We used 37signals' Campfire for inspiration, where people can upload files to the room they're in and they appear as attachments.
Reduce the number of clicks
We're lucky enough to have started developing after AJAX became popular. The "Web 2.0" buzzword might be annoying, but this is really something we can use to make content management quicker and easier. Navigation and messing around with page options won't require dozens of clicks and page reloads anymore. Instead of having to retrofit our software to take advantage of AJAX, we can design with it in mind.
Build a content management system, not a new framework
Similar to doing one thing really well, we're not building a web framework. That's what TurboGears is for. People can still use their existing TurboGears controllers, models, and templates. We're using SQLAlchemy for Pagoda's models and Genshi for the templates. To install Pagoda for your existing TurboGears project, you'll just have to subclass from PagodaController instead of the default RootController, so Pagoda can dispatch requests to the appropriate page.
So hopefully it sounds like an interesting project. We're still hacking on the core and hope to release a demo before the end of March, when we'll also invite people to help out and find weak spots. We have some mailing lists on Google Groups for discussion: pagoda-talk (general discussion), pagoda-coders (core development), and pagoda-announcements (for releases and other notices). For the first few releases we'll also make announcements on the TurboGears list.
A memorable PyCon moment
In one of the meeting rooms in the conference hotel, a dozen or so Djangonauts are quietly and productively hacking away on projects such as geodjango, a more decoupled admin, and Django snippets.
Ian Bicking pops his head in the door. In a loud whisper: "Hey is this the quiet room! Can I join the quiet room!" (It was, in fact, designated as The Quiet Room.) A crowd of other web folks burst into the room and it quickly becomes apparent that they were probably at the EWT party and not sprinting.
There is some uproar as Ian demonstrates the next generation of Python packaging. The crowd leaves shortly after, causing mischief elsewhere in the hotel.
James Bennett leaves the room to get a drink, returning with the following prediction (paraphrased): "It seems that Django and Zope are the only web frameworks whose members aren't going to get arrested."
(Gary sums up some other moments nicely in PyCon 2007: The Untold Stories.)
CAS 1.0 Authentication for Django, Part 2
After using my Django CAS authentication module for a while, I decided to make a couple improvements.
The biggest improvement is that instead of modifying code in the CAS module itself to set your CAS address and do things like custom User field population, all this stuff can now be configured in your settings file.
Another improvement is that CAS authentication now works for the bundled admin interface. The administration interface does not account for an authentication backend that doesn't know the user's password, which makes its login form useless with CAS. The CAS module will now intercept requests to the administration interface and do the proper authentication routine if necessary, never showing the login form (which doesn't make sense for CAS). Intercepting requests, you ask? Yes, that means the CAS module is now middleware. Actually it's middleware, a couple of views, and an authentication backend.
So here's how to use it now...
Extract it in django/contrib/. The code will be located at django/contrib/cas/. Is this a valid place to install third-party middleware? It's not really clear. Just do it anyway.
Now add it to the middleware and authentication backends in your settings. Make sure you also have the authentication middleware installed. Here's what mine looks like:
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.cas.middleware.CASMiddleware',
    'django.middleware.doc.XViewMiddleware',
)
AUTHENTICATION_BACKENDS = (
    'django.contrib.cas.backend.CASBackend',
)
You can now configure the CAS module in the same settings file. Here are the possible options, most of which can be safely ignored:
- CAS_SERVICE_URL: This is the only setting you must explicitly define. Set it to the base URL of your CAS source.
- CAS_POPULATE_USER: A callable or the location of a callable. When a user logs in and is missing name and email attributes in the database, this will be called with their User model instance. Default is None (do nothing).
- CAS_ADMIN_PREFIX: The URL prefix of the Django administration site. If undefined, the CAS middleware will just check the view being rendered to see if it lives in django.contrib.admin.views. The method is a little evil, but it works.
- CAS_LOGIN_URL: The URL where you bound django.contrib.cas.views.login. If undefined, assume /accounts/login/.
- CAS_LOGOUT_URL: The URL where you bound django.contrib.cas.views.logout. If undefined, assume /accounts/logout/.
- CAS_REDIRECT_URL: Where to send a user after logging in or out if there is no referrer and no next page set. Default is /.
- CAS_REDIRECT_FIELD_NAME: The name of the GET parameter in which to store the page URL to send the user to after logging in. Default is next.
Need an example? Here's what my CAS settings look like:
CAS_SERVICE_URL = 'https://login.case.edu/cas/'
CAS_POPULATE_USER = 'present.utils.populate_user'
And the callable that lives at present.utils.populate_user (notice this code lives in my project instead of tinkering with the CAS module) looks like this:
def populate_user(user):
    try:
        ldap = LDAP()
        person = ldap.filter_one_by(uid=user.username)
    except:
        if not user.email:
            user.email = "%s@case.edu" % user.username
    else:
        # If it succeeds, update their User entry
        user.email = person.mail[0]
        user.first_name = fix_case(person.givenName[0])
        user.last_name = fix_case(person.sn[0])
(LDAP and fix_case also live in my utils module).
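Since CAS_POPULATE_USER can be either a callable or the dotted location of one, here is a rough sketch of how a setting like that could be resolved. The helper name resolve_callable is hypothetical and this is not the CAS module's actual code; it is written for Python 3 and uses os.path.join purely as a stand-in target.

```python
from importlib import import_module


def resolve_callable(value):
    """Return value unchanged if it is callable (or None), else import it.

    A string such as 'package.module.name' is split into the module
    'package.module' and the attribute 'name'.
    """
    if value is None or callable(value):
        return value
    module_path, _, attr = value.rpartition('.')
    return getattr(import_module(module_path), attr)


# Stand-in example: resolve a dotted path from the standard library.
join = resolve_callable('os.path.join')
print(join('a', 'b'))
```

Accepting both forms keeps the settings file free of imports while still letting a project pass a function object directly.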
Finally, make sure your project knows how to log users in and out by adding these to your URLconf:
(r'^accounts/login/$', 'django.contrib.cas.views.login'),
(r'^accounts/logout/$', 'django.contrib.cas.views.logout'),
Users should now be able to log into your site, and staff into the administration interface, using CAS 1.0.
Simple CAS 1.0 Authentication for Django
Back when I expressed interest in making the web presentation bounty based solely on client-side code, Simon (bounty master and Filer admin) expressed his wish to keep the two services decoupled (so I shouldn't rely on Filer for slideshow storage). While I still want to have a save-to-Filer feature, I decided that I should just go ahead and get the web presentation system up and running before worrying about a client-side-only version. So I started a Django project.
Anyway, the result is that I got CAS 1.0 working alongside the Django authentication system, which means I can take advantage of built-in features like permissions and messages with CAS-authenticated users.
If anyone else is interested in using CAS authentication with Django, you can download the code I'm using. Here's a brief usage guide:
- Set SERVICE_URL in cas/__init__.py to the location of your CAS service. For example, Case's is https://login.case.edu/cas/.
- Set DEFAULT_REDIRECT_URL in cas/__init__.py. Normally the user will be sent back to their HTTP_REFERER (the page that requested login) after authentication. But if the user requests /accounts/login/ directly (or there is no HTTP_REFERER), they will be sent to DEFAULT_REDIRECT_URL.
- Enable the login and logout views by adding these to your URLconf (customize the URLs if you want):
    (r'^accounts/login/$', 'your_site.cas.views.login'),
    (r'^accounts/logout/$', 'your_site.cas.views.logout'),
- Add the backend in settings.py:
    AUTHENTICATION_BACKENDS = (
        'your_site.cas.backends.CASBackend',
    )
- Make sure at least the following apps are installed:
    INSTALLED_APPS = (
        'django.contrib.auth',
        'django.contrib.sessions',
        'your_site.cas',
    )
- Finally, if you have a way to populate the user's name and e-mail address fields from their username, put it in cas/backends.py (see the comments). For example, I have LDAP code there.
P.S.: This just implements the minimum required for CAS authentication. Features like gateway, renew, and proxies are not supported.
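For anyone curious what "the minimum required" amounts to: in CAS 1.0 the application sends the ticket it received to the server's validate endpoint, and the server answers with plain text, "yes" followed by the username on success or "no" on failure. The sketch below shows that exchange without doing any network I/O. The function names are hypothetical, it is written for Python 3, and it assumes the CAS base URL ends with a slash. It is not the code from my module.

```python
from urllib.parse import urlencode, urljoin


def validate_url(cas_base, service, ticket):
    """Build the CAS 1.0 validate URL for a ticket issued to `service`.

    Assumes cas_base ends with '/', e.g. 'https://login.case.edu/cas/'.
    """
    query = urlencode({'service': service, 'ticket': ticket})
    return urljoin(cas_base, 'validate') + '?' + query


def parse_validate_response(body):
    """Return the username from a 'yes' response, or None otherwise."""
    lines = body.splitlines()
    if len(lines) >= 2 and lines[0].strip() == 'yes':
        return lines[1].strip() or None
    return None


# 'ST-1' and example.com are placeholder values.
print(validate_url('https://login.case.edu/cas/', 'http://example.com/', 'ST-1'))
print(parse_validate_response('yes\nbmb12\n'))
print(parse_validate_response('no\n\n'))
```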
An alpha version of the presentation system should be online to play with later this week.
Workshop: Making Databases Fun with Python
Reminder! This is today!
Did you ever notice how writing SQL is not very fun?
This Monday (November 20th) on behalf of Case Project Club, I will be hosting a workshop for those interested in Python and databases. The talk will be at 7:01 PM (sharp) until 8:30 PM in the Olin 303 classroom/computer lab. I'll have Python all set up for everyone to play with and follow along. Pizza and drinks will be provided!
Python is a powerful dynamic programming language suitable for many tasks, including data analysis for research, web programming, and just plain fun. Even if you don't know Python, there won't be any crazy wizardry going on during the workshop, so you should be able to pick up the basics very quickly.
Some contents of the talk will include:
- Simple data/object persistence, for when SQL is overkill.
- The dbapi, a standardized interface for talking to databases with Python.
- An overview of object-relational mappers that will let you harness the power of relational databases without writing a single line of SQL (and easily swap out SQL backends).
- Construction of a database application during the workshop everyone can play with, made with Django's object-relational mapper (or perhaps SQLAlchemy).
Again, no prior knowledge of Python or any of the related libraries is required.
Hope to see you there!
Automating Case Wiki Tasks
A while ago Chris added a login method to the CAS module in CaseClasses. It returns a mechanize Browser object so that you can programmatically surf the web as if you had logged into CAS in a real web browser.
CaseClasses also has a Codes module that has the abbreviated codes for majors, departments, and buildings. I combined these two features to tackle the Building codes project on the Case Wiki.
P.S.: There is a MediaWiki API that would normally be used to do this kind of stuff, but according to Greg, editing is not fully functional yet.
Think you could add a lot to the wiki with some automated task? Here's how it was done.
First, you'll need mechanize and CaseClasses:
$ sudo easy_install mechanize
$ sudo easy_install http://opensource.case.edu/svn/CaseClasses/python/trunk
Now log into CAS with mechanize:
import Case
from getpass import getpass
username = 'bmb12'
password = getpass() # Enter a password without echoing
cas = Case.CAS()
browser = cas.login(username, password)
You can open any page with browser and interact with it as a logged in Case user. So let's go to the Case Wiki and log in:
browser.set_handle_robots(False)
browser.open("http://wiki.case.edu")
browser.follow_link(text_regex='Log In')
Editing can be done like so:
browser.open("http://wiki.case.edu/User:Brian.Beck")
browser.follow_link(text='Edit this page')
browser.select_form(name='editform')
browser['wpTextbox1'] += " Also, this guy sucks!"
browser.submit()
Automating the building code edits was done like so:
for code, name in Case.Codes.buildings.iteritems():
    url = "http://wiki.case.edu/%s" % name.replace(' ', '_')
    try:
        browser.open(url)
    except:
        print "Didn't find %r." % name
    else:
        browser.follow_link(text='Edit this page')
        browser.select_form(name='editform')
        source = browser['wpTextbox1']
        add_text = "The building code for %s is [[building code:=%s]].\r\n"
        add_text %= (name, code)
        if 'code:=' not in source:
            insert_at = source.find('{{Building')
            if insert_at != -1:
                new_source = source[:insert_at] + add_text + source[insert_at:]
            else:
                new_source = source + add_text
            browser['wpTextbox1'] = new_source
            browser.submit()
            print "Added building code for %r." % name
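If you want to try the text-editing part without touching the wiki, the same insertion logic can be pulled out into a pure function. This is just a sketch with a hypothetical function name (add_building_code) and made-up page text; it runs on Python 2 or 3 and needs neither mechanize nor CaseClasses.

```python
def add_building_code(source, name, code):
    """Return the page text with a building-code sentence added.

    Returns None when the page already carries a 'code:=' annotation,
    mirroring the check in the loop above.
    """
    if 'code:=' in source:
        return None
    add_text = "The building code for %s is [[building code:=%s]].\r\n"
    add_text %= (name, code)
    # Put the sentence just before the {{Building template if there is one,
    # otherwise append it to the end of the page.
    insert_at = source.find('{{Building')
    if insert_at != -1:
        return source[:insert_at] + add_text + source[insert_at:]
    return source + add_text


# Made-up page text and code, for illustration only.
print(add_building_code("Intro text.\n{{Building}}", "Olin Building", "OLIN"))
```

Keeping the string handling separate from the browser calls makes it easy to check the edit on sample text before submitting anything.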
Happy automating!
Update: The same has now been done for the Street addresses project. Check out the discussion to see how.
geopy 0.93 Released: distance, util, GeoNames
Finally released geopy 0.93, which contains the distance and util modules I previously mentioned, a GeoNames geocoder, and improvements to the Google geocoder in other formats. Updating the documentation was all that was holding it back, really.
You can now pass domain and resource arguments to the Google geocoder. To query the actual Google Maps interface (instead of their official HTTP geocoder), initialize like so:
g = geocoders.Google(resource='maps')
The JavaScript results tend to be the best for this resource, so change that as well:
g = geocoders.Google(resource='maps', output_format='js')
Finally, for geocoding addresses outside of the US, change the domain being queried:
g = geocoders.Google(domain='maps.google.co.uk', resource='maps', output_format='js')
As James Robinson brought up on the geopy mailing list, work is under way for accuracy support. This will let you determine how precise the geocoded result is for the given location. For example, is it only guaranteed to be the correct city? Street? Is it the exact address? I decided to release this version of geopy without completing this, because not much work is done so far (and we also want to normalize values across geocoders), and the distance module was a pretty big addition.
To upgrade:
sudo easy_install geopy
Miscellaneous School, Blog, Python Stuff
Today was perhaps the most easygoing day ever. My first class was cancelled, my second class ended 20 minutes early, and my last class only lasted for 20 minutes. Ah, education!
After getting some free food from the ACM / Women in EECS event, I finished my day by admiring this totally legit use of the Expression Wall (clink clink!):
Finally, an expression I can relate to!
A while ago I decided to try out some fancy log analyzers like Performancing Metrics and Google Analytics. Google Analytics seems to be better for checking out data about your users, while Performancing Metrics seems to be better for checking referrers and (surprisingly) search terms (well, Google's might be better, but Performancing Metrics is way easier to navigate).
One interesting thing these sites (and Blog@Case Stats) tell me is that start.case.edu is consistently my top external referrer. So it seems to send a lot of traffic my way. Go start!
Chris and I are working on the next version of dmath, mostly for speed and to deal with custom contexts. For example, the result of atan2(0, 0) should be indefinite, but in the math module it's 0 (presumably so that the function is continuous). But if someone wants it to be indefinite (by which I mean D('NaN')), they should be able to set that in their context. Oh yeah, one big improvement is that pow will allow Decimals to be raised to Decimal powers.
We're still trying to wrap our heads around some of the context stuff. For example, should all of our functions accept an optional context argument, like the sqrt, pow, and other methods in Decimal? If so, does every Decimal constructed within that function need to also be passed the context, even D(1)? This is stuff that will probably be obvious after some more browsing of decimal.py. We're also looking into doing things in pyrex once everything is known to be in working order. Need for speed, baby!
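One possible answer to the context question, sketched here rather than taken from dmath: accept an optional context, run the whole computation inside decimal.localcontext, and round once at the end. Everything computed inside the with block then uses the working copy, so individual Decimals don't need the context passed to them. The function name dpow and the exp(y * ln(x)) approach (valid for positive x only) are assumptions for illustration, relying on the ln and exp methods of newer Decimal versions.

```python
from decimal import Decimal as D, Context, getcontext, localcontext


def dpow(x, y, context=None):
    """Raise a positive Decimal x to a Decimal power y as exp(y * ln(x))."""
    ctx = context if context is not None else getcontext()
    # localcontext() activates a *copy* of ctx, so the caller's context
    # is untouched by the temporary precision bump.
    with localcontext(ctx) as work:
        work.prec += 2
        result = (y * x.ln()).exp()
    # Round once, back to the precision the caller asked for.
    return ctx.plus(result)


print(dpow(D(2), D('0.5')))                    # uses the current context
print(dpow(D(2), D('0.5'), Context(prec=10)))  # uses an explicit context
```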
Did I ever mention that geopy trunk now has support for GeoNames, and may soon support Map24? Map24 has done a pretty good job of convoluting their JavaScript so that their free geocoder is only accessible via AJAX, but this is merely a speedbump and not a road block. It almost works (but not the version in trunk). Sadly, like Yahoo!'s, their Terms of Use state that their geocoding tools can only be used in combination with their Maps AJAX API. But hey, just because you can access their stuff from Python doesn't mean the developer isn't still using it legitimately (that is to say, to show locations on a Map24 map).
That's all I got!
dmath: Math routines for Python's arbitrary-precision Decimal type
Yesterday Chris and I spent all day writing math functions for Python's Decimal type. The result is our new dmath library, available on Google Code and the Cheese Shop under the MIT/X11 license.
Sparked by the routine for atan in my last post, I decided it wouldn't be too hard to go ahead and do the rest of the functions already offered by math and cmath. We now have acos, asin, atan, atan2, ceil, cos, cosh, degrees, e, exp, floor, golden_ratio, hypot, log, log10, pi, pow, radians, sign, sin, sinh, sqrt, tan, and tanh.
Check it out:
>>> from dmath import *
>>> from decimal import Decimal as D, getcontext
>>> getcontext().prec = 50
>>> asin(D(1))
Decimal("1.5707963267948966192313216916397514420985846996876")
>>> golden_ratio()
Decimal("1.6180339887498948482045868343656381177203091798058")
We're calling this release 0.9 because it just needs some testing and maybe some speed improvements, otherwise it's ready to use. There is currently some work being done in Python sandbox/trunk to convert the decimal module to C, and maybe they'll include fast versions of all these routines. But hey, you can use these right now!
Arbitrary precision is one of the coolest things in programming. We spent a lot of time in Mathematica, where if you ask it to tell you the precision, it says 'Infinity'. During our testing, we actually stumbled across a bug in Mathematica's ArcTan function! This page correctly states that ArcTan[-Infinity, y] should always be Pi (with the sign of y). However, Mathematica always returns 0. I sent a message with my findings to the Mathematica mailing list and Daniel Lichtblau of Wolfram Research confirmed that it is indeed a simple bug. ArcTan users, beware!
Anyway, enjoy dmath. Contributions are welcome, especially if you have any speed tips!
Geocoding and Python's decimal module
Python has an awesome decimal module for decimal floating point arithmetic. It has configurable precision and keeps track of significant digits and does some other neat stuff.
While I was adding the geopy distance module, I began to wonder if it would be worth the effort to switch everything in geopy over to use Decimals instead of floats. After checking out the decimal module (I had never used it before), I decided that I had nothing to lose, so I went for it...
I quickly ran into some snags when I realized that I'd have to code my own trigonometric functions for use with Decimals, since those that come with Python are for complex or floating point numbers. The decimal recipes page in the documentation has functions for sin and cos, but distance uses asin, acos, atan, and atan2. Don Peterson has a nice decimalfuncs module with most of these, but it's GPL (and would be an uncommon dependency) — geopy is MIT/X11. So I went ahead and started on these...
I decided it would be easiest to define asin and acos in terms of atan, and it turns out there is a (relatively) quickly converging algorithm for that. Here's what I came up with for a Decimal-compatible atan:
def atan(x):
    if x == D('-Inf'):
        return pi() / -2
    elif x == 0:
        return D(0)
    elif x == D('Inf'):
        return pi() / 2
    if x < -1:
        c = pi() / -2
        x = 1 / x
    elif x > 1:
        c = pi() / 2
        x = 1 / x
    else:
        c = 0
    getcontext().prec += 2
    x_squared = x ** 2
    y = x_squared / (1 + x_squared)
    y_over_x = y / x
    i, lasts, s, coeff, num = D(0), 0, y_over_x, 1, y_over_x
    while s != lasts:
        lasts = s
        i += 2
        coeff *= i / (i + 1)
        num *= y
        s += num * coeff
    if c:
        s = c - s
    getcontext().prec -= 2
    return +s
It depends on the pi function from the decimal recipes page, which calculates pi to the currently configured precision.
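To show what "asin and acos in terms of atan" can look like, here is a self-contained sketch. The pi function is the recipe from the decimal documentation, atan is a condensed restatement of the routine above (finite inputs only) so the block runs on its own, and the asin/acos definitions use the standard identities asin(x) = atan(x / sqrt(1 - x^2)) and acos(x) = pi/2 - asin(x). These are one common choice and an assumption on my part, not necessarily what ends up in geopy.

```python
from decimal import Decimal as D, getcontext


def pi():
    """Compute pi to the current precision (decimal docs recipe)."""
    getcontext().prec += 2
    lasts, t, s, n, na, d, da = 0, D(3), 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s


def atan(x):
    """Condensed restatement of the series above, for finite x."""
    if x == 0:
        return D(0)
    c = 0
    if abs(x) > 1:
        # atan(x) = +/-pi/2 - atan(1/x) keeps the series argument small
        c = pi() / 2 if x > 0 else pi() / -2
        x = 1 / x
    getcontext().prec += 2
    y = x * x / (1 + x * x)
    i, lasts, s, coeff, num = D(0), 0, y / x, 1, y / x
    while s != lasts:
        lasts = s
        i += 2
        coeff *= i / (i + 1)
        num *= y
        s += num * coeff
    if c:
        s = c - s
    getcontext().prec -= 2
    return +s


def asin(x):
    """asin(x) = atan(x / sqrt(1 - x^2)) for |x| < 1."""
    if abs(x) == 1:
        return pi() / 2 * x  # endpoints, where the quotient is undefined
    return atan(x / (1 - x * x).sqrt())


def acos(x):
    """acos(x) = pi/2 - asin(x)."""
    return pi() / 2 - asin(x)


print(asin(D('0.5')))
print(acos(D('0.5')))
```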
Upon finishing this, Chris came home and I told him what I was doing. Immediately, he tried to talk me out of it, asserting that floating point was good enough for geocoding. I tried to counter by explaining all the floating point calculations being performed in distance, but in the end he won. I no longer think it would be a very important change to convert everything in geopy to use the Decimal type.
What finally convinced me was this quote from the Vincenty distance page I used for reference:
Vincenty’s formula is accurate to within 0.5mm, or 0.000015″ (!), on the ellipsoid being used.
0.000015 arcseconds is about 4.16667e-9 degrees. Well, if floating point is good to about 10 decimal places, I guess Chris wins this time...
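The arithmetic is easy to check. The sketch below converts Vincenty's stated accuracy to degrees and compares it with a rough estimate of how finely a float can represent a coordinate near 180 degrees. It assumes Python floats are IEEE 754 doubles, which is the case on common platforms.

```python
import sys

# Vincenty's stated accuracy, converted from arcseconds to degrees.
vincenty_accuracy_deg = 0.000015 / 3600

# Rough spacing between adjacent doubles near 180 degrees, the largest
# magnitude a longitude can take (machine epsilon scaled by 180).
float_spacing_deg = sys.float_info.epsilon * 180

print(vincenty_accuracy_deg)
print(float_spacing_deg)
print(vincenty_accuracy_deg / float_spacing_deg)
```

Under that assumption the float spacing is several orders of magnitude finer than the formula's own accuracy, which backs up Chris's point.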
Still, if anyone wants Decimal support in the future, maybe I'll just ask Don Peterson for permission to include decimalfuncs with geopy...
Update: On second thought, maybe I will just continue implementing my own trig functions for Decimals. Chris and I just spent a while investigating the precision of my atan vs. the one in decimalfuncs, and mine seems to be faster and more precise.
geopy gets distance and util modules
If you check out geopy trunk right now you'll notice a few changes.
I introduced two modules: util and distance.
util now contains the parse_geo and arc_angle functions, and will grow more in the future.
distance is a bigger addition and contains helpful functions for calculating geodesic distances. I planned to add this eventually, but development was sparked by a request from Chris Mulligan.
There are two distance formulas: Great-circle (aka haversine, aka spherical law of cosines) distance and Vincenty distance.
Great-circle distance uses a spherical model of the earth, using the average great-circle radius of 6372.795 kilometers (this is configurable). This results in an error of up to about 0.5%.
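For reference, here is a minimal haversine sketch of the spherical calculation using the radius quoted above. It is an illustration written for this post, not geopy's implementation; the function name great_circle_km is hypothetical, and the sample coordinates are the Kelvin Smith Library and Olin Building values returned by the Case Wiki geocoder in an earlier entry.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6372.795  # average great-circle radius quoted above


def great_circle_km(a, b, radius=EARTH_RADIUS_KM):
    """Haversine distance between two (lat, lng) pairs given in degrees."""
    lat1, lng1 = map(radians, a)
    lat2, lng2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    # min() guards against h creeping just above 1 through rounding.
    return 2 * radius * asin(min(1.0, sqrt(h)))


ksl = (41.50727, -81.60950)
olin = (41.50224, -81.60778)
print(great_circle_km(ksl, olin))
```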
Vincenty distance uses a more accurate ellipsoidal model of the earth. This is the default distance formula, and is thus aliased as distance.distance — so you can easily swap out distance formulas just by changing distance.distance at the top of your code. There are multiple popular ellipsoidal models, and which one will be the most accurate depends on where your points are located on the earth. geopy includes a few good models in the distance.ELLIPSOIDS dictionary:
ELLIPSOIDS = {
    # model: (major (km), minor (km), flattening)
    'WGS-84': (6378.137, 6356.7523142, 1 / 298.257223563),
    'GRS-80': (6378.137, 6356.7523141, 1 / 298.257222101),
    'Airy (1830)': (6377.563396, 6356.256909, 1 / 299.3249646),
    'Intl 1924': (6378.388, 6356.911946, 1 / 297.0),
    'Clarke (1880)': (6378.249145, 6356.51486955, 1 / 293.465),
    'GRS-67': (6378.1600, 6356.774719, 1 / 298.25),
}
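The three values in each tuple are related: flattening is (major - minor) / major. A quick sketch that checks the table against that definition follows. The dictionary is copied from above so the block runs on its own, and the helper name flattening is hypothetical.

```python
ELLIPSOIDS = {
    # model: (major (km), minor (km), flattening)
    'WGS-84': (6378.137, 6356.7523142, 1 / 298.257223563),
    'GRS-80': (6378.137, 6356.7523141, 1 / 298.257222101),
    'Airy (1830)': (6377.563396, 6356.256909, 1 / 299.3249646),
    'Intl 1924': (6378.388, 6356.911946, 1 / 297.0),
    'Clarke (1880)': (6378.249145, 6356.51486955, 1 / 293.465),
    'GRS-67': (6378.1600, 6356.774719, 1 / 298.25),
}


def flattening(major, minor):
    """Flattening of an ellipsoid from its semi-major and semi-minor axes."""
    return (major - minor) / major


# Print the stored flattening next to the one derived from the axes.
for name, (major, minor, f) in sorted(ELLIPSOIDS.items()):
    print('%-14s %.9f %.9f' % (name, f, flattening(major, minor)))
```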
Here's an example usage of distance.distance:
>>> from geopy import distance
>>> import Case
>>> wiki = Case.Geocode.CaseWikiGeocoder()
>>> _, a = wiki.geocode('Wade')
>>> _, b = wiki.geocode('Fribley')
>>> distance.distance(a, b).kilometers
1.342250272726943
>>> distance.distance(a, b).miles
0.83403565192666562
Using Great-circle distance:
>>> distance.distance = distance.GreatCircleDistance
>>> distance.distance(a, b).miles
0.835175984734287
You can change the ellipsoid model used by the Vincenty formula like so:
>>> distance.VincentyDistance.ELLIPSOID = 'Intl 1924'
The above model name will automatically be retrieved from the ELLIPSOIDS dictionary. Alternatively, you can specify the model values directly:
>>> distance.VincentyDistance.ELLIPSOID = (6377., 6356., 1 / 297.)
Oh yeah, you can add distances too (for paths and such). Here's the distance from Fribley to Wade to Phi Kappa Theta:
>>> _, c = wiki.geocode('Phi Kappa Theta')
>>> (distance.distance(b, a) + distance.distance(a, c)).miles
1.0596624112817861
Also included in the distance module are functions for converting between length units (kilometers, miles, feet, nautical miles), and calculating a destination given a starting point, initial bearing, and distance.
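Here is a rough sketch of how a distance value that supports addition and unit conversion can be put together. This is a simplified stand-in, not the class in geopy trunk: the name Distance, its attributes, and the conversion constants are assumptions made for illustration.

```python
class Distance(object):
    """A length stored in kilometers that can be added and converted."""

    KM_PER_MILE = 1.609344
    KM_PER_NAUTICAL_MILE = 1.852
    FEET_PER_MILE = 5280

    def __init__(self, kilometers=0.0):
        self.kilometers = kilometers

    def __add__(self, other):
        # Adding two legs of a path gives the total path length.
        return Distance(self.kilometers + other.kilometers)

    @property
    def miles(self):
        return self.kilometers / self.KM_PER_MILE

    @property
    def nautical(self):
        return self.kilometers / self.KM_PER_NAUTICAL_MILE

    @property
    def feet(self):
        return self.miles * self.FEET_PER_MILE


leg = Distance(1.342250272726943)  # the Wade-to-Fribley figure from above
print(leg.miles)
print((leg + Distance(0.5)).kilometers)
```

Storing a single unit internally and converting on access keeps addition trivial, whatever units the caller reads back.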
This stuff is still just in trunk, no egg or updated documentation yet...
geopy: Now on Google Code, More Geocoders
I decided to try out Google's Project Hosting feature for geopy. You can find the hosted page at code.google.com/p/geopy. So far it seems pretty sweet and very easy to administer.
I added a geocoder for Microsoft's Windows Live Local (powered by Virtual Earth) to the geocoders module. Sadly, they don't actually have a non-JavaScript geocoding API, so I had to reverse-engineer it.
Norman Khine and I have been investigating issues geocoding UK addresses with the Google Maps API. Due to contractual reasons, they can't offer geocoded addresses with their HTTP geocoder. So instead I again had to reverse-engineer their JavaScript to get it to work. The geocoded results aren't always accurate, but this is Google's problem and not geopy's.
I also tried to add a geocoder for MapQuest's OpenAPI. It is possible to get geocoded results over HTTP (although they don't tell you how, you have to look at their JavaScript or guess), but unfortunately they require you to parse the input location first. This is totally lame. You're telling me they can't parse the address into street, city, country for me? I didn't want to have to do this, but I now plan to add address parsing methods to geopy.
Making dp.SyntaxHighlighter for Python Not Suck
I've always wanted syntax highlighting for my Python code on the web. The geopy site, for example, is made much more readable with syntax-highlighted code. However, it's all done statically by using KDevelop's HTML Export feature.
The other night I decided to finally try out dp.SyntaxHighlighter, a JavaScript syntax highlighter. But unfortunately I discovered that its Python support is actually pretty ugly. Its Python example page isn't even valid Python! Type declarations? Not only that, but the test page demonstrates its parsing failures—the comment within a string is highlighted as a comment, not a string.
So I made some modifications. You can see the modified test page here. If you want to use it, check out my modified shBrushPython.js and my python.css, which you can modify to change the colors.
I took out all the highlighting of the builtins, exceptions, types, and modules. In practice I don't think that's very useful. The purpose of syntax highlighting is to make code clearer, not to classify every little token and make it look like a rainbow.
One thing I was not able to do was highlight class names and function names where they are defined without also highlighting the "def" and "class" keywords. I think this makes things look a lot nicer (as you can see in the highlighted geocoders.py code), but unfortunately JavaScript does not support lookbehind assertions, so I don't think it's possible without modifying dp.SyntaxHighlighter a bunch.
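For comparison, this is what the lookbehind version looks like in Python's own re module, which does support it. It is only an illustration of the technique, not something dp.SyntaxHighlighter can use, and the pattern name is hypothetical. Python requires each lookbehind to be fixed-width, so "def " and "class " get one apiece.

```python
import re

# Match an identifier only when it directly follows "def " or "class ",
# without including the keyword itself in the match.
DEFINED_NAME = re.compile(r'(?:(?<=\bdef )|(?<=\bclass ))[A-Za-z_]\w*')

source = (
    "class Geocoder(object):\n"
    "    def geocode(self, string):\n"
    "        pass\n"
)
print(DEFINED_NAME.findall(source))
```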
I also really like having operators highlighted (see here again), but that slowed it down considerably. Uncomment that line in shBrushPython.js if you want it.
Enjoy.
geopy: A Geocoding Toolbox for Python
I just uploaded the first release of geopy to the Python Cheese Shop.
The web site (with a little documentation) is located at exogen.case.edu/projects/geopy/.
There are five geocoders: Google, Yahoo!, geocoder.us, MediaWiki, and Semantic MediaWiki. Usage examples are given on the web site.
I need to read up more on setuptools to make my package maintenance life easier. For instance, telling the Cheese Shop to install my package from Subversion would be nice right now, but I'm not sure how.
Update: You can now view the pretty syntax-highlighted source code on the web site. Thanks to the wonderful KDevelop for that ability!
Geocoding Tools for Python (and CaseClasses)
I've been working on some geocoding classes in Python. Right now I've got tools made for MediaWiki, Semantic MediaWiki, and the Google geocoder. I plan to include the Yahoo! geocoder in this toolbox soon.
I think this could be a useful package, so I plan to upload it to the Cheese Shop soon.
I added the relevant geocoder classes to the Python CaseClasses so that developers can easily geocode strings using Case's Semantic MediaWiki.
Check it out. First grab the geocoder:
>>> from Case import Geocode
>>> wiki = Geocode.CaseWikiGeocoder()
Then start geocoding:
>>> place, (lat, lng) = wiki.geocode('KSL')
Fetching http://wiki.case.edu/KSL...
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
Kelvin_Smith_Library: 41.50727, -81.60950
geocode returns a tuple consisting of the location name found and the coordinates (another tuple).
Here's where the Semantic part comes in. The Project Club article isn't geocoded, but it is located in Olin, which is geocoded:
>>> place, (lat, lng) = wiki.geocode('Project Club')
Fetching http://wiki.case.edu/Project_Club...
Fetching http://wiki.case.edu/index.php/Special:ExportRDF/Project_Club?xmlmime=rdf...
Fetching http://wiki.case.edu/index.php/Olin_Building...
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
Olin_Building: 41.50224, -81.60778
CaseWikiGeocoder is a subclass of SemanticMediaWikiGeocoder and is defined by only the following:
class CaseWikiGeocoder(SemanticMediaWikiGeocoder):
    def __init__(self):
        super(CaseWikiGeocoder, self).__init__("http://wiki.case.edu/%s",
                                               relations=['Located in'])
This creates a SemanticMediaWikiGeocoder with a base URL of 'http://wiki.case.edu/' that follows the 'Located in' relation if a page fails to geocode. So SemanticMediaWikiGeocoder could easily be used for any Semantic MediaWiki with any set of relationships defined. This class is brand new and has only been tested on the Case Wiki, so it might be buggy.
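The relation-following idea can be sketched roughly as follows. This is not the actual SemanticMediaWikiGeocoder code: the PAGES dictionary is a hypothetical in-memory stand-in for fetched wiki pages, and the geocode function here is a simplified assumption of how the fallback might work.

```python
# Hypothetical stand-in for wiki pages: a page may carry coordinates
# and/or named relations that point at other pages.
PAGES = {
    "Project_Club": {"relations": {"Located in": ["Olin_Building"]}},
    "Olin_Building": {"coords": (41.50224, -81.60778)},
}

def geocode(name, relations=("Located in",), seen=None):
    """Return (page_name, (lat, lng)), following relations when needed."""
    seen = set() if seen is None else seen
    if name in seen or name not in PAGES:
        return None  # unknown page, or a cycle in the relations
    seen.add(name)
    page = PAGES[name]
    if "coords" in page:
        return name, page["coords"]
    # The page itself is not geocoded: try the pages it is related to.
    for relation in relations:
        for target in page.get("relations", {}).get(relation, []):
            found = geocode(target, relations, seen)
            if found is not None:
                return found
    return None

result = geocode("Project_Club")
```

The seen set keeps the lookup from looping forever if two pages happen to be "Located in" each other.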
MediaWikiGeocoder relies on BeautifulSoup since it assumes wiki pages can be malformed.
Remember, if you have easy_install, you can simply type this to install CaseClasses:
sudo easy_install http://opensource.case.edu/svn/CaseClasses/python/trunk
After this geocoding toolbox is complete, I'll see if I can make that Case geocoder web service we talked about on the forum.
Update: Per Greg's comment, support for reading coordinates from semantic attributes is now in Case.Geocode (along with some other small improvements).
CaseWikiGeocoder is now defined as:
class CaseWikiGeocoder(SemanticMediaWikiGeocoder):
    def __init__(self):
        base = super(CaseWikiGeocoder, self)
        base.__init__("http://wiki.case.edu/%s",
                      attributes=['Geographical coordinate'],
                      relations=['Located in'], prefer_semantic=True)
Merquery Summer of Code Results
This morning I committed a working version of Merquery to the Django Subversion repository.
svn co http://code.djangoproject.com/svn/django/branches/search-api/
My code in particular lives in branches/search-api/django/contrib/search.
The Lucene adapter is fully functional (but needs some more convenience functions), and the Xapian and Hyper Estraier adapters need a little more work.
Here's an example of using the Lucene adapter, using a similar model to my old Merquery post:
from django.db import models
from django.contrib.search.backends import LuceneIndexer

class Person(models.Model):
    first_name = models.CharField(maxlength=30)
    last_name = models.CharField(maxlength=30)
    biography = models.TextField()

indexer = LuceneIndexer('/tmp/person-index', Person,
                        fields=['Person.biography'],
                        attributes={'first': 'Person.first_name',
                                    'last': 'Person.last_name'})
As you can see, you specify an index location (database locations should be supported in the future), which model should be considered the document (hit results), and which fields to index.
It also allows shorthand for the fields:
indexer = LuceneIndexer('/tmp/person-index', Person, 'Person.biography',
                        first='Person.first_name', last='Person.last_name')
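One way a constructor could accept both call styles is to treat extra positional arguments as fields and leftover keyword arguments as attributes. The sketch below is an assumption about how that might be done, not Merquery's actual implementation; IndexerOptions is a hypothetical name, and the model is passed as a plain string to keep the example free of Django.

```python
class IndexerOptions(object):
    """Hypothetical helper that normalizes the two call styles."""

    def __init__(self, path, model, *args, **kwargs):
        self.path = path
        self.model = model
        # Explicit form: fields=[...] and attributes={...}.
        fields = list(kwargs.pop("fields", []))
        attributes = dict(kwargs.pop("attributes", {}))
        # Shorthand form: extra positional arguments are fields,
        # remaining keyword arguments are attributes.
        fields.extend(args)
        attributes.update(kwargs)
        self.fields = fields
        self.attributes = attributes

explicit = IndexerOptions("/tmp/person-index", "Person",
                          fields=["Person.biography"],
                          attributes={"first": "Person.first_name"})
shorthand = IndexerOptions("/tmp/person-index", "Person", "Person.biography",
                           first="Person.first_name")
```

Both objects end up with the same fields list and attributes dictionary, so the rest of the indexer would only ever need to deal with one representation.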
Okay, let's insert some people...
b = Person(first_name='Brian', last_name='Beck', biography='Python advocate')
g = Person(first_name='Guido', last_name='van Rossum', biography='Python creator')
s = Person(first_name='Spiros', last_name='Eliopoulos', biography='Loves Haskell')
for person in (b, g, s):
    person.save()  # rows must exist in the database before they can be indexed
And force all the Person objects to be indexed...
indexer.update()
You can also pass update a list of Person objects to reindex only those (beware: for now, Lucene's update inserts duplicates).
Finally, Lucene's sweet query syntax is available for your database:
>>> for hit in indexer.search('python'):
...     print hit, hit.instance
<LuceneHit: merquery.person 1, Score: 0.625> <Person: Brian Beck>
<LuceneHit: merquery.person 2, Score: 0.625> <Person: Guido van Rossum>
>>> for hit in indexer.search('python creator'):
...     print hit, hit.instance
<LuceneHit: merquery.person 2, Score: 1.0> <Person: Guido van Rossum>
<LuceneHit: merquery.person 1, Score: 0.168048456311> <Person: Brian Beck>
>>> for hit in indexer.search('last:Beck OR first:Spiros'):
...     print hit, hit.instance
<LuceneHit: merquery.person 1, Score: 0.496906995773> <Person: Brian Beck>
<LuceneHit: merquery.person 3, Score: 0.496906995773> <Person: Spiros Eliopoulos>
There are some things that should be changed before Merquery is production-ready, and some things that would be nice in the long-term.
One is that the indexer knows how to follow ForeignKey fields (maybe 'Person.address.street_name', for example), but a couple of changes are needed to get them to work. This is because a Field instance doesn't seem to have an attribute linking back to the Model it is bound to. Support for ManyToMany joins also needs to be thought out.
Another is the way you pass a Model to specify the document return type (the type returned by hit.instance). Documents should be able to aggregate many Model instances without having to treat any particular Model as the document type. So a Document class will probably be introduced that acts like the Model metaclass, letting you 'build' a Document prototype and tell the indexer how each of its attributes should be retrieved and treated.
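Resolving a dotted path like 'Person.address.street_name' against an instance can be sketched with plain attribute lookups. This is a simplified illustration rather than Merquery's code: the classes below are ordinary Python objects standing in for Django models, resolve is a hypothetical helper, and the street name is made up.

```python
from functools import reduce

class Address(object):
    def __init__(self, street_name):
        self.street_name = street_name

class Person(object):
    def __init__(self, first_name, address):
        self.first_name = first_name
        self.address = address  # plays the role of a ForeignKey

def resolve(instance, path):
    """Follow 'Model.attr.attr' starting from instance."""
    model_name, _, attribute_path = path.partition(".")
    if model_name != type(instance).__name__:
        raise ValueError("path %r does not start at %s"
                         % (path, type(instance).__name__))
    # Walk one attribute at a time: instance.address, then .street_name.
    return reduce(getattr, attribute_path.split("."), instance)

brian = Person("Brian", Address("Example Street"))  # made-up data
street = resolve(brian, "Person.address.street_name")
```

Going from an instance outward is the easy direction. The harder part described above is the reverse: starting from a Field and finding the Model it belongs to.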
Automatically knowing when to update the index would also be very nice.
There are a few main long-term goals...
One is to make a universal Merquery query language that will automatically translate queries into the backend's query syntax. This would be especially good for Hyper Estraier, which has whacked-out attribute-search syntax:
@first STRINC Brian
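A very rough sketch of what such a translation layer could look like is below. The function names and the list-of-pairs query representation are assumptions made for illustration, and nothing here exists in Merquery. Note too that the two backends don't mean quite the same thing: Lucene's field:value matches a term, while STRINC tests for a substring.

```python
def to_lucene(terms):
    """Render (attribute, value) pairs in Lucene's field:value form."""
    return " AND ".join("%s:%s" % (attr, value) for attr, value in terms)

def to_hyper_estraier(terms):
    """Render the same pairs as Hyper Estraier attribute conditions."""
    return ["@%s STRINC %s" % (attr, value) for attr, value in terms]

# One backend-neutral query, rendered for each backend.
query = [("first", "Brian")]
lucene_query = to_lucene(query)
estraier_conditions = to_hyper_estraier(query)
```

A real query language would also need escaping, boolean operators, and free-text terms, but the idea is the same: parse once, render per backend.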
Another is to make some Models to keep track of indexing status and query statistics, and offer nice admin views of these.
Finally, storing all index data in a database (especially a Model-compatible one) instead of on the filesystem would be great.
Since I have a week before classes start, I'm going to continue making the necessary changes to make Merquery production-ready.
Mapping Crimes on Campus
So a couple entries ago I mentioned that someone should make something like chicagocrime.org for Case. This is possible since we have a daily crime log which lists the location of each incident, and an excellent wiki with many geocoded locations.
I was bored a few nights ago and decided to try it out. In the interest of releasing early and often, I've made my first test available online at exogen.case.edu/crime/recent.
As you can see, it's nothing too fancy yet. This is my first non-trivial Django application, and I plan on using it as a testbed for Merquery.
So here's my to-do list (when I'm not working on Merquery, of course):
- Draw new markers.
- What should happen when two markers are placed on the same location, which is likely? Show both in the info window? Change the color of the marker? Try to put them next to each other?
- Offer views by type of violation.
- Activate the better location parsing (right now only the wiki data is checked, but the Google geocoding API helps with addresses). Already written, just needs to be put into production...
- Geocode all the locations that couldn't be found.
- Offer an RSS feed of recent crimes.
- Import all incidents back to 2000 (for fun).
- Use all the imported data for crime statistics per area.
- Put code on opensource.case.edu.
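For the overlapping-markers question in the list above, one option is to group incidents by coordinates before drawing anything, so each location gets a single marker whose info window lists every incident there. The sketch below assumes incidents arrive as (description, lat, lng) tuples; the function name and the sample incidents are made up for illustration.

```python
from collections import defaultdict

def group_by_location(incidents, precision=5):
    """Map (lat, lng) to the descriptions of all incidents at that spot."""
    grouped = defaultdict(list)
    for description, lat, lng in incidents:
        # Rounding treats nearly identical coordinates as one location.
        key = (round(lat, precision), round(lng, precision))
        grouped[key].append(description)
    return dict(grouped)

# Made-up incidents; two of them share a location.
incidents = [
    ("Bicycle theft", 41.50727, -81.60950),
    ("Vandalism", 41.50727, -81.60950),
    ("Petty theft", 41.50224, -81.60778),
]
markers = group_by_location(incidents)
```

Each key would become one marker, and the number of descriptions under it could also drive the marker's color.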
Now to watch for that bicycle thief...
Update: Problem seems to be fixed, let me know if it's broken for you.
Update: Put the new location parser into production, and it now has a 66% success rate for geocoding locations. More wiki entries would help that number... anything to avoid doing them manually.
Brian's TurboGears Tutorial
This weekend I decided to write my own TurboGears tutorial instead of working on Merquery. If
