Python Web Framework Experiments, Day 1
posted by brian at 04:29 PM
Last night (aka this morning) I wrote that I would venture into the realm of making my own Python web framework, without any restrictions on making a complete (as in you-should-use-this-for-your-business) or Pythonic solution. As promised, here's my first attempt at some URL mapping mechanism. But first...
Thanks to Ian Bicking for pointing out Routes. It gave me that slight flutter of "ooh, what if someone's already done that?" For better or worse, it is quite different (and quite nice).
Adrian also corrected me on the origins of Django's components. I actually have spent some time looking at Django, and even read Adrian's presentation, which must be even better with sound & video. So either I just didn't comprehend the history of Django, or it isn't immediately obvious from looking at it that it is a single unit (I think a little of both, yes that is a slight jab, sorry).
Onto some code...
Some things to keep in mind. This is only my first attempt at this particular problem domain; as I said, I'm prepared to experiment. This was the first thing that came to mind and I was pleased when I got a working prototype in about 20 minutes.
Common Problem 1: URL Mapping
Some problems I listed with current URL mapping solutions the other day are regular expression usage, big jumps in complexity for deep URLs, and lots of space between where URLs are defined and where their respective methods are defined.
Hacky Solution 1: Division Overloading for Path Components (a la path), Which Can be Types!
This offers simple validation, keeps the URL and its corresponding method right next to each other (it's silly, but that's what I like the most), makes complex situations a bit easier, and you can call your methods and classes whatever you want.
Simple examples, assuming SQLObject classes for our model:
Edit: This may be painful to read without syntax highlighting, so here are some similar examples in a .py file.
class Blog:
# Example: /index
@url('index')
def homePage(self):
return Entry.select(orderBy='-created')[:10]
entries = Path('entries')
# Example: /entries?query=test&format=rss
@url(entries, query=str, format=oneOf('html', 'rss'))
def findEntries(self, query=None, format='html'):
if query:
return Entry.search(body=query)
else:
return Entry.select()
# Example: /entries/5
@url(entries/Entry.id)
def showEntry(self, entry):
print "Retrieving entry", entry.id
return entry
Okay, there's something neat. That entry argument in showEntry won't be the entry ID from the URL, it will be the entry itself with that ID! When a URL component is an SQLObject alternateID field, the corresponding row is automatically retrieved. Let's go back to that findEntries example now that we know it plays nice with SQLObject...
@url(entries, query=Entry.body)
def findEntries(self, query):
for entry in query:
print "Found entry: %s." % entry.title
What's going on here? Entry.body isn't an alternateID, so it just uses selectBy instead, so query will be a SelectResults sequence. However, this is NOT the same as before, since it will only return exact matches, and there's no way to get the actual string parameter that was requested. Still, you can imagine this working well for, say, a phone book, where you only want to find exact first or last name matches. Continuing on...
# Example: /entries/5/comments?format=rss
@url(entries/Entry.id/'comments', format=oneOf('html', 'rss'))
def listComments(self, entry, format='html'):
return entry.comments
# Example: /entries/5/comments/10
@url(entries/Entry.id/'comments'/int)
def showComment(self, entry, commentIndex):
# Show an individual comment, maybe for thread view?
return entry.comments[commentIndex]
As you can see, unknowns in the URL's path are mapped to positional arguments, while parameters are mapped to keyword arguments. Check out the use of the actual int type as a URL component in that last example. Any type will work, as long as you trust its conversion from string to be an accurate representation (bool('False') is a good example of where this fails).
Now for something a bit more complex, like file browsing in arbitrarily deep folders.
static = Path('static')
@url(static/[str]/str)
def staticFile(self, subdirs, filename):
import os
fullPath = os.abspath('/'.join(subdirs + [filename]))
if not os.path.isdir(fullPath):
return file(fullPath).read()
# Example: /static/images/anim/glow.gif
# -> self.staticFile(['images', 'anim'], 'glow.gif')
Archives are often browsed by date, where the URL components can specify a varying date by year, month, and day. So we want to support anywhere from 0 to 3 consecutive integer arguments (0 showing the whole archive, 1 being the year, etc.)
@url(entries/'archive'/[int])
def listByDate(self, date):
date = date[:3]
date.extend([None] * (3 - len(date))) # Pad to length 3 with None
year, month, day = date
return Entry.selectBy(year=year, month=month, day=day)
Enclosing a component in a list means that it will match any number of that component, and the positional argument will be returned as a list. This is starting to look a bit like regular expressions, except prettier (to me, at least), and the values are converted to their respective types.
There are more things, like using tuples for enumerated values (oneOf is actually just for readability), and possibly using | like in regular expressions, but you get the idea.
This fits my brain nicely: "at this URL, return this function..."
Issues, besides the obvious: there's some repeating going on, but I think this happens in other systems with similar features. For example, you obviously have to retype the path prefixes every time if you're using regular expressions, and if you're not, there's the (typing) overhead of nesting classes (or at least making a bunch of them). And if you include validation, that's more re-typing of parameters (telling the validator which parameter should use which validator).
What about using multiple URLs for a function and vice versa? Well, the neat thing is that the url decorator sets a url property on the method itself, with all the relevant information. Redirecting is made easier with a custom Redirect exception (like in CherryPy), just give it the method, it can determine what URL to redirect to from there:
@url(entries/'new'):
def newEntryForm(self):
return "<form>...</form>"
@url(entries/'save', title=str, body=str)
def makeEntry(self, title, body):
entry = Entry(title=title, body=body)
raise Redirect(showEntry, entry)
Right now this "simple" solution would actually work for all my current projects. I'm guessing anyone interested in this can probably make a good guess at how it's implemented, so I'll only post about that if someone asks. URL dispatching works fine (this isn't just a make-it-work-like-this proposal); I have a very simple function that takes a URL just like what a web server would receive and calls the appropriate method.
So, web people: could this technique actually survive? What would need to be changed?
Comments
Overriding the division operator is quite clever indeed!
Hi Brian!
If Routes (routes.groovie.org) is anywhere close to what you're looking for, you might want to give it a spin. :)
The next feature for it, which I remarked on my blog about when announcing it takes care of the case you're referring to. When you want to map a URL not divided solely by //'s.
So hooking up a route would be,
m.connect('category_home', 'archives/:category/:(page).html', controller='category', action='view')
etc.
That way specific sections could be easily extracted and used as variables.
Ben,
I do like the look of Routes, my only complaint being that I still have to look at every usage for a while before I fully understand what's going where. But I think since your goal may be to make the least amount of assumptions about the rest of the tools the programmer is using, this may be okay.
I think the automatic SQLObject row retrieval I demonstrated in my examples makes a good case for extending current URL mapping schemes to allow extensions to their functionality. For instance, Routes could be extended to understand SQLObject fields in its requirements dictionary, and do the same thing as mine.
This might also have to do with your more generic approach, but as I've said I'm not a big fan of the connecting (m.connect) happening very far from what will happen when the connected URL is accessed. Maybe with Routes it's possible to put an m.connect next to the controller it's connected too, I can't tell if they all have to be in one place. As ugly as decorators may be, how would you feel about offering a @connect decorator, with the controller (or whatever else) being inferred from the decorated method?
Reading your Integrating page, I see now that you'll probably respond that both of my suggestions above would be up to the author of the framework making use of Routes, which I don't think is unreasonable.
Ah, I see. You're also operating under the assumption that one and only one URL will map to a method/action. Yes, you could put the connect statements in as decorators and have those hook up the route to the Mapper.
The nice thing I rather enjoy, is that I never have to add anything to any of my methods, as the Routes semi-programmatic approach means I can map in easy access to sets of actions by controller.
The Controller in this context merely being a convenient place to group related actions together. While the example I showed hardcodes an action, you can put the :action as in the route path and avoid having to individually decorate dozens or hundreds of actions/methods.
If you have any questions or comments, feel free to email me, its great getting feedback. :)
Well, no. You can stack decorators, and even the same decorator on top of itself.
This works:
@url('index')
@url('home')
@url('')
def frontPage(...):
It's definitely a different approach than what I made. With Routes there is a very defined "URL mapping" component, which some people will enjoy. I tend to like as few components as possible, "less to worry about," because I'm not always thinking "how will changing this thing affect that other component off someplace else?" A decorator approach keeps related things next to each other without making a tight coupling (removing the decorator won't break the function).
Give it a spin in a real web application, that's always the best way to test. If you take a look around, I can't think of any other method which requires this much work to map a URL to a method. There's likely a good reason for that.
Django maps in multiple URL's via a regexp to batches of functions, TurboGears uses CherryPy's object approach to avoid all the work, etc. Not a single in-use framework requires so much effort as a decorator for every URL.
Imagine if you had 120 actions (very realistic), having to define every single mapping would become a massive chore. Especially if you want two or three ways to load each method...
This is just my guess on why all the frameworks don't use something like this, maybe they missed something? Give it a spin, let us know. ;)
I'm curious why you think this is so much more work when the amount of work is equivalent to Django, just expressing it differently. One decorator per URL vs. one regexp per URL, how is either any more or less work? Just because the decorators are next to the function and the regexps are all together? It actually does the same thing: the decorator turns the path you gave it into a regular expression and registers it in a list, just like what you would manually do in Django. The differences being that you don't have to write regular expressions and you can see the URL and what the URL *does* at the same time.
You're right, I hadn't realized that each regexp in Django mapped to a single method. Ugh, I'm not such a fan of that. Means you have to do extra logic inside a method to compensate for what CherryPy or Routes would've done for you.
And if some code can replace a bunch of manual typing... I don't see why you wouldn't want to use it. Less boiler-plate is good in my opinion, so anytime I come across something that requires significantly more work on my part, I try and see if there's some way to remove it through defaults or programming.
For example, a common function with a 'edit' screen which can also handle having the form submitted back to itself, will be to check if the request is a POST or not. In my system, I can ensure such a check merely be calling the method edit_POST and having a method edit. If the request is a POST, the framework automatically goes to a action_POST method if it exists, otherwise it falls through to the action method.
It seems like a matter of personal preference to a point, though the Routes/CherryPy/Zope system reduces the amount of work you do, and puts it into the framework.