Entries for December 2006

Winter and Spring Plans

I'm in Rhode Island until mid-January and will be keeping myself busy with a few projects...

  • Haven't graduated yet, but as I may have mentioned before, I'll be spending the Spring and indefinite future working on a startup with Chris (Hesse) and Ian (Charnas). I'll be taking a couple classes at Case next Summer semester. You'll probably still see me lurking around campus until then...
  • Gotta finish up that web presentation system for Simon. There's plenty of time to work on it here so it should be online for testing any day now.
  • Got some outstanding patches and bug reports to check out for geopy, dmath, CAS for Django, and maybe some stuff I'm forgetting about? Case's very own Gary Bernhardt posted a geopy patch just yesterday. Thanks, Gary!
  • Gonna try to finish up the Case Geocoder before I get back to Cleveland. I'm currently refactoring it to allow for easy extensibility of its parsing and geocoding modules so that others can easily contribute strategies to increase its accuracy without having to tear up too much code. So we'll see how that goes (trying not to overengineer it)...
  • The Campus Crime Map will be offline probably until the Case Geocoder is in place.
  • Gotta catch up on the stuff Ian has been working on for our startup.

Possible next posts: Brian's thoughts on grades and employment, geopy/dmath/etc. updates, new blog/site design...

The Way of the Samurai

Today Justin Rich sent me this explanation of the Samurai Principle (I later realized he probably found it on programming.reddit.com):

Return victorious, or not at all. A [software development guideline meaning] you should either complete your contract and return a valid result, or throw an exception.

This is indeed commonly regarded as the correct way to do things in Python. In the standard library, there are only two cases off the top of my head that could be interpreted as violating this guideline: str.find, which returns -1 if no matching substring is found, and re.match, which returns None if there is no match.

I say "could be interpreted" because both of these functions perform a search. In the case of searching, what to do in the case of no search result is a judgement call. Do you take the Samurai approach, like str.index (which raises an exception), or do you return None or -1 like re.match and str.find? (I think None is clearly preferred among Python programmers if they go that route.)
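The difference between the two styles is easy to see side by side. This is just a small illustration using the standard library functions mentioned above; the sample string is made up:

```python
import re

text = "return victorious"

# str.find signals "not found" with a sentinel value.
print(text.find("samurai"))  # -1

# str.index takes the Samurai approach and raises ValueError instead.
try:
    text.index("samurai")
except ValueError as exc:
    print("str.index raised:", exc)

# re.match returns None when the pattern does not match at the start.
print(re.match(r"samurai", text))  # None
```

With the sentinel style the caller has to remember to check the return value; with the exception style a forgotten check fails loudly.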

Normally I would have just looked at this article, said "Huh." and gone about my business, but this actually held my interest because of the decisions I made in geopy.

When you geocode a string in geopy, the result is of the form:

location_name, (latitude, longitude)

That is, a tuple containing the canonical location name returned by the geocoder, followed by the coordinate (itself a tuple).

If the geocoder fails to geocode the given string, it returns:

None, (None, None)

Clearly the Samurai Principle is not at work here. But the article made me think: for better or worse?

I'll explain why I chose to do it this way, and maybe some programmers out there will share their opinions.

Geocoding is indeed a form of searching. When you geocode a string, you're asking "do you know where this location is?" just as much as you're asking "where exactly is this location?" This is why applying the Samurai Principle to searching is a judgement call: it depends on how exactly you define the function's contract. The answer "no, I didn't find that" is a perfectly valid response to the first question, even if a "not found" exception is perhaps a more appropriate response to the second.

Furthermore, I wanted to support incomplete results in geopy, which I think are better than nothing. For example, the geocoder might know only the latitude or the longitude (but not both), or just the canonical name of the location but not where it is, or the location but not a better name than the one you gave it. In other words, all of these are possible return results:

canonical_name, (None, None)
None, (latitude, None)
None, (None, longitude)
None, (latitude, longitude)

...and so on. It seems natural, given the above, that the "not found at all" response of None, (None, None) should not instead raise an exception.

Now, if the geocoder backend itself fails (for example, its server is down), then I think an exception is appropriate.
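Here's a minimal sketch of that contract. To be clear, this is not geopy's actual code: `geocode`, `GeocoderUnavailable`, the `backend_up` flag, and the in-memory `KNOWN_PLACES` table are all hypothetical names made up for illustration, and the coordinates are rough placeholder values.

```python
class GeocoderUnavailable(Exception):
    """Hypothetical error for a backend failure (for example, the server is down)."""


# Hypothetical stand-in for a geocoding backend; values are illustrative only.
KNOWN_PLACES = {
    "cleveland": ("Cleveland, OH, USA", (41.5, -81.7)),
    # A partial result: the name is known but the coordinates are not.
    "rhode island": ("Rhode Island, USA", (None, None)),
}


def geocode(query, backend_up=True):
    """Return (name, (latitude, longitude)); any piece may be None."""
    if not backend_up:
        # A broken backend is not a search result, so raise.
        raise GeocoderUnavailable("backend did not respond")
    # "Not found" is a valid answer to a search, so return all Nones.
    return KNOWN_PLACES.get(query.strip().lower(), (None, (None, None)))


name, (lat, lng) = geocode("Atlantis")
if name is None and lat is None and lng is None:
    print("no result")
```

Because every result has the same shape, the caller can always unpack it the same way and then decide how much of a partial answer it is willing to accept.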

That's all I wanted to say really, just thinking out loud...

Completely unrelated post-script: Remember a couple Halloweens ago when Patty and I entered the Achewood costume contest? I don't think I ever posted a follow-up, but I actually came in second place and won a signed Achewood book. I'm just posting this because I was bored and started clicking links on the Achewood home page...

Excited About PyCon 2007

So a list of accepted PyCon 2007 talks is available and it looks awesome. I was getting excited over it while at the same time being disappointed that I probably won't be able to afford attending. On the other hand, by then I'll have graduated (...probably) and who knows, maybe I'll have the time and money. Ian, Chris and I will be building up our Python shop at that point; maybe it would help for us to make an appearance, in addition to being totally fun?

Anyway, looking at that list makes me feel all warm and fuzzy about the Python community. You can even see Clepy's own Mike Pirnat in the fourth-from-left picture at the top!

CAS 1.0 Authentication for Django, Part 2

After using my Django CAS authentication module for a while, I decided to make a couple improvements.

The biggest improvement is that instead of modifying code in the CAS module itself to set your CAS address and do things like custom User field population, all this stuff can now be configured in your settings file.

Another improvement is that CAS authentication now works for the bundled admin interface. The administration interface does not account for an authentication backend that doesn't know the user's password, which makes its login form useless. The CAS module will now intercept requests to the administration interface and do the proper authentication routine if necessary, never showing the login form (which doesn't make sense for CAS). Intercepting requests, you ask? Yes, that means the CAS module is now middleware. Actually it's middleware, a couple views, and an authentication backend.

So here's how to use it now...

Get cas_middleware.tar.gz.

Extract it in django/contrib/. The code will be located at django/contrib/cas/. Is this a valid place to install third-party middleware? It's not really clear. Just do it anyway.

Now add it to the middleware and authentication backends in your settings. Make sure you also have the authentication middleware installed. Here's what mine looks like:

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.cas.middleware.CASMiddleware',
    'django.middleware.doc.XViewMiddleware',
)

AUTHENTICATION_BACKENDS = (
    'django.contrib.cas.backend.CASBackend',
)

You can now configure the CAS module in the same settings file. Here are the possible options, most of which can be safely ignored:

  • CAS_SERVICE_URL: This is the only setting you must explicitly define. Set it to the base URL of your CAS source.
  • CAS_POPULATE_USER: A callable or the location of a callable. When a user logs in and is missing name and email attributes in the database, this will be called with their User model instance. Default is None (do nothing).
  • CAS_ADMIN_PREFIX: The URL prefix of the Django administration site. If undefined, the CAS middleware will just check the view being rendered to see if it lives in django.contrib.admin.views. The method is a little evil, but it works.
  • CAS_LOGIN_URL: The URL where you bound django.contrib.cas.views.login. If undefined, assume /accounts/login/.
  • CAS_LOGOUT_URL: The URL where you bound django.contrib.cas.views.logout. If undefined, assume /accounts/logout/.
  • CAS_REDIRECT_URL: Where to send a user after logging in or out if there is no referrer and no next page set. Default is /.
  • CAS_REDIRECT_FIELD_NAME: The name of the GET parameter in which to store the page URL to send the user to after logging in. Default is next.

Need an example? Here's what my CAS settings look like:

CAS_SERVICE_URL = 'https://login.case.edu/cas/'
CAS_POPULATE_USER = 'present.utils.populate_user'

And the callable that lives at present.utils.populate_user (notice this code lives in my project instead of tinkering with the CAS module) looks like this:

def populate_user(user):
    try:
        # Look the user up in the campus LDAP directory
        ldap = LDAP()
        person = ldap.filter_one_by(uid=user.username)
    except Exception:
        # If the lookup fails, fall back to a default email address
        if not user.email:
            user.email = "%s@case.edu" % user.username
    else:
        # If it succeeds, update their User entry
        user.email = person.mail[0]
        user.first_name = fix_case(person.givenName[0])
        user.last_name = fix_case(person.sn[0])
(LDAP and fix_case also live in my utils module).
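Since CAS_POPULATE_USER can be either a callable or the location of one, something has to turn a dotted path into the function it names. Here's a rough sketch of how that kind of lookup can be done in current Python. It is not the code from the CAS module; `resolve_callable` is a hypothetical helper name, and `os.path.join` is only used as a convenient stand-in target.

```python
import os.path
from importlib import import_module


def resolve_callable(value):
    """Return value if it is None or callable, else import it from a dotted path."""
    if value is None or callable(value):
        return value
    # Split "package.module.name" into "package.module" and "name".
    module_path, _, attr_name = value.rpartition(".")
    return getattr(import_module(module_path), attr_name)


# A dotted path and the object itself resolve to the same callable.
by_path = resolve_callable("os.path.join")
by_object = resolve_callable(os.path.join)
print(by_path is by_object)
```

Accepting a string keeps the settings file free of imports from your own project, which is handy when the settings are loaded before the rest of the project.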

Finally, make sure your project knows how to log users in and out by adding these to your URLconf:

(r'^accounts/login/$', 'django.contrib.cas.views.login'),
(r'^accounts/logout/$', 'django.contrib.cas.views.logout'),

Users should now be able to log into your site, and staff into the administration interface, using CAS 1.0.