The Way of the Samurai
posted by brian at 12:59 AM
Today Justin Rich sent me this explanation of the Samurai Principle (I later realized he probably found it on programming.reddit.com):
Return victorious, or not at all. A [software development guideline meaning] you should either complete your contract and return a valid result, or throw an exception.
This is indeed commonly regarded as the correct way to do things in Python. In the standard library, there are only two cases off the top of my head that could be interpreted as violating this guideline: str.find, which returns -1 if no matching substring is found, and re.match, which returns None if there is no match.
I say "could be interpreted" because both of these functions perform a search. In the case of searching, what to do in the case of no search result is a judgement call. Do you take the Samurai approach, like str.index (which raises an exception), or do you return None or -1 like re.match and str.find? (I think None is clearly preferred among Python programmers if they go that route.)
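To make the contrast concrete, here is a small sketch of how the three standard-library calls mentioned above behave when nothing is found. The sample string and the variable names are just for illustration.

```python
import re

text = "samurai"

# str.find reports "not found" with the sentinel value -1.
find_result = text.find("z")

# str.index takes the Samurai approach and raises ValueError instead.
try:
    text.index("z")
except ValueError:
    index_raised = True
else:
    index_raised = False

# re.match returns None when the pattern does not match at the start.
match_result = re.match(r"z", text)
```

With the sentinel style, the caller has to remember to check the return value. With the exception style, forgetting to handle the failure stops the program at the point of the call.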
Normally I would have just looked at this article, said "Huh," and gone about my business, but it held my interest because of the decisions I made in geopy.
When you geocode a string in geopy, the result is of the form:
location_name, (latitude, longitude)
That is, a tuple containing the canonical location name returned by the geocoder and the coordinate (itself a tuple).
If the geocoder fails to geocode the given string, it returns:
None, (None, None)
Clearly the Samurai Principle is not at work here. But the article made me think: for better or worse?
I'll explain why I chose to do it this way, and maybe some programmers out there will share their opinions.
Geocoding is indeed a form of searching. When you geocode a string, you're asking "do you know where this location is?" just as much as you're asking "where exactly is this location?" This is why applying the Samurai Principle to a search is a judgement call: it depends on exactly how you define the function's contract. The answer "no, I didn't find that" is a perfectly valid response to the first question, even if a "not found" exception might be a more appropriate response to the second.
Furthermore, I wanted to support incomplete results in geopy, which I think are better than nothing. For example, the geocoder might know only the latitude or only the longitude, or just the canonical name of the location but not where it is, or the location but not a better name than the one you gave it. In other words, all of these are possible results:
canonical_name, (None, None)
None, (latitude, None)
None, (None, longitude)
None, (latitude, longitude)
...and so on. It seems natural, given the above, that the "not found at all" response of None, (None, None) should not instead raise an exception.
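As a rough sketch of what this means for calling code, the function below sorts a result of the shape described above into "not found", "partial", or "located". The names Result and classify are hypothetical helpers made up for this example; they are not part of geopy.

```python
from typing import Optional, Tuple

# Shape described in the post: name, (latitude, longitude),
# where any of the three values may be None.
Result = Tuple[Optional[str], Tuple[Optional[float], Optional[float]]]


def classify(result: Result) -> str:
    """Hypothetical helper: summarize how complete a result is."""
    name, (latitude, longitude) = result
    if name is None and latitude is None and longitude is None:
        return "not found"
    if latitude is not None and longitude is not None:
        return "located"
    # Only a name, or only one of the two coordinates, is known.
    return "partial"
```

Because every outcome has the same shape, the caller can always unpack the result first and then decide how much of it is usable.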
Now, if there were a failure in the geocoder backend (such as its server being down), then I think an exception is appropriate.
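Here is a minimal sketch of that split, assuming Python 3. GeocoderBackendError, geocode, and the two fake backends are hypothetical names invented for this example and do not reflect geopy's actual API.

```python
class GeocoderBackendError(Exception):
    """Hypothetical exception for a backend failure (not a geopy name)."""


def geocode(query, backend):
    """Hypothetical wrapper; `backend` is any callable doing the lookup."""
    try:
        result = backend(query)
    except ConnectionError as exc:
        # The backend itself failed, so the contract cannot be fulfilled.
        raise GeocoderBackendError("backend unavailable") from exc
    if result is None:
        # The search ran but found nothing: a valid answer, not an error.
        return None, (None, None)
    return result


def backend_down(query):
    """Fake backend that simulates a server outage."""
    raise ConnectionError("server is down")


def backend_empty(query):
    """Fake backend that simulates a search with no match."""
    return None
```

Calling geocode("anywhere", backend_empty) returns None, (None, None), while geocode("anywhere", backend_down) raises GeocoderBackendError.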
That's all I wanted to say really, just thinking out loud...
Completely unrelated post-script: Remember a couple Halloweens ago when Patty and I entered the Achewood costume contest? I don't think I ever posted a follow-up, but I actually came in second place and won a signed Achewood book. I'm just posting this because I was bored and started clicking links on the Achewood home page...
Comments
I think that, as long as you define your contract to include the possibility of returning None, your approach is perfectly valid and compatible with the Samurai principle.
Also, you win the record for "Least Contiguous Paragraphs I've Ever Read in a Blog Entry, And That's Saying Something".
And your costume was excellent.
Ruby on Rails has a similar problem to this in the find method.
In the find method for a class you can find objects in the database two ways. The first is by Class.find(database_ids*), which raises an exception if no row with id = database_id exists. The other way is by doing something like Class.find(:first, :conditions => ["id =?", database_id]). With the latter method no exception is raised and nil is returned. Both generate the exact same SQL for the database.
Firsthand experience tells me that either way has its ups and downs: no matter which you choose, the code has to do some checking, and both take the same amount of code. The programmer still has to know his libraries' policies.
Personally I like the exception better. If the exception is named well, a programmer looking at the code for the first time can get a good idea of why a Class::ObjectNotFound exception is raised or caught, which is easier than working out why the code checks for an empty object.
I would really just like to see some consistency within the Rails library.
I think you also need to think about what your results "mean". As you point out, if the location is found but the geocoder doesn't know the lat/lon, then it's not that there is "No" lat/lon; it's just "unknown" to the geocoder.
But in a case where there really is no canonical name - then "None" is appropriate.
So there are three cases illustrated, and their results should really indicate which case occurs:
1) Data is unknown
2) Result is known to have No Answer
3) An error occurred
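One possible way to tell these cases apart in Python, sketched under the assumption that a dedicated sentinel stands for "unknown", None stands for "known to have no answer", and errors are raised as exceptions. UNKNOWN and describe_field are hypothetical names for this example only.

```python
class _Unknown:
    """Hypothetical sentinel type meaning "the geocoder does not know"."""

    def __repr__(self):
        return "UNKNOWN"


# Single shared instance, compared by identity like None.
UNKNOWN = _Unknown()


def describe_field(value):
    """Hypothetical helper: say which case a single field falls into."""
    if value is UNKNOWN:
        return "data is unknown"
    if value is None:
        return "known to have no answer"
    return "known"
```

The third case, an error, would not appear as a return value at all; it would surface as a raised exception.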