Geocoding Tools for Python (and CaseClasses)
posted by brian at 06:30 PM
I've been working on some geocoding classes in Python. So far I have geocoders for MediaWiki, Semantic MediaWiki, and the Google geocoder, and I plan to add the Yahoo! geocoder to the toolbox soon.
I think this could be a useful package, so once it's ready I plan to upload it to the Cheese Shop.
I added the relevant geocoder classes to the Python CaseClasses so that developers can easily geocode strings using Case's Semantic MediaWiki.
Check it out. First grab the geocoder:
>>> from Case import Geocode
>>> wiki = Geocode.CaseWikiGeocoder()
Then start geocoding:
>>> place, (lat, lng) = wiki.geocode('KSL')
Fetching http://wiki.case.edu/KSL...
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
Kelvin_Smith_Library: 41.50727, -81.60950
geocode() returns a tuple of the location name it matched and that location's coordinates, which are themselves a (latitude, longitude) tuple.
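To make that return shape concrete, here is a minimal stand-in written for Python 3. It is not part of Case.Geocode: the DictGeocoder class is hypothetical, it looks places up in a dictionary instead of fetching wiki pages, and raising LookupError on failure is an assumption (the post doesn't say how the real classes report failures). It reuses the KSL coordinates from the session above.

```python
from typing import Dict, Tuple

Coordinates = Tuple[float, float]


class DictGeocoder:
    """Hypothetical stand-in that mimics the (place, (lat, lng)) return shape."""

    def __init__(self, known: Dict[str, Tuple[str, Coordinates]]) -> None:
        # Maps a query string to (matched page name, (lat, lng)).
        self.known = known

    def geocode(self, query: str) -> Tuple[str, Coordinates]:
        # Raising LookupError is an assumption made for this sketch only.
        try:
            return self.known[query]
        except KeyError:
            raise LookupError("could not geocode %r" % query) from None


geocoder = DictGeocoder({"KSL": ("Kelvin_Smith_Library", (41.50727, -81.60950))})
place, (lat, lng) = geocoder.geocode("KSL")  # nested tuple unpacking
print("%s: %.5f, %.5f" % (place, lat, lng))  # Kelvin_Smith_Library: 41.50727, -81.60950
```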
Here's where the Semantic part comes in. The Project Club article has no coordinates of its own, but it is located in Olin, which is geocoded:
>>> place, (lat, lng) = wiki.geocode('Project Club')
Fetching http://wiki.case.edu/Project_Club...
Fetching http://wiki.case.edu/index.php/Special:ExportRDF/Project_Club?xmlmime=rdf...
Fetching http://wiki.case.edu/index.php/Olin_Building...
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
Olin_Building: 41.50224, -81.60778
CaseWikiGeocoder is a subclass of SemanticMediaWikiGeocoder and is defined entirely by the following:
class CaseWikiGeocoder(SemanticMediaWikiGeocoder):
    def __init__(self):
        super(CaseWikiGeocoder, self).__init__("http://wiki.case.edu/%s",
                                               relations=['Located in'])
This creates a SemanticMediaWikiGeocoder whose page URLs are built from the template 'http://wiki.case.edu/%s'. It follows the 'Located in' relation whenever a page fails to geocode. That means SemanticMediaWikiGeocoder could easily be pointed at any Semantic MediaWiki, with whatever set of relations that wiki defines. The class is brand new and has only been tested on the Case Wiki, so it may be buggy.
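The fallback idea, "if this page has no coordinates, try the pages it is related to", can be sketched as a graph search. This is not the code in Case.Geocode. The post doesn't say whether the real class searches breadth-first or depth-first, or how many hops it will follow. Here the wiki is replaced by two hypothetical in-memory dictionaries (COORDS and RELATIONS), the search is breadth-first with a guard against relation cycles, and title normalization only turns spaces into underscores.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]

# Hypothetical stand-in for the wiki: which pages carry coordinates, and
# which relation links each page has (relation name -> list of target pages).
COORDS: Dict[str, Coordinates] = {"Olin_Building": (41.50224, -81.60778)}
RELATIONS: Dict[str, Dict[str, List[str]]] = {
    "Project_Club": {"Located in": ["Olin Building"]},
}


def normalize(title: str) -> str:
    """Simplified MediaWiki-style title normalization: spaces become underscores."""
    return title.strip().replace(" ", "_")


def geocode_with_relations(page: str, relations: List[str]) -> Optional[Tuple[str, Coordinates]]:
    """Breadth-first walk along the named relations until a geocoded page turns up."""
    start = normalize(page)
    queue = deque([start])
    seen = {start}
    while queue:
        current = queue.popleft()
        if current in COORDS:  # this page geocodes directly
            return current, COORDS[current]
        for relation in relations:
            for target in RELATIONS.get(current, {}).get(relation, []):
                target = normalize(target)
                if target not in seen:  # guards against relation cycles
                    seen.add(target)
                    queue.append(target)
    return None  # nothing reachable carries coordinates


print(geocode_with_relations("Project Club", ["Located in"]))  # ('Olin_Building', (41.50224, -81.60778))
```

Breadth-first order means the nearest geocoded ancestor wins. For 'Located in' that is usually the most specific location available.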
MediaWikiGeocoder depends on BeautifulSoup, because it assumes the wiki's HTML may be malformed.
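The post doesn't describe how MediaWikiGeocoder finds coordinates in a page. The following is only a sketch under two stated assumptions. First, it swaps BeautifulSoup for the standard library's tolerant html.parser.HTMLParser. Second, it assumes the page marks its position with Geo microformat markup (elements classed 'latitude' and 'longitude'). The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class GeoMicroformatParser(HTMLParser):
    """Collects the text of the first 'latitude' and 'longitude' elements."""

    def __init__(self) -> None:
        super().__init__()
        self.field: Optional[str] = None  # field whose text is currently being read
        self.values: dict = {}            # field name -> collected text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ("latitude", "longitude"):
            if name in classes and name not in self.values:
                self.field = name
                self.values[name] = ""

    def handle_endtag(self, tag):
        # Any closing tag ends the current field; enough for flat <span> markup.
        self.field = None

    def handle_data(self, data):
        if self.field:
            self.values[self.field] += data


def extract_coordinates(html: str) -> Optional[Tuple[float, float]]:
    parser = GeoMicroformatParser()
    parser.feed(html)
    parser.close()
    try:
        return float(parser.values["latitude"]), float(parser.values["longitude"])
    except (KeyError, ValueError):  # missing or unparsable coordinates
        return None


# Deliberately sloppy markup: unclosed <p> and <b> tags.
SAMPLE = ('<p>Kelvin Smith Library <span class="geo">'
          '<span class="latitude">41.50727</span>; '
          '<span class="longitude">-81.60950</span></span><p>More <b>text')
print(extract_coordinates(SAMPLE))  # (41.50727, -81.6095)
```

HTMLParser doesn't build a tree, so unclosed tags don't trip it up. That is the same robustness the post wants from BeautifulSoup, with less convenience for complex queries.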
Remember, if you have easy_install, you can simply type this to install CaseClasses:
sudo easy_install http://opensource.case.edu/svn/CaseClasses/python/trunk
After this geocoding toolbox is complete, I'll see if I can make that Case geocoder web service we talked about on the forum.
Update: Per Greg's comment, support for reading coordinates from semantic attributes is now in Case.Geocode (along with some other small improvements).
CaseWikiGeocoder is now defined as:
class CaseWikiGeocoder(SemanticMediaWikiGeocoder):
    def __init__(self):
        base = super(CaseWikiGeocoder, self)
        base.__init__("http://wiki.case.edu/%s",
                      attributes=['Geographical coordinate'],
                      relations=['Located in'], prefer_semantic=True)
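Reading coordinates from a semantic attribute comes down to recognizing two kinds of annotation. The log above shows the real class fetching Special:ExportRDF. This sketch takes a different route and parses raw wikitext, assuming early Semantic MediaWiki inline syntax: `[[Relation::Target]]` for relations and `[[Attribute:=value]]` for attributes, the form Gregory refers to below. Only plain decimal "lat, lng" values are handled, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Inline annotation syntax of early Semantic MediaWiki, assumed here:
#   relation:  [[Located in::Olin Building]]  (optionally [[...|display text]])
#   attribute: [[Geographical coordinate:=41.50224, -81.60778]]
ANNOTATION = re.compile(r"\[\[([^:\]|]+?)(::|:=)([^\]|]+)(?:\|[^\]]*)?\]\]")
# Only plain decimal "lat, lng" values are handled in this sketch.
DECIMAL_PAIR = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_annotations(wikitext: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split inline annotations into (relations, attributes), keyed by property name."""
    relations: Dict[str, List[str]] = {}
    attributes: Dict[str, List[str]] = {}
    for name, operator, value in ANNOTATION.findall(wikitext):
        bucket = relations if operator == "::" else attributes
        bucket.setdefault(name.strip(), []).append(value.strip())
    return relations, attributes


def semantic_coordinates(attributes: Dict[str, List[str]], names: List[str]) -> Optional[Tuple[float, float]]:
    """Return the first decimal coordinate pair found under any of the given attribute names."""
    for name in names:
        for value in attributes.get(name, []):
            match = DECIMAL_PAIR.match(value)
            if match:
                return float(match.group(1)), float(match.group(2))
    return None


olin = "The Olin Building is on the quad. [[Geographical coordinate:=41.50224, -81.60778]]"
club = "Project Club meets in [[Located in::Olin Building|Olin]]. [[Category:Clubs]]"

_, olin_attrs = parse_annotations(olin)
club_rels, club_attrs = parse_annotations(club)
print(semantic_coordinates(olin_attrs, ["Geographical coordinate"]))  # (41.50224, -81.60778)
print(semantic_coordinates(club_attrs, ["Geographical coordinate"]))  # None
print(club_rels)  # {'Located in': ['Olin Building']}
```

When the attribute lookup comes back empty, as it does for the club page, the relation targets are what a geocoder would try next.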
Comments
Cool!
I just made some updates to Special:GraphStructure. The XML output now includes Semantic MediaWiki attributes and relations, and the geographic coordinates of articles are embedded as well.
It is worth mentioning that Semantic MediaWiki has support for geographical coordinates. At some point in the future, we will probably be deprecating and replacing with geographical coordinate:=. I think support is a little buggy right now, which is why we haven't started.
Anyway, check out the new XML output:
XML for Yost Hall
Very nice work indeed! I look forward to seeing this feature used in a Semantic Wikipedia, with people walking around with Python-enabled mobile phones to find out their current location. It could also be combined with a mapping service: enter the name of a building to get a map of the surrounding area.
I did not know that the geocoord datatype in Semantic MediaWiki is actually buggy, as Gregory says, but I am sure we can easily fix minor problems. The RDF output for those geocoords is still not really cool, since it does not use existing geo vocabularies, but this will be improved soon.
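For illustration, here is roughly what using an existing geo vocabulary could look like. The sketch uses the W3C Basic Geo (WGS84 lat/long) vocabulary's geo:lat and geo:long properties, serialized as RDF/XML with the standard library's ElementTree. This is an assumption about one possible output, not what Semantic MediaWiki actually emits. The geo_rdf function is hypothetical, and the page URI follows the Case Wiki URL pattern from the post.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary
ET.register_namespace("rdf", RDF)
ET.register_namespace("geo", GEO)


def geo_rdf(subject_uri: str, lat: float, lng: float) -> str:
    """Serialize one resource's WGS84 position as RDF/XML using geo:lat and geo:long."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject_uri})
    ET.SubElement(desc, f"{{{GEO}}}lat").text = f"{lat:.5f}"
    ET.SubElement(desc, f"{{{GEO}}}long").text = f"{lng:.5f}"
    return ET.tostring(root, encoding="unicode")


print(geo_rdf("http://wiki.case.edu/Olin_Building", 41.50224, -81.60778))
```

Because geo:lat and geo:long are widely recognized, generic RDF and mapping tools could read such output without knowing anything about the wiki's own property names.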
I also added an entry for your project at ontoworld.org which you are invited to update when you have news or a better URL for the tool.