July 14, 2005
Wikis Used for Data Storage
I have been thinking a lot lately about using wikis as a storage backend for data: not article text, but metadata.
This idea started a few weeks back when Jeremy Smith and I were discussing my proposed maps.case.edu site as part of the upcoming location-based services offerings. I proposed a RESTful interface to XML data so that any user or service on campus could obtain details about, say, a specific building. You could query maps.case.edu/xml/buildings/WICK and be sent an XML document with details about the building, e.g.
<building abbr="WICK" name="Wickenden Building">
<departments>
<department code="EBME" name="Biomedical Engineering" />
</departments>
<eateries>
<food name="Bag-It" />
</eateries>
<geo>
<lat>41.5031</lat>
<lon>-81.6084</lon>
</geo>
</building>
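To give a feel for how a campus service might consume such a document, here is a minimal Python sketch. It assumes the response looks like the example above (with properly matched `<lon>` tags); the `parse_building` helper and the dictionary layout are my own illustration, not part of any existing maps.case.edu code. In practice the text would come from an HTTP GET against the building URL rather than a hard-coded string.

```python
import xml.etree.ElementTree as ET

# Sample response mirroring the building document in this post.
# In a real client this text would be fetched over HTTP.
SAMPLE = """<building abbr="WICK" name="Wickenden Building">
  <departments>
    <department code="EBME" name="Biomedical Engineering" />
  </departments>
  <eateries>
    <food name="Bag-It" />
  </eateries>
  <geo>
    <lat>41.5031</lat>
    <lon>-81.6084</lon>
  </geo>
</building>"""


def parse_building(xml_text):
    """Turn a <building> document into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "abbr": root.get("abbr"),
        "name": root.get("name"),
        "departments": [d.get("name") for d in root.findall("departments/department")],
        "eateries": [f.get("name") for f in root.findall("eateries/food")],
        "lat": float(root.findtext("geo/lat")),
        "lon": float(root.findtext("geo/lon")),
    }


building = parse_building(SAMPLE)
print(building["name"], building["lat"], building["lon"])
```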
Jeremy and I both thought this would be a cool idea, as we love providing the campus with services that are easy to access and use. Before we committed to the project, Jeremy pondered who would maintain the site. Who would create entries for new buildings as they were built, or remove them when buildings were demolished? Who would update the data when things changed? We both thought for a second and said, "Isn't this what a wiki is for?"
Since then, I have been thinking of ways to do just this. I have an idea for a MediaWiki extension that allows users to embed XML documents in a page. These XML documents can be extracted at page save time and stored in a separate database table, giving easy access to any external services that want the data. As an added bonus, I could probably work out some XSLT transforms (perhaps in the form of MediaWiki templates) to convert the XML documents into displayable text in the wiki.
From a high-level perspective, this data repository model makes almost perfect sense. Why limit control of shared data to a select number of people? Find a mistake? Just log in and fix it yourself! Your change will instantly propagate to any service that accesses the data.
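The extract-on-save step can be sketched roughly as follows. This is Python with an in-memory SQLite database, purely to illustrate the flow; a real MediaWiki extension would be written in PHP against MediaWiki's own hooks and database layer. The `xml_data` table, the `on_page_save` function, and the convention of finding `<building>` elements with a regular expression are all assumptions made for this example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Assumed convention: data is embedded as <building ...>...</building> in the page text.
BUILDING_RE = re.compile(r"<building\b.*?</building>", re.DOTALL)


def extract_documents(page_text):
    """Return (abbr, xml) pairs for each well-formed <building> document in a page."""
    docs = []
    for match in BUILDING_RE.finditer(page_text):
        try:
            root = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # skip malformed XML rather than storing it
        docs.append((root.get("abbr"), match.group(0)))
    return docs


def on_page_save(conn, page_title, page_text):
    """Replace the stored documents for a page with those found in its new text."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM xml_data WHERE page = ?", (page_title,))
        conn.executemany(
            "INSERT INTO xml_data (page, abbr, xml) VALUES (?, ?, ?)",
            [(page_title, abbr, xml) for abbr, xml in extract_documents(page_text)],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_data (page TEXT, abbr TEXT, xml TEXT)")

page = """Wickenden houses several departments.
<building abbr="WICK" name="Wickenden Building"><geo><lat>41.5031</lat><lon>-81.6084</lon></geo></building>
More article text here."""
on_page_save(conn, "Wickenden Building", page)
rows = conn.execute("SELECT page, abbr FROM xml_data").fetchall()
print(rows)
```

Deleting and re-inserting on every save keeps the table in step with the latest page revision, so an external service only ever sees what the page currently says.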
Speaking of services, accessing the data could hardly be easier. After creating an XML document in a page, you could go to, say, http://wiki.case.edu/misc/xml/buildings to retrieve an XML document of all the
There are problems with my model, however. I believe the main hurdle is the standardization of the XML documents. Some people won't know what XML is and will be confused when they see it while editing a page. Others won't follow the defined structure. The second problem is data poisoning. Nobody likes getting their web service poisoned. But since we are using a wiki for the data storage, identifying the culprits is all too easy. The standardization issue can be somewhat addressed by the implementation. Things will never be perfect, but it all depends on the implementation.
I believe the concept I proposed has enormous potential, especially for those offering web services to the campus. Why store public data in a limited-access database when you can have the community collect and refine the data for you?
Trackback
In a previous post I outlined a basic plan for using a wiki to store information in such a way...
Trackbacked from Storing XML in the Case Wiki on Gregory Szorc's blog.
Comments
The standardization of the XML documents is not too hard: make a DTD or XML Schema for your XML that includes everything you think you'll need. If you need to change it later, that is possible too. The DTD could be in the wiki as well, though that is not necessary, or necessarily a good idea.
As for making people follow it, you've got two options: make an interface that doesn't allow you to input incorrect data (like a form to submit data, which then puts that data into the XML file), or simply validate the XML against the DTD before you save it.
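As a rough illustration of the validate-before-save option, here is a Python sketch. Note that it does not perform real DTD validation: Python's standard library has no validating parser, so the hand-written rules below (`REQUIRED_ATTRIBUTES`, `REQUIRED_ELEMENTS`) are a hypothetical stand-in for whatever the DTD or XML Schema would actually require. A real implementation would more likely hand the document to a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a DTD or XML Schema.
REQUIRED_ATTRIBUTES = ("abbr", "name")
REQUIRED_ELEMENTS = ("geo/lat", "geo/lon")  # each must hold a number


def validate_building(xml_text):
    """Return a list of problems; an empty list means the document passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != "building":
        return [f"unexpected root element <{root.tag}>"]
    problems = []
    for attr in REQUIRED_ATTRIBUTES:
        if not root.get(attr):
            problems.append(f"missing attribute '{attr}'")
    for path in REQUIRED_ELEMENTS:
        text = root.findtext(path)
        if text is None:
            problems.append(f"missing element '{path}'")
            continue
        try:
            float(text)
        except ValueError:
            problems.append(f"'{path}' is not a number")
    return problems


good = '<building abbr="WICK" name="Wickenden Building"><geo><lat>41.5031</lat><lon>-81.6084</lon></geo></building>'
bad = '<building abbr="WICK"><geo><lat>north</lat></geo></building>'
print(validate_building(good))
print(validate_building(bad))
```

The save handler would refuse the edit (and show the list of problems to the editor) whenever the returned list is not empty.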
I would not recommend XSLT transforms
Of course, data poisoning is totally unlikely as long as there is no anonymity, which on the Case wiki there is not.
That was supposed to be this link but the formatting got messed up somewhere.
That was also an earlier version of the comment, which I changed a bit, but those changes were lost at some point when the comment failed to post a few times. I was going to recommend against XSLT, due to its complexity and how hard it is to use, but I didn't see any obvious alternatives, though I suppose there are some, like reading the XML with JavaScript (not the greatest idea) or perhaps a server-side script that transforms it into HTML, rather than running an XSLT on it. And anywhere in the post that says "DTD" should say "DTD or XML Schema," since XML Schema might be a better idea.
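For what it's worth, the server-side script alternative might look something like the sketch below. It is written in Python only for illustration; the `building_to_html` function and the HTML layout are made up for this example, and the input is assumed to follow the `<building>` format from the post.

```python
import xml.etree.ElementTree as ET
from html import escape


def building_to_html(xml_text):
    """Render a <building> document as a small HTML fragment (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    lines = [f"<h2>{escape(root.get('name', ''))} ({escape(root.get('abbr', ''))})</h2>"]
    departments = root.findall("departments/department")
    if departments:
        lines.append("<ul>")
        for dept in departments:
            lines.append(f"  <li>{escape(dept.get('name', ''))}</li>")
        lines.append("</ul>")
    lat, lon = root.findtext("geo/lat"), root.findtext("geo/lon")
    if lat is not None and lon is not None:
        lines.append(f"<p>Location: {escape(lat)}, {escape(lon)}</p>")
    return "\n".join(lines)


sample = (
    '<building abbr="WICK" name="Wickenden Building">'
    '<departments><department code="EBME" name="Biomedical Engineering" /></departments>'
    "<geo><lat>41.5031</lat><lon>-81.6084</lon></geo>"
    "</building>"
)
page_html = building_to_html(sample)
print(page_html)
```

Escaping every value before it goes into the markup matters here, since anyone with a wiki account can edit the source XML.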
Check out the Wikidata project by the MediaWiki folks, which is in the design phase right now. It's not explicitly for XML documents, but XML interchange with a system like this is a SMOP.