Online Calendaring at Case
An online, campus-wide calendar system is a commonly desired IT system at Case. In fact, demand for the service is so high that, to my knowledge, at least four groups (ITS Middleware Services Engineering, SIS, CaseLife, USG) have considered addressing the problem. I have been involved in a few discussions of the issue over the last year, but not much progress has been made. (CaseLife does include data from various calendars, but not a very large number of them, and it does not yet provide any machine-readable version of the data it collects.) Progress has been slow because the problem is a difficult one.
There are two basic approaches to designing this system. In the first, there would be a centralized calendar server which all other calendar services would use for data storage. It would then be possible to query the central calendar server for any set of calendar data.
In this approach, it would be sensible to use a standardized protocol for the central calendar server. The obvious choice of standard here would be CalDAV. iCalendar, CalDAV, and their related standards may not be ideal, but they do have the advantage of fairly widespread industry and open-source support. Unfortunately, CalDAV is still in its infancy, and to my knowledge there is no production-ready CalDAV server yet.
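To make "query the central calendar server for any set of calendar data" concrete, here is a minimal sketch of what such a query looks like in CalDAV: the XML body of a calendar-query REPORT (as defined in the CalDAV specification, later published as RFC 4791) asking for every event that overlaps a time range. It only builds the request body; actually sending it (HTTP method REPORT, header "Depth: 1") to some server is assumed and not shown, and the helper names are illustrative.

```python
# Sketch: build the XML body of a CalDAV calendar-query REPORT asking for all
# VEVENTs that overlap a UTC time range. Sending it (HTTP method REPORT,
# header "Depth: 1") is left out; no particular server is assumed.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DAV_NS = "DAV:"
CALDAV_NS = "urn:ietf:params:xml:ns:caldav"
ET.register_namespace("D", DAV_NS)
ET.register_namespace("C", CALDAV_NS)


def _tag(ns: str, name: str) -> str:
    # ElementTree's "{namespace}local-name" notation.
    return f"{{{ns}}}{name}"


def caldav_utc(dt: datetime) -> str:
    # CalDAV time ranges use the iCalendar UTC form, e.g. 20060904T000000Z.
    # Expects timezone-aware datetimes.
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_calendar_query(start: datetime, end: datetime) -> str:
    query = ET.Element(_tag(CALDAV_NS, "calendar-query"))
    # Ask for each matching resource's ETag and its iCalendar data.
    prop = ET.SubElement(query, _tag(DAV_NS, "prop"))
    ET.SubElement(prop, _tag(DAV_NS, "getetag"))
    ET.SubElement(prop, _tag(CALDAV_NS, "calendar-data"))
    # Filter: VCALENDAR > VEVENT overlapping [start, end).
    flt = ET.SubElement(query, _tag(CALDAV_NS, "filter"))
    vcal = ET.SubElement(flt, _tag(CALDAV_NS, "comp-filter"), name="VCALENDAR")
    vevent = ET.SubElement(vcal, _tag(CALDAV_NS, "comp-filter"), name="VEVENT")
    ET.SubElement(vevent, _tag(CALDAV_NS, "time-range"),
                  start=caldav_utc(start), end=caldav_utc(end))
    return ET.tostring(query, encoding="unicode")
```

For example, `build_calendar_query(datetime(2006, 9, 4, tzinfo=timezone.utc), datetime(2006, 9, 5, tzinfo=timezone.utc))` produces a body requesting every event on September 4. The appeal of the centralized design is that every client could issue this same standard query.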
Even if it were possible to deploy a suitable centralized calendar server, there would then be the problem of getting existing calendars to use it. Some existing calendars may be simple HTML pages on department Web sites. The maintainers of these calendars might not have any interest in switching to a new, more complex system, even if a ready-made "departmental or organizational calendar" application were made available. Other existing calendars may depend on features that are not supported by the centralized server and would not be able to use it while maintaining the same functionality. Still other existing calendars, like the Oracle Calendar, are products of third-party vendors and might never be able to talk to a centralized calendar server.
These two basic problems (lack of existing software, difficulty of getting calendar owners "on board") seem to me to rule out the centralized-server design approach altogether.
In the second basic design approach, there is no centralized calendar server. Instead, the calendar data would be distributed among various systems just as it is now. An "aggregator" would collect data from each of the calendar systems and allow queries over the aggregated data.
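A minimal sketch of the aggregator's core, assuming each existing calendar can be wrapped in a function that returns its current events. The names `Event`, `Aggregator`, `add_source`, `refresh`, and `query` are hypothetical, and the hard parts (finding sources, parsing formats, access control) are deliberately left out.

```python
# Hypothetical aggregator core: each source is a zero-argument function that
# returns that calendar's current events. Format parsing, crawling and
# access control are deliberately left out.
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    source: str
    uid: str
    summary: str
    start: datetime
    end: datetime


class Aggregator:
    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Event]]] = {}
        self._cache: Dict[str, List[Event]] = {}

    def add_source(self, name: str, fetch: Callable[[], List[Event]]) -> None:
        self._sources[name] = fetch

    def refresh(self) -> None:
        for name, fetch in self._sources.items():
            try:
                self._cache[name] = list(fetch())
            except Exception:
                # One unreachable calendar should not break the others;
                # keep whatever was last collected from it (if anything).
                continue

    def query(self, start: datetime, end: datetime) -> List[Event]:
        # Events overlapping the half-open interval [start, end), sorted by start.
        hits = [e for events in self._cache.values() for e in events
                if e.start < end and e.end > start]
        return sorted(hits, key=lambda e: e.start)
```

Keeping a per-source cache means the data stays distributed "just as it is now": the aggregator only ever holds copies, and a source that is temporarily down simply keeps its last known events.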
There are several obstacles to the construction of such an aggregator. First, the aggregator would need to know where to find its source data. Certainly it could have a basic hard-coded list (the Oracle Calendar, the USG and UPB calendars, etc.), but to be truly complete and remain that way, the aggregator would have to crawl the case.edu domain for calendar data in the same way a search engine crawls HTML documents. The crawling is complicated by the fact that calendar data might be in any of various formats, ranging from standard calendar formats (iCalendar, hCalendar, RDF Calendar, etc.), to HTML pages of varying complexity with event listings, to opaque formats like PDF, ad-hoc XML, and Microsoft-proprietary formats. Obviously the aggregator could not possibly support every format, but to be most useful it would need to support as many of these as possible, since current calendar owners may not be able or willing to switch formats.
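One early step for a crawler is deciding which parser a fetched document should go to. The sketch below guesses the format from the Content-Type header and a few telltale markers: an iCalendar file begins with `BEGIN:VCALENDAR`, hCalendar marks events with `class="vevent"`, RDF Calendar uses the W3C namespace `http://www.w3.org/2002/12/cal/ical#`, and PDF files start with `%PDF-`. The function name and the returned labels are hypothetical, and the heuristics are deliberately crude.

```python
# Hypothetical format sniffer for crawled documents: returns a label that a
# crawler could use to pick a parser. Heuristics only; no parsing is done.
import re

# class="... vevent ..." as a whole class name (not e.g. "vevent-list").
HCAL_RE = re.compile(rb'class\s*=\s*["\'][^"\']*(?<![\w-])vevent(?![\w-])', re.I)
RDFCAL_NS = b"http://www.w3.org/2002/12/cal/ical#"


def sniff_format(content_type: str, body: bytes) -> str:
    ctype = (content_type or "").split(";")[0].strip().lower()
    head = body.lstrip()[:64]
    if ctype == "text/calendar" or head.upper().startswith(b"BEGIN:VCALENDAR"):
        return "icalendar"
    if head.startswith(b"%PDF-"):
        return "pdf"
    if RDFCAL_NS in body:
        return "rdf-calendar"
    if ctype in ("text/html", "application/xhtml+xml"):
        # Plain HTML listings would need site-specific scraping.
        return "hcalendar" if HCAL_RE.search(body) else "html"
    if ctype.endswith("xml") or head.startswith(b"<?xml"):
        return "xml"  # ad-hoc XML: format unknown until inspected
    return "unknown"
```

Anything labeled "html", "xml", "pdf", or "unknown" is where the real work (and most of the existing calendars) would be.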
Also, there might be difficulties with authentication. It is possible that some calendar data is only visible to some people. This is definitely true of Oracle Calendar, and it may be true of other existing systems as well. If so, and if the aggregator is to work with those systems, it must have some sort of special access so that it can collect all the data, along with the access control lists, and then show each item only to people who have the proper credentials.
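The filtering half of that requirement can be sketched simply, assuming the aggregator has already collected each event together with its access list, here flattened into a set of principal names (users or groups). How Oracle Calendar or any other system actually represents permissions is not modeled, and the names below (`StoredEvent`, `visible_events`, `group:its`) are hypothetical.

```python
# Hypothetical ACL filter: each collected event carries the set of principals
# allowed to read it (None = public). Viewers see public events plus any event
# whose reader set shares a principal with them.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class StoredEvent:
    summary: str
    readers: Optional[FrozenSet[str]] = None  # None means visible to everyone


def visible_events(events: Iterable[StoredEvent], user: str,
                   groups: Iterable[str] = ()) -> List[StoredEvent]:
    principals = {user, *groups}
    return [e for e in events
            if e.readers is None or not e.readers.isdisjoint(principals)]
```

The important design point is that the aggregator, which sees everything, becomes responsible for enforcing the original system's permissions, so its copy of the access lists must stay as current as its copy of the events.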
Then there is the simple fact that much of the data that would be useful to aggregate is difficult to obtain. For example, it would be both interesting and useful for the aggregator to know both the campus class schedule (visible to everyone) and each student's and instructor's schedule (visible only to that person, of course). But that data is hidden inside an AS/400 somewhere.
There are other challenges as well. Since an event may be published in more than one place, any such calendar aggregator would have to be able to determine whether two events are identical. Of course, the published details might be different in each place, so the aggregator would need some "fuzzy logic" to match them. Another concern is what to do with events that seem to disappear from their source calendars. Have they been cancelled? Rescheduled? Have the event details simply changed?
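As one possible stand-in for the "fuzzy logic", the sketch below treats two listings as the same event when their start times are within a tolerance and their normalized titles are similar according to Python's `difflib.SequenceMatcher`. The 30-minute slack and the 0.8 similarity threshold are arbitrary assumptions, and `Listing`, `likely_same_event`, and `group_duplicates` are hypothetical names; a real matcher would probably also compare locations and descriptions.

```python
# Hypothetical duplicate detector: string similarity on titles plus a
# tolerance on start times. Thresholds are illustrative, not tuned.
import difflib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Listing:
    source: str
    title: str
    start: datetime


def _norm(s: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(s.lower().split())


def likely_same_event(a: Listing, b: Listing, title_threshold: float = 0.8,
                      time_slack: timedelta = timedelta(minutes=30)) -> bool:
    if abs(a.start - b.start) > time_slack:
        return False
    score = difflib.SequenceMatcher(None, _norm(a.title), _norm(b.title)).ratio()
    return score >= title_threshold


def group_duplicates(listings: List[Listing]) -> List[List[Listing]]:
    # Greedy clustering: each listing joins the first group whose first
    # member it matches, otherwise it starts a new group.
    groups: List[List[Listing]] = []
    for item in sorted(listings, key=lambda l: l.start):
        for group in groups:
            if likely_same_event(group[0], item):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups
```

The greedy grouping is order-dependent and only compares against each group's first member, which is one of the compromises that makes this a sketch rather than a solution.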
Despite the difficulty of implementation, I think that an aggregator would be a better solution to the calendar problem than a centralized server, mostly because it requires less cooperation from calendar owners.
Obviously it's not essential to support everything at once. An aggregator that supported a few of the more important calendars would be somewhat useful right away, and support for more calendars could be added later.
I know that other people have thought about this problem a lot. So tell me what I'm missing: is there another overall approach? Are the problems I mentioned not as bad as I think? Worse? Are there others?
Brian said on September 3, 2006 11:47 PM:
I also like the aggregator approach. The crawling part especially gets my gears turning.
Last semester I was thinking about an experimental service that would find events (by crawling the case.edu domain mostly), but also micro-events (like me just saying on my blog "I'm going to Rhode Island this weekend" and having it do the right thing). I'm not sure how interesting this part would even be, but the natural language parsing and event recognition really just sounds like an appealing project to me. (The goal of this project was to be able to answer the question "what are people doing on campus right now?", so it included a bot that would stalk people's away messages on AIM and such; this is where it really starts to stray from calendaring... so that's for another discussion.)
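A toy illustration of micro-event recognition, assuming only the single phrasing from the example ("I'm going to PLACE this weekend", plus "tonight" and "tomorrow") and resolving the time phrase against the post's date. The pattern and function names are hypothetical; real natural language parsing would need far more than one regular expression.

```python
# Toy micro-event extractor: one regex for "I'm going to <Place> <when>",
# with <when> resolved relative to the date the text was posted.
import re
from datetime import date, timedelta
from typing import Dict, List, Tuple

MICRO_EVENT = re.compile(
    r"\b(?:I'm|I\u2019m|I am) going to (?P<place>[A-Z][\w ]*?) "
    r"(?P<when>tonight|tomorrow|this weekend)\b"
)


def resolve(when: str, posted: date) -> Tuple[date, date]:
    if when == "tonight":
        return posted, posted
    if when == "tomorrow":
        day = posted + timedelta(days=1)
        return day, day
    # "this weekend": the first Saturday on or after the posting date, plus
    # Sunday. (Posted on a Sunday, this picks the following weekend.)
    saturday = posted + timedelta(days=(5 - posted.weekday()) % 7)
    return saturday, saturday + timedelta(days=1)


def extract_micro_events(text: str, posted: date) -> List[Dict[str, object]]:
    events = []
    for m in MICRO_EVENT.finditer(text):
        start, end = resolve(m.group("when"), posted)
        events.append({"place": m.group("place"), "start": start, "end": end})
    return events
```

Posted on Friday, September 1, 2006, "I'm going to Rhode Island this weekend" yields a Rhode Island event spanning September 2 and 3.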
Anyway, working on such a crawler still appeals to me. Make a repository! All we need is an experimental crawler; from there we just hack on it to improve its success rate...