Entries for August 2006

JavaScript Contour Mapping: Still Seems Feasible

A few posts ago I talked about generic contour plotting for Google Maps using only JavaScript.

After some more work and research into contouring algorithms, it still seems like a reasonable idea.

Instead of gridding the irregularly spaced points (crime locations in my case) I learned that many contouring algorithms instead use Delaunay triangulation. So I implemented incremental triangulation in JavaScript and the result seems fast enough (note: still needs Firefox or Adobe SVG). On my (older) computer it gets unreasonably slow around a few hundred points, but this is more than I expect to render at a time anyway (consider that the Crimes in 2006 page only has about 60 unique locations).

The next step is to modify the algorithm to use Constrained Delaunay or Ruppert's Refined Delaunay.

After that, it's just a matter of interpolating the crime density (so that levels in between high and low densities are created if they don't exist), smoothing the polygons, and drawing & coloring them with SVG.

I checked out a nice collection of research papers from KSL called Geometric Modelling which includes some contouring discussion. I plan to make another visit to the library this week to see if I can find anything helpful about the steps above. Googling for this stuff has been extremely helpful, but no single page has very comprehensive information, I just have to piece it together.

On the Case Forum Greg mentioned his idea for a Case geocoding service that would aggregate information from geocoding services such as Google Maps and the Case Wiki so that any Case web service could easily geocode local places (Andrew mentioned geocoding photos, for example). I think the crime log location parser fits this bill pretty nicely! The version in trunk has tons of Case-specific location parsing techniques that could save other web services a lot of work. It just needs to be put on a server that will accept strings and send back geocoded results.

More crime & mapping news to come...

Crime Does Pay: Free Stuff From the Awesome Guys at Reddit

Yesterday I received an envelope from Somerville, Massachusetts.

You may remember that a couple weeks ago Sara, Spiros and I were fortunate enough to swipe a souvenir from the reddit office building. Alexis, one of the reddit founders, somehow caught on and (no doubt willing to do anything to rescue their helpless sticker) made a generous offer.

Well, I took him up on it, and I'll let this picture do the talking:

reddit stuff

These first-class guys even shipped it first class, along with a note suggesting we use the stickers responsibly! About half of the items above have already been distributed to numerous reddit fans around campus.

You may be wondering: "How can I get free stuff from reddit?" Turns out all you have to do is fly to Boston, use this picture to track down their building, and let your kleptomania lead you from there! (And while you're there, check out the burrito place across the street.)

You may also be wondering: are we going to honor the rules of hostage negotation and send back their sticker? Well, that all depends if we can get The Segelmeister to part with it.

Thanks Alexis, Steve, Chris and Aaron!

Merquery Summer of Code Results

This morning I commited a working version of Merquery to the Django Subversion repository.

svn co http://code.djangoproject.com/svn/django/branches/search-api/

My code in particular lives in branches/search-api/django/contrib/search.

The Lucene adapter is fully functional (but needs some more convenience functions), and the Xapian and Hyper Estraier adapters need a little more work.

Here's an example of using the Lucene adapter, using a similar model to my old Merquery post:

from django.db import models
from django.contrib.search.backends import LuceneIndexer

class Person(models.Model): first_name = models.CharField(maxlength=30) last_name = models.CharField(maxlength=30) biography = models.TextField()

indexer = LuceneIndexer('/tmp/person-index', Person, fields=['Person.biography'], attributes={'first': 'Person.first_name', 'last': 'Person.last_name'})

As you can see, you specify an index location (database locations should be supported in the future), which model should be considered the document (hit results), and which fields to index.

It also allows shorthand for the fields:

indexer = LuceneIndexer('/tmp/person-index', Person, 'Person.biography',
                        first='Person.first_name', last='Person.last_name')

Okay, let's insert some people...

b = Person(first_name='Brian', last_name='Beck', biography='Python advocate')
g = Person(first_name='Guido', last_name='van Rossum', biography='Python creator')
s = Person(first_name='Spiros', last_name='Eliopoulos', biography='Loves Haskell')

And force all the Person objects to be indexed...

indexer.update()

You can also send update a list of Person objects to update (beware of Lucene's update inserting duplicates for now).

Finally, Lucene's sweet query syntax is available for your database:

>>> for hit in indexer.search('python'):
...     print hit, hit.instance

<LuceneHit: merquery.person 1, Score: 0.625> <Person: Brian Beck> <LuceneHit: merquery.person 2, Score: 0.625> <Person: Guido van Rossum>

>>> for hit in indexer.search('python creator'): ... print hit, hit.instance

<LuceneHit: merquery.person 2, Score: 1.0> <Person: Guido van Rossum> <LuceneHit: merquery.person 1, Score: 0.168048456311> <Person: Brian Beck>

>>> for hit in indexer.search('last:Beck OR first:Spiros'): ... print hit, hit.instance

<LuceneHit: merquery.person 1, Score: 0.496906995773> <Person: Brian Beck> <LuceneHit: merquery.person 3, Score: 0.496906995773> <Person: Spiros Eliopoulos>

There are some things that should be changed before Merquery is production-ready, and some things that would be nice in the long-term.

One is that the indexer knows how to follow ForeignKey fields (maybe 'Person.address.street_name', for example), but there needs to be a couple changes to get them to work—this is a result of a Field instance seemingly not having an attribute linking back to the Model it is bound to. Support for ManyToMany joins also needs to be thought out.

Another is the way you pass Model instances to specify as the document return type (the type returned by hit.instance). Documents should be able to aggregate many Model instances and not have to consider any particular Model the document type. So a Document class will probably be introduced that acts like the Model metaclass, letting you 'build' a Document prototype and telling the indexer how all of its attributes should be retrieved and treated.

Automatically knowing when to update the index would also be very nice.

There are a few main long-term goals...

One is to make a universal Merquery query language that will automatically translate queries into the backend's query syntax. This would especially good for Hyper Estraier, which has whacked-out attribute-search syntax:

@first STRINC Brian

Another is to make some Models to keep track of indexing status and query statistics, and offer nice admin views of these.

Finally, storing all index data in a database (especially a Model-compatible one) instead of on the filesystem would be great.

Since I have a week before classes start, I'm going to continue making the necessary changes to make Merquery production-ready.

Functional Programming in JavaScript: Nice But Slow?

While making those two JavaScript contour prototypes I posted yesterday, I had the chance to try out the nice functional programming tools offered by MochiKit. I love MochiKit—mostly because it's highly influenced by Python.

However, since the performance of the contour mapping code is extremely important to its feasibility, I did some profiling and was a bit surprised.

Consider this code to test membership in an array of points:

var containsPoint = function (container, element) {
    for (var i = 0; i < container.length; i++) {
        if (pointsEqual(container[i], element)) {
            return true;
        }
    }
    return false;
}

Boring, right? Now the functional equivalent:

return some(container, partial(pointsEqual, element));

some and partial come from MochiKit.Base and MochiKit.Iter, respectively.

Sadly, I profiled the functional approach at being 27 times slower than the verbose version.

Similarly, this code:

var neighbors = new Array();
for (var i = 0; i < points.length; i++) {
    var p = points[i];
    if (is_neighbor(scale, prev, p)) {
        neighbors.push(p);
    }
}

...is 3 times faster than this code:

var neighbors = filter(partial(is_neighbor, scale, prev), points);

I assume this has something to do with the functional approach being run as "pure JavaScript" while the verbose version is optimized by the implementation—like pure-Python vs. Python written to take advantage of its C implementation (the "pure" version being slower).

I think a lot of people don't profile their JavaScript code, so I'm just throwing this out there. Unless someone can point out a solution to the dramatic speed difference, I will sadly be avoiding a lot of the functional tools offered by MochiKit.

I'll be spending the next few days moving into my Village apartment.

Tomorrow: The final day of Summer of Code, I'll talk about my results.

Generic Contour Plotting for Google Maps

The other day Mike, Spiros, and I began thinking about generating SVG contour maps with JavaScript. The reason is that I want to be able to show a contour map of crime density (like this one of property crime) on the Campus Crime Mapper.

I think a way to generate generic contour maps for Google Maps in JavaScript would be extremely popular with Google Maps developers. It could be used to map temperatures, population density, precipitation, wifi coverage, and lots of other things people are using Google Maps for.

There are only a couple sites out there that have overlayed contour maps on their Google map. The best one seems to be EarthTools, which uses it to display a topographic contour map of the area. I can't tell if it uses static image tiles, SVG, or Canvas; maybe someone else can figure it out.

You're probably wondering why I want to do it all in JavaScript and SVG as opposed to rendering static tiles on the server and overlaying them. There are a few reasons:

  • Using JavaScript will make it available to any Google Maps developer, regardless of their server-side tools.
  • The contour shapes will only need to be calculated once for all zoom levels, since they are infinitely scalable.
  • I am almost positive that the time it takes for the client to calculate and render the SVG contours will be about the same amount of time it would take to download all the custom tiles anyway (especially if they will be zooming and scrolling — hint: they will be).
  • Theoretically it will allow the user to change the granularity, colorization, and other aspects of the map.

I'm not looking to generate anything insanely detailed, just something at about this detail level.

I have some proof-of-concept code online that tests finding shape contours and smoothing them with Bezier curves. You will need either Firefox 1.5 or a browser with Adove SVG enabled to view those demos (I'm considering also supporting Canvas).

Note: Those pages (and the Crime Mapper) will be down while I move into the Village.

Speaking of SVG and Canvas, I'm going to use this sweet JavaScript plotting library called plotkit to show crime statistics — check out the demo on that page!

I'm excited to see how far I can take this, but right now I've got two days left in the Summer of Code.

Back in Cleveland (with a Piece of Reddit)

Today I arrived back at home base, Cleveland with the Segelmeister after riding two trains, two buses, and a plane.

We stayed at my house (in Newport) and did all sorts of non-Cleveland things...

  • Moved our bodies all over Boston with the amazing and enjoyable public transit system (saying loudly so Cleveland RTA can hear).
  • Found the office building of the reddit / notabug guys, decided to hold their door sticker hostage:
    reddit_sticker_1.jpg reddit_sticker_2.jpg
  • Swam in the North Atlantic.
  • Spent some time relaxing (reading & coding) in the back yard, where Sara almost met her demise at the jaws of a vicious skunk.
  • Talked with Spiros (ECMAScript Working Group and Adobe employee) about deploying the Crime Mapper at Brown (he's on the case—Providence geocoders, get in touch!).
  • Met a security consultant from Ashland, Ohio who chatted about hackers and cyberpunk authors with us when he saw that Sara was reading Snow Crash.

In five days I will be done with the Summer of Code and be able to move into the Village...

I'll end with something positive about Cleveland. While in Cambridge, Sara and I were walking around with empty stomachs and very little money. After circling Harvard Square a couple times, we couldn't find a single fast food place. Not believing that it could be too upscale for a Wendy's or McDonald's, we asked a T driver where we could find such a place...

Me: Hi, is there a fast food place nearby?
Driver: Well, what are you in the mood for?
Me: Oh, just anything super cheap and unhealthy.
Driver: Now, why would you want that?
Me: Well, we don't have a lot of money.
Driver: How much do you want to spend?
Me: Like, five dollars each.
Driver: There's a Greek place around the corner, tell them Z sent you!
Me: Okay, thanks.

We took his advice and went to the Greek place, where I reluctantly paid about $1.50 more than I had hoped for a meal... from the kid's menu. Yes, even the T drivers in Cambridge are too upscale for us.

Saturday Mapping Thrills

Today I decided that I needed to be punished and proceeded to create and geocode 58 parking lots on the Case Wiki.

This will slightly help the campus crime mapper since some reports are given lot numbers as their location. You probably noticed that the crime mapper got a bit fancier. I made icons for a few categories of violations, and took care of multiple incidents being located in one spot. It's also running with FastCGI now so maybe it's faster?

One pattern I've already started to notice is that some places are hotspots for a certain type of violation. Check out the Crimes in 2006 page and you can tell that it's a bad idea to park your bike near Fribley, and to park your car on Carlton Road. (Also on that page, zoom way out for a surprise!)

Well, I'm all geocoded out after hunting down parking lots for the past couple hours. Suggestions for the crime mapper are welcome.

P.S. The code is now on opensource.case.edu and there's a to-do list on the wiki. Feel free to browse around or contribute!

Update: Hey all you geocoders and wiki mappers, check out this awesome resource I just found. It's a build identification glossary with location names, aliases, and addresses. This is exactly what the crime mapper needs...

Mapping Crimes on Campus

So a couple entries ago I mentioned that someone should make something like chicagocrime.org for Case. This is possible since we have a daily crime log which lists the location of each incident, and an excellent wiki with many geocoded locations.

I was bored a few nights ago and decided to try it out. In the interest of releasing early and often, I've made my first test available online at exogen.case.edu/crime/recent.

As you can see, it's nothing too fancy yet. This is my first non-trivial Django application, and I plan on using it as a testbed for Merquery.

So here's my to-do list (when I'm not working on Merquery, of course):

  • Draw new markers.
  • What should happen when two markers are placed on the same location, which is likely? Show both in the info window? Change the color of the marker? Try to put them next to each other?
  • Offer views by type of violation.
  • Activate the better location parsing (right now only the wiki data is checked, but the Google geocoding API helps with addresses). Already written, just needs to be put into production...
  • Geocode all the locations that couldn't be found.
  • Offer an RSS feed of recent crimes.
  • Import all incidents back to 2000 (for fun).
  • Use all the imported data for crime statistics per area.
  • Put code on opensource.case.edu.

Now to watch for that bicycle thief...


Update: Problem seems to be fixed, let me know if it's broken for you.

Update: Put the new location parser into production, and it now has a 66% success rate for geocoding locations. More wiki entries would help that number... anything to avoid doing them manually.