This morning I committed a working version of Merquery to the Django Subversion repository.

svn co

My code in particular lives in branches/search-api/django/contrib/search.

The Lucene adapter is fully functional (but needs some more convenience functions), and the Xapian and Hyper Estraier adapters need a little more work.

Here's an example of using the Lucene adapter, with a model similar to the one in my old Merquery post:

from django.db import models
from import LuceneIndexer

class Person(models.Model):
    first_name = models.CharField(maxlength=30)
    last_name = models.CharField(maxlength=30)
    biography = models.TextField()

indexer = LuceneIndexer('/tmp/person-index', Person,
                        fields=['Person.biography'],
                        attributes={'first': 'Person.first_name',
                                    'last': 'Person.last_name'})

As you can see, you specify an index location (database locations should be supported in the future), which model should be considered the document (hit results), and which fields to index.

It also accepts a shorthand, with fields passed positionally and attributes passed as keyword arguments:

indexer = LuceneIndexer('/tmp/person-index', Person, 'Person.biography',
                        first='Person.first_name', last='Person.last_name')
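To make the equivalence of the two calling styles concrete, here is a minimal sketch of how a constructor could fold the shorthand into the explicit form. The function name normalize_spec is hypothetical and not part of Merquery; it only shows that both calls above can describe the same fields and attributes.

```python
def normalize_spec(*fields, **kwargs):
    """Hypothetical helper: return (fields, attributes) from either calling style."""
    # Pull out the explicit keyword forms, if they were used.
    explicit_fields = kwargs.pop('fields', None)
    explicit_attrs = kwargs.pop('attributes', None)

    # Positional arguments are shorthand fields.
    field_list = list(fields) + list(explicit_fields or [])

    # Any keyword arguments left over are shorthand attributes.
    attributes = dict(explicit_attrs or {})
    attributes.update(kwargs)
    return field_list, attributes


explicit = normalize_spec(fields=['Person.biography'],
                          attributes={'first': 'Person.first_name',
                                      'last': 'Person.last_name'})
shorthand = normalize_spec('Person.biography',
                           first='Person.first_name', last='Person.last_name')
```

One consequence of this design is that an attribute literally named fields or attributes can't be given in shorthand form, since those names are reserved for the explicit style.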

Okay, let's insert some people...

b = Person(first_name='Brian', last_name='Beck', biography='Python advocate')
g = Person(first_name='Guido', last_name='van Rossum', biography='Python creator')
s = Person(first_name='Spiros', last_name='Eliopoulos', biography='Loves Haskell')

And force all the Person objects to be indexed...


You can also call update with a list of specific Person objects to reindex only those (beware: for now, updating through Lucene inserts duplicates instead of replacing the old documents).
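The duplicates come from treating an update as a plain add: the old document for that object stays in the index. A common fix is to delete every document stored under the object's key before adding the new one (later Lucene releases offer IndexWriter.updateDocument, which does this for you). The ToyIndex below is a hypothetical in-memory stand-in, not Lucene's API; it just contrasts the two behaviors.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an index (not Lucene's API)."""

    def __init__(self):
        self.docs = []  # list of (key, text) pairs, like stored documents

    def add(self, key, text):
        self.docs.append((key, text))

    def naive_update(self, key, text):
        # Adding without deleting first leaves the old document in place,
        # so the same key now appears twice.
        self.add(key, text)

    def update(self, key, text):
        # Delete every document stored under this key, then add the new one.
        self.docs = [(k, t) for (k, t) in self.docs if k != key]
        self.add(key, text)

    def count(self, key):
        return sum(1 for k, _ in self.docs if k == key)


idx = ToyIndex()
idx.add('merquery.person 1', 'Python advocate')
idx.naive_update('merquery.person 1', 'Python advocate')
duplicates_after_naive = idx.count('merquery.person 1')
idx.update('merquery.person 1', 'Python advocate')
copies_after_update = idx.count('merquery.person 1')
```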

Finally, Lucene's sweet query syntax is available for your database:

>>> for hit in'python'):
...     print hit, hit.instance

<LuceneHit: merquery.person 1, Score: 0.625> <Person: Brian Beck>
<LuceneHit: merquery.person 2, Score: 0.625> <Person: Guido van Rossum>

>>> for hit in'python creator'):
...     print hit, hit.instance

<LuceneHit: merquery.person 2, Score: 1.0> <Person: Guido van Rossum>
<LuceneHit: merquery.person 1, Score: 0.168048456311> <Person: Brian Beck>

>>> for hit in'last:Beck OR first:Spiros'):
...     print hit, hit.instance

<LuceneHit: merquery.person 1, Score: 0.496906995773> <Person: Brian Beck>
<LuceneHit: merquery.person 3, Score: 0.496906995773> <Person: Spiros Eliopoulos>
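The ordering above follows from two properties of Lucene's query syntax: bare terms are OR'd together by default, and field:term restricts a clause to one field. The sketch below reproduces just those two rules over plain dictionaries. It is a toy, not Lucene: the score is simply the number of matching clauses (Lucene uses a TF-IDF style formula), and the assumption that bare terms search the biography field is mine, since it's the only indexed field in the example.

```python
PEOPLE = {
    1: {'first': 'Brian', 'last': 'Beck', 'biography': 'Python advocate'},
    2: {'first': 'Guido', 'last': 'van Rossum', 'biography': 'Python creator'},
    3: {'first': 'Spiros', 'last': 'Eliopoulos', 'biography': 'Loves Haskell'},
}


def parse_query(query, default_field='biography'):
    """Split 'last:Beck OR first:Spiros' into (field, term) clauses."""
    clauses = []
    for token in query.split():
        if token == 'OR':
            continue  # OR is already the implied operator between clauses
        if ':' in token:
            field, term = token.split(':', 1)
        else:
            field, term = default_field, token
        clauses.append((field, term.lower()))
    return clauses


def search(docs, query, default_field='biography'):
    """Return (doc_id, score) pairs, best first; score = matching clauses."""
    clauses = parse_query(query, default_field)
    hits = []
    for doc_id, fields in docs.items():
        score = 0
        for field, term in clauses:
            if term in fields.get(field, '').lower().split():
                score += 1
        if score:  # OR semantics: any single matching clause is a hit
            hits.append((doc_id, score))
    # Highest score first; ties broken by id for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

As in the real session, 'python creator' ranks Guido first because he matches both clauses, while Brian still appears because he matches one.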

There are some things that should be changed before Merquery is production-ready, and some things that would be nice in the long-term.

One is that the indexer already knows how to follow ForeignKey fields (for example, 'Person.address.street_name'), but a couple of changes are needed before that actually works. The cause is that a Field instance seemingly has no attribute linking back to the Model it is bound to. Support for ManyToMany joins also needs to be thought out.
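Following a ForeignKey path at indexing time boils down to a chain of attribute lookups, because reading a ForeignKey attribute on a Django instance returns the related object. Here is a minimal sketch of that traversal. The resolve function and the plain Person and Address classes are hypothetical stand-ins (no Django required). The sketch identifies the starting model from the instance's class rather than from a Field, which sidesteps the missing back-reference described above.

```python
def resolve(instance, path):
    """Hypothetical: follow a spec like 'Person.address.street_name'.

    The first component names the model and is checked against the
    instance's class; each remaining component is one getattr, so a
    ForeignKey hop is just another attribute lookup.
    """
    model_name, *attrs = path.split('.')
    if type(instance).__name__ != model_name:
        raise ValueError('%s does not start at %s'
                         % (path, type(instance).__name__))
    value = instance
    for attr in attrs:
        value = getattr(value, attr)
        if value is None:
            return None  # a null ForeignKey ends the chain early
    return value


# Plain stand-ins for models, for illustration only.
class Address:
    def __init__(self, street_name):
        self.street_name = street_name


class Person:
    def __init__(self, first_name, address=None):
        self.first_name = first_name
        self.address = address
```

ManyToMany joins don't fit this scheme directly, since a hop there yields a collection rather than a single object.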

Another is the way you pass a Model class to specify the document type (the type returned by hit.instance). Documents should be able to aggregate many Model instances without treating any particular Model as the document type. So a Document class will probably be introduced that works like the Model metaclass, letting you 'build' a Document prototype and tell the indexer how each of its attributes should be retrieved and treated.
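To show what "works like the Model metaclass" could mean, here is a minimal sketch of a declarative Document. A metaclass collects Attribute declarations off the class body into a registry, the same way Django's model metaclass collects Field declarations. Every name here (Attribute, DocumentMeta, Document, PersonDocument, the indexed flag) is hypothetical and only illustrates the idea, not Merquery's eventual design.

```python
class Attribute:
    """Hypothetical declaration of one document attribute."""

    def __init__(self, path, indexed=True):
        self.path = path        # dotted source, e.g. 'Person.biography'
        self.indexed = indexed  # full-text indexed vs. stored as an attribute


class DocumentMeta(type):
    """Collects Attribute declarations, as Django's model metaclass collects Fields."""

    def __new__(mcs, name, bases, namespace):
        attributes = {}
        # Inherit declarations from parent Document classes first.
        for base in bases:
            attributes.update(getattr(base, '_attributes', {}))
        # Move this class's declarations out of the class body into the registry.
        for key, value in list(namespace.items()):
            if isinstance(value, Attribute):
                attributes[key] = value
                del namespace[key]
        namespace['_attributes'] = attributes
        return super().__new__(mcs, name, bases, namespace)


class Document(metaclass=DocumentMeta):
    pass


class PersonDocument(Document):
    biography = Attribute('Person.biography')
    first = Attribute('Person.first_name', indexed=False)
    street = Attribute('Person.address.street_name', indexed=False)
```

An indexer could then walk PersonDocument._attributes, resolve each path against whichever model instances make up the document, and index or store each value according to its flag.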

Automatically knowing when to update the index would also be very nice.

There are a few main long-term goals...

One is to make a universal Merquery query language that automatically translates queries into each backend's own query syntax. This would be especially useful for Hyper Estraier, which has whacked-out attribute-search syntax:

@first STRINC Brian
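A universal query language could start as a small neutral representation that each adapter renders in its own syntax. The sketch below assumes a hypothetical list of (field, value) clauses combined with AND, and renders it for two illustrative backend names. The Hyper Estraier output follows the form of the example above, one condition per clause. Note that STRINC ("string includes") is not quite the same as Lucene's field:term match, so a real translator would have to choose the backend operator whose meaning is closest. Check Hyper Estraier's own documentation for how it combines multiple attribute conditions.

```python
def translate(clauses, backend):
    """Hypothetical: render neutral (field, value) clauses, AND'd together.

    'lucene' and 'estraier' are illustrative backend names, not Merquery's.
    """
    if backend == 'lucene':
        # Lucene query syntax: field:term clauses joined with AND.
        return ' AND '.join('%s:%s' % (field, value) for field, value in clauses)
    if backend == 'estraier':
        # Hyper Estraier: one attribute condition per clause,
        # in the '@first STRINC Brian' form.
        return ['@%s STRINC %s' % (field, value) for field, value in clauses]
    raise ValueError('unknown backend: %r' % backend)


query = [('first', 'Brian'), ('last', 'Beck')]
lucene_query = translate(query, 'lucene')
estraier_conditions = translate(query, 'estraier')
```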

Another is to make some Models to keep track of indexing status and query statistics, and offer nice admin views of these.

Finally, storing all index data in a database (especially a Model-compatible one) instead of on the filesystem would be great.

Since I have a week before classes start, I'm going to continue making the necessary changes to make Merquery production-ready.