Entries for July 2005
The Return of CASPER
This post is not about Knoware, but about an old half-finished project called CASPER.
CASPER is a recursive acronym standing for Casper Artist-Sorted Playlist Entropy Reducer. The idea relies on the fact that two extremes dominate the way people sort a playlist: alphabetically or completely at random. Since the former would be a strange way to listen to music, most people go with the shuffled approach. In my experience, the reasoning behind listening to a shuffled playlist is to promote variety. But while complete chaos might increase variety, it certainly does not maximize it. Some people notice that their media players seem to have an affinity for certain artists or songs. So I came up with a method to increase the variety of any given attribute in a playlist.
The method places songs throughout the playlist such that items with the same key (for example, songs by the same artist) occur as far apart as possible, and it does this for every song in the list. For example, if the playlist contains 3 songs by X and 3 songs by Y, one result might be X2-Y3-X1-Y2-X3-Y1, where the order of the songs by X and by Y is random (so there are lots of possibilities for even such simple cases). With large playlists, this careful song placement is much less apparent, so it appears random. The total entropy of the shuffled sequence is decreased, but the variety is increased.
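To make the idea concrete, here is a minimal sketch of one way to get this kind of spacing. It is not the code from casper.py: the function name `spread_shuffle`, the `(artist, title)` tuples and the evenly-spaced-slots trick are all assumptions made for illustration, and it is written for Python 3 rather than 2.4. Each group of songs sharing a key is shuffled, given evenly spaced positions between 0 and 1 with a random starting offset, and then everything is sorted by position.

```python
import random
from collections import defaultdict


def spread_shuffle(songs, key, rng=random):
    """Return a new list in which songs sharing a key are spread out evenly.

    Illustrative sketch only (not the casper.py algorithm): every group gets
    evenly spaced positions in [0, 1) with a random offset, and the final
    order is obtained by sorting all songs on those positions.
    """
    groups = defaultdict(list)
    for song in songs:
        groups[key(song)].append(song)

    placed = []
    for group in groups.values():
        rng.shuffle(group)              # random order within one artist
        n = len(group)
        offset = rng.random() / n       # random start inside the first slot
        for i, song in enumerate(group):
            placed.append((offset + i / n, song))

    placed.sort(key=lambda pair: pair[0])
    return [song for _, song in placed]


if __name__ == "__main__":
    # Hypothetical playlist: (artist, title) pairs.
    playlist = [("X", "X1"), ("X", "X2"), ("X", "X3"),
                ("Y", "Y1"), ("Y", "Y2"), ("Y", "Y3")]
    print(spread_shuffle(playlist, key=lambda song: song[0]))
```

Passing a different `key` (album, genre, year) spreads that attribute instead, which matches the "any given attribute" idea above.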
I've decided to release some simple, documented Python code demonstrating the algorithm used for CASPER. This should make it easy for anyone to implement this shuffling method in their media player if they so desire. If your media player can easily make use of Python, it would probably take less than a minute to do. Check out casper.py if you're curious. Running the script as-is will produce a simple example. It requires Python 2.4 or higher, but is easy to port.
The full source code of the linked file is in my extended entry if you're really lazy, but it's probably formatted poorly and won't have syntax highlighting.
Drafting the Knoware Interface
As suggested by my KDE mentor, one of my first tasks is to try designing an attractive interface for Knoware. I've been doing mostly web programming for the past year or two, so getting back to using a GUI toolkit has been fun, and working with Qt Designer is a pleasure.
As mentioned in a previous post, the Knoware client will be used to perform three main tasks:
- Detect and display the user's system configuration,
- Browse and report problems the user is experiencing, and
- Collaborate with other users on possible solutions.
Here's what I've come up with so far:
An announcements panel, the first thing the user sees when opening Knoware. Much more information will be displayed than is currently shown. It will be styled with CSS, so it should look very nice (think of amaroK's sidebar).
A system configuration panel. Self-explanatory. I may try to make the visualization of the user's system a bit more interesting and easier to browse.
A panel for searching and browsing problems. The search results should be updated as the user types. Problem details, instructions to reproduce the problem, and subscribed users will all be shown here. This is also where the statistics Knoware has collected should be displayed — more on how I plan to visualize this later.
A discussion panel. I do have more things in mind for collaborative problem-solving, but this is the most important and flexible, I think. Discussion 'rooms' may be thought of as one-per-problem (like comments on a Bugzilla page). But an interesting feature is that they are both live (like a chat room) and persistent (like a forum). Right now the two forms of communication people turn to for fixing their problems are forums and IRC — this unites the strong points of both. Since live chat may generate a long discussion history, users can flag individual messages to indicate importance. The flag and recycle-bin icons next to the search field are toggle buttons which show only flagged messages and show junk messages, respectively.
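The two toggle buttons boil down to a small filter over the discussion history. Here is a rough sketch of that logic in Python 3.7+; the `Message` class, its field names and the `visible_messages` function are hypothetical, and it assumes junk messages are hidden unless the recycle-bin toggle is on.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One discussion message (hypothetical structure for illustration)."""
    author: str
    text: str
    flagged: bool = False   # a user marked this message as important
    junk: bool = False      # a user marked this message as junk


def visible_messages(messages, only_flagged=False, show_junk=False):
    """Apply the two toggle buttons to a list of messages.

    Assumption: junk is hidden by default, and the flag toggle narrows
    the view to flagged messages only.
    """
    result = []
    for msg in messages:
        if msg.junk and not show_junk:
            continue
        if only_flagged and not msg.flagged:
            continue
        result.append(msg)
    return result
```

With both toggles off, this returns everything except junk; turning on the flag toggle leaves just the messages that people marked as important.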
Besides putting all this functionality in one place and including a few niceties, there isn't much new here. The novel part about Knoware is, after all, on the server side: the use of statistical methods to identify patterns. The discovered patterns should be visible to users, since they are the people who have the ability to use the information and apply it. So how can I best visualize this information?
One way might be to think of any general system configuration as a tree (kinda like in the System panel above). There are always certain components in a configuration, and various areas that differ on each system. Visualizing what Knoware has discovered, then, might involve drawing such a tree in which only the likely predictors are shown. The more confident Knoware is about a predictor, the bigger or bolder it could be drawn. Or maybe all the other components are shown but grayed out. The important thing is that the likely predictors of the problem stand out.
Another way might be to show a list much like Apple's Spotlight results. Instead of object categories and files, results would be arranged into component categories and predictors, along with some statistical information.
Some mock-ups would convey these ideas much more easily. Maybe I'll post some later.
Knoware, Part II
In my last entry, Andi Roedl helpfully pointed out LSHW, a hardware-listing tool for Linux. I gave it a try (it's even on Portage), and it works great! From what I can tell, it even retrieves information such as clock speeds, so Knoware may have the ability to identify overclocking as a problem source. I contacted Lyonel Vincent, the author of LSHW, and it looks like this is the best answer to the hardware detection part of Knoware so far.
The other piece of the puzzle is installed package detection. This could end up being even more important than the hardware portion, since issues like library version compatibility seem to cause just as much distress. I haven't searched very hard for something like this yet — any ideas? Chris pointed out that I may have to find per-distribution methods for this to work reliably, which is what I was thinking. Gentoo, for example, uses equery and qpkg.
In the statistics department, my friend Erin and her husband Ryan have been offering their educated brains for the picking. Both suggested that Bayesian networks (as I had in mind) might not provide the best results for Knoware. Instead they pointed me at Random Forests (more info here). Erin explains:
The idea is you can't make any good statistical inference from one decision tree. Random Forests works by splitting the data using majority vote decision tree processes. It does that lots of times, thousands if you want, and it gets more precise the more it does that. Then it tells you which independent variables are the most valuable predictors of your dependent variable. In Knoware's case, you don't know what the problem is or how to fix it, so the problem is the dependent variable. Also, it is a means to evaluate groups, so you can see if a certain kind of problem clumps with other factors.
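To get a feel for what Erin describes, here is a toy sketch in Python 3. It is a heavily simplified stand-in, not the real Random Forests code: each "tree" is only a one-level stump trained on a bootstrap sample with a random subset of the variables, predictions are made by majority vote, and "importance" is just a count of how often each variable was chosen. All function names, component names and version strings are made up for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(rows, labels, features):
    """Find the 'feature == value' test that best separates the labels."""
    best = None
    for feat in features:
        for value in sorted({row[feat] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[feat] == value]
            right = [l for row, l in zip(rows, labels) if row[feat] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                yes = sum(left) * 2 >= len(left)            # majority if test matches
                no = sum(right) * 2 >= len(right) if right else yes
                best = (score, feat, value, yes, no)
    return best[1:]


def train_forest(rows, labels, n_trees=200, rng=random):
    """Train many stumps, each on a bootstrap sample and a random feature subset."""
    features = sorted(rows[0])
    k = max(1, round(len(features) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]      # sample with replacement
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(best_stump(sample, sample_labels, rng.sample(features, k)))
    return forest


def predict(forest, row):
    """Majority vote over all stumps: does this configuration have the problem?"""
    votes = sum(yes if row[feat] == value else no for feat, value, yes, no in forest)
    return votes * 2 > len(forest)


def feature_usage(forest):
    """Crude importance measure: how often each variable was picked for a split."""
    return Counter(feat for feat, _, _, _ in forest)


def demo_data():
    """Hypothetical configurations where only 'gpu_driver' predicts the problem."""
    rows, labels = [], []
    for kernel in ("kernel-A", "kernel-B"):
        for driver in ("driver-A", "driver-B"):
            for sound in ("sound-A", "sound-B"):
                for _ in range(3):
                    rows.append({"kernel": kernel, "gpu_driver": driver, "sound": sound})
                    labels.append(driver == "driver-A")
    return rows, labels


if __name__ == "__main__":
    rows, labels = demo_data()
    forest = train_forest(rows, labels, rng=random.Random(0))
    print(feature_usage(forest).most_common())
```

In this made-up data set the problem depends only on the graphics driver, so that variable should dominate the usage counts. A real implementation would grow full trees and use a proper variable-importance measure.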
The source code for Random Forests is available, but it's in Fortran and I'm unsure of licensing issues. Word on the street is there's source in another language out there somewhere. Otherwise I could just do a bit of research and try to code it from scratch. This kind of process seems like a prime candidate for rapid prototyping, so I'm planning to use Python for it. Hopefully it will provide a nice complement to the Python Bayesian Network Toolbox being developed by Elliot Cohen.
Mark Dickie recently commented on my previous entry, saying:
I believe that Dell have something like this although I think it's nothing much more than an update/security announcer tailored to your hardware and software. Might be worth a look.
This raises an interesting point that I have not yet made explicit. I always imagined that utilities like the Microsoft crash reporter (and the Dell utility Mark speaks of, which is news to me) aim to accomplish something exactly like Knoware, but behind closed doors. A big distinction here is that Knoware puts the actual fixing of problems into the user's hands, and more importantly into the community's hands.
But there is also a simple yet key feature that Mark brings up, and that is providing a distribution channel for patches and other announcements. Notifying users of solutions once they are found is what will eventually cause more users to participate in Knoware. Even if a user has not taken the time to subscribe to a specific problem, Knoware will be able to determine that the user probably does experience the problem based on their system configuration.
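As a very rough sketch of that last step (Python 3), suppose the server has already boiled a problem down to a handful of predictor components. Everything here is hypothetical: the function names, the dictionary layout, and especially the matching rule, which just checks that every predictor appears in the user's configuration. The real thing would presumably work from the statistical model rather than an exact match.

```python
def probably_affected(config, predictors):
    """Guess whether a configuration is likely to hit a problem.

    Assumption for illustration: a user is 'probably affected' when every
    predictor (component -> value) matches their configuration exactly.
    """
    return bool(predictors) and all(
        config.get(component) == value for component, value in predictors.items()
    )


def users_to_notify(problem, users):
    """Names of users who subscribed to the problem or probably experience it."""
    subscribed = set(problem["subscribers"])
    return [
        name
        for name, config in users.items()
        if name in subscribed or probably_affected(config, problem["predictors"])
    ]
```

When a solution is announced for the problem, `users_to_notify` would give the list of people to tell, including those who never subscribed.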
