In my last entry, Andi Roedl helpfully pointed out lshw, a hardware-listing tool for Linux. I gave it a try (it's even in Portage), and it works great! From what I can tell, it even reports details such as clock speeds, so Knoware may be able to identify overclocking as a source of problems. I contacted Lyonel Vincent, the author of lshw, and so far it looks like the best answer to the hardware detection part of Knoware.
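To get a feel for how Knoware might consume this, here is a rough Python sketch that walks lshw's JSON output and pulls out the processor speeds. It assumes lshw is run with its -json option and that processor nodes carry "size" (current speed) and "capacity" (rated maximum) fields in Hz, which is what I have seen but have not checked across versions. The function names and the SAMPLE data are made up for illustration.

```python
import json
import subprocess


def run_lshw():
    """Run `lshw -json` and return the parsed tree (full detail usually needs root)."""
    out = subprocess.run(
        ["lshw", "-json"], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)


def walk(node):
    """Yield every hardware node in the tree, depth first."""
    # Assumption: some lshw versions wrap the top-level node in a list,
    # so both a dict and a list of dicts are accepted here.
    if isinstance(node, list):
        for item in node:
            yield from walk(item)
        return
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def cpu_clocks(tree):
    """Return (product, current_hz, rated_max_hz) for each processor node."""
    found = []
    for node in walk(tree):
        if node.get("class") == "processor":
            # Field meanings are an assumption; missing fields come back as None.
            found.append((node.get("product"), node.get("size"), node.get("capacity")))
    return found


# Hand-written sample in the same general shape, NOT real lshw output.
SAMPLE = {
    "id": "box",
    "class": "system",
    "children": [
        {
            "id": "core",
            "class": "bus",
            "children": [
                {
                    "id": "cpu",
                    "class": "processor",
                    "product": "Example CPU",
                    "size": 2400000000,
                    "capacity": 2000000000,
                }
            ],
        }
    ],
}

if __name__ == "__main__":
    for product, current, rated in cpu_clocks(SAMPLE):
        print(product, current, rated)
```

On a real machine, cpu_clocks(run_lshw()) would replace the sample. Whether a current speed above the rated maximum reliably signals overclocking is something I would still need to verify.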

The other piece of the puzzle is installed package detection. This could end up being even more important than the hardware portion, since issues like library version compatibility seem to cause just as much distress as hardware quirks do. I haven't searched very hard for something like this yet. Any ideas? Chris pointed out that I may have to find per-distribution methods for this to work reliably, which is what I suspected. Gentoo, for example, uses equery and qpkg.
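If it does come down to per-distribution methods, one way to structure it would be a small table of query commands, tried in order until one is found on the PATH. The sketch below is only that, a sketch: the list of tools, their order, and the function names are my own assumptions, and a machine with more than one of these tools installed (rpm on a Debian box, say) would need smarter detection.

```python
import shutil
import subprocess

# Candidate commands, tried in order. Each prints one installed package per line.
# The selection and ordering are assumptions, not a tested survey of distributions.
PACKAGE_LISTERS = [
    # Gentoo (gentoolkit)
    ("equery", ["equery", "--quiet", "list", "*"]),
    # Debian and derivatives; dpkg-query expands the \n escape itself
    ("dpkg-query", ["dpkg-query", "-W", "-f=${Package} ${Version}\\n"]),
    # RPM-based distributions
    ("rpm", ["rpm", "-qa"]),
]


def pick_lister(which=shutil.which):
    """Return the first (tool, command) pair whose tool is on PATH, or None.

    `which` can be swapped out so the choice is testable without the tools.
    """
    for tool, command in PACKAGE_LISTERS:
        if which(tool):
            return tool, command
    return None


def installed_packages():
    """Return the non-empty output lines from whichever tool was found."""
    picked = pick_lister()
    if picked is None:
        raise RuntimeError("no known package query tool found on PATH")
    _, command = picked
    # No shell is involved, so the "*" argument reaches equery unexpanded.
    out = subprocess.run(command, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

The output lines are left in each tool's own format here; turning them into a common name-and-version form would be the next step.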

In the statistics department, my friend Erin and her husband Ryan have been offering their educated brains for the picking. Both suggested that Bayesian networks (which is what I had in mind) might not provide the best results for Knoware. Instead, they pointed me at Random Forests (more info here). Erin explains:

The idea is you can't make any good statistical inference from one decision tree. Random Forests works by splitting the data using majority vote decision tree processes. It does that lots of times, thousands if you want, and it gets more precise the more it does that. Then it tells you which independent variables are the most valuable predictors of your dependent variable. In Knoware's case, you don't know what the problem is or how to fix it, so the problem is the dependent variable. Also, it is a means to evaluate groups, so you can see if a certain kind of problem clumps with other factors.

The source code for Random Forests is available, but it's in Fortran and I'm unsure of licensing issues. Word on the street is there's source in another language out there somewhere. Otherwise I could just do a bit of research and try to code it from scratch. This kind of process seems like a prime candidate for rapid prototyping, so I'm planning to use Python for it. Hopefully it will provide a nice complement to the Python Bayesian Network Toolbox being developed by Elliot Cohen.
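As a first stab at the "code it from scratch" route, here is a toy forest in plain Python. It is my own simplified reading of the idea (bootstrap samples, a random subset of features at each split, majority vote across trees), not a port of the Fortran code. The feature names, the ROWS data, and the root-split count used as a stand-in for variable importance are all invented for illustration; the real method measures importance more carefully.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0.0 means they are all the same)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, features, rng, n_try):
    """Grow one tree from rows, a list of (config_dict, label) pairs."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not features:
        return majority(labels)  # leaf
    best = None
    # Only a random subset of the features is considered at each split.
    for feat in rng.sample(features, min(n_try, len(features))):
        groups = {}
        for row in rows:
            groups.setdefault(row[0][feat], []).append(row)
        if len(groups) < 2:
            continue  # this feature separates nothing in this sample
        score = sum(
            len(g) / len(rows) * gini([lab for _, lab in g]) for g in groups.values()
        )
        if best is None or score < best[0]:
            best = (score, feat, groups)
    if best is None:
        return majority(labels)
    _, feat, groups = best
    rest = [f for f in features if f != feat]
    return {
        "feature": feat,
        "default": majority(labels),  # used for values never seen in training
        "branches": {v: build_tree(g, rest, rng, n_try) for v, g in groups.items()},
    }


def predict_tree(tree, config):
    while isinstance(tree, dict):
        tree = tree["branches"].get(config.get(tree["feature"]), tree["default"])
    return tree


def build_forest(rows, n_trees=200, seed=0):
    rng = random.Random(seed)
    features = sorted(rows[0][0])
    n_try = max(1, round(len(features) ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: draw with replacement
        forest.append(build_tree(sample, features, rng, n_try))
    return forest


def predict(forest, config):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, config) for tree in forest])


def split_counts(forest):
    """Crude importance proxy: how often each feature was picked at a tree's root."""
    return Counter(tree["feature"] for tree in forest if isinstance(tree, dict))


# Invented data: in this toy set the "gpu" value alone decides the outcome.
ROWS = [
    ({"gpu": "vendor_a", "kernel": "2.4", "sound": "alsa"}, "crash"),
    ({"gpu": "vendor_a", "kernel": "2.6", "sound": "oss"}, "crash"),
    ({"gpu": "vendor_a", "kernel": "2.6", "sound": "alsa"}, "crash"),
    ({"gpu": "vendor_b", "kernel": "2.4", "sound": "oss"}, "ok"),
    ({"gpu": "vendor_b", "kernel": "2.6", "sound": "alsa"}, "ok"),
    ({"gpu": "vendor_b", "kernel": "2.4", "sound": "alsa"}, "ok"),
]

if __name__ == "__main__":
    forest = build_forest(ROWS)
    print(predict(forest, {"gpu": "vendor_a", "kernel": "2.6", "sound": "alsa"}))
    print(split_counts(forest).most_common())
```

In Knoware's terms, each row would be one reported system configuration and the label would be the problem observed, so the split counts hint at which parts of a configuration go along with a given problem.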

Mark Dickie recently commented on my previous entry, saying:

I believe that Dell have something like this although I think it's nothing much more than an update/security announcer tailored to your hardware and software. Might be worth a look.

This raises an interesting point that I have not yet made explicit. I always imagined that utilities like the Microsoft crash reporter (and the Dell utility Mark speaks of, which is news to me) aim to accomplish something much like Knoware, but behind closed doors. A big distinction here is that Knoware puts the actual fixing of problems into the user's hands, and more importantly into the community's hands.

Mark also brings up a simple yet key feature: providing a distribution channel for patches and other announcements. Notifying users of solutions once they are found is what will eventually cause more users to participate in Knoware. Even if a user has not taken the time to subscribe to a specific problem, Knoware will be able to determine from their system configuration that they probably experience it.