In my last post I mentioned that I decided to continue with my KDE project, Knoware, for the Summer of Code. For those who missed the full description (read it if you get confused), Knoware is a program that will use Bayesian networks to find patterns between Linux system configurations and bug report subscriptions. In this entry I'll provide a little background as well as some more details, plans, and open issues. Since the project has barely started, I'm open to input from any direction.

First, some background. I actually envisioned Knoware as a Windows program in high school more than five years ago. At the time I didn't even know Bayes nets existed, just that computers must have the processing power to analyze the thousands of statistics and arrive at a solution. I pictured the interface much like a chat program, since collaboratively working through problem solutions will play a prominent role in the program's success. I even came up with a business plan for it: make it free for users, but provide custom statistics reports to hardware and software companies at a cost (remember, Knoware will know which products are conflicting with one another, the distribution of models and driver versions, and such). I look back on this now and still see it as plausible, but I'm very glad to be developing for a platform I use and enjoy now. More on other aspects of my original vision, like the interface, shortly.

You may have realized that the mention of Windows brings up an interesting point: conceptually, Knoware has very little to do with KDE or even the Linux platform. It could easily be made to work with Windows or Mac OS, and focus on their respective bugs. If it turns out to work as well as expected, you can be sure versions will pop up on other platforms.

The current limitation to Linux and KDE may actually be a benefit in finding accurate statistics. Sure, simultaneously opening up the program to Windows and Mac users would result in a larger database. The problem is that the bug reports and system configurations would be much less focused. The extra input does no good if it's for a completely different problem and coming from a completely different system. If all input configurations have Linux and KDE in common, this will help the users focus on more of the same problems instead of completely unrelated ones.

The issue has been raised that it may be hard to gain enough users initially, making it difficult to draw useful conclusions. I'm fairly confident this won't be a problem. Take for example the site KDE-Forum.org. This forum has more than 8,000 registered users, and this is surely a small fraction of KDE users: the English-speaking vocal minority. If just 2.5% of that vocal minority uses Knoware to report a specific problem, that's 200 configurations to analyze. I'm no expert on Bayes nets (yet), but to me that sounds like a fairly significant data set. And in reality, of course, I picture many more users giving Knoware a try.
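To make that back-of-the-envelope estimate easy to play with, here is a minimal sketch in Python. The 8,000 users and the 2.5% rate come from the paragraph above; the function name and the other two rates are just illustrative assumptions, not measurements.

```python
def expected_reports(registered_users: int, participation_rate: float) -> int:
    """Estimate how many configurations would be submitted for one problem."""
    return round(registered_users * participation_rate)


forum_users = 8000  # registered users on KDE-Forum.org, as quoted in this post

# 2.5% is the rate used in the post; 1% and 5% are hypothetical alternatives.
for rate in (0.01, 0.025, 0.05):
    count = expected_reports(forum_users, rate)
    print(f"{rate:.1%} participation -> {count} configurations")
```

Whether a couple hundred configurations is really enough depends on how many variables the network has to model, which is still an open question for me.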

One aspect that will help draw in participants is the user interface. My mentor for this project told me to concentrate on this area first, since it will ultimately be the first thing to get a user's attention. In my proposal, I scheduled the user interface toward the end of the summer, so already I've learned something new (about KDE development, at least). Earlier I mentioned that in the past I pictured the user interface much like a chat program. This has changed significantly in my mind, since collaborating with other participants is something the user will only be doing some of the time. The two other main tasks will be scanning (that is, gathering information about the user's system) and browsing for relevant bugs.

Scanning the system is an automated process, but it requires the user to specify how much detail they are willing to provide, since there is a chance that this information will be made available to other users. It's also possible that the user will need to enter some information manually, in case the scanner can't determine an important aspect of the user's system.
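Here is one way the scanning step could look, as a rough Python sketch rather than anything Knoware actually does yet. Everything in it is an assumption made for illustration: the three granularity levels, the field names, the `FIELD_LEVELS` mapping, and the `filter_scan` function are all hypothetical.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Hypothetical sharing levels the user could choose from."""
    MINIMAL = 1
    STANDARD = 2
    FULL = 3


# Hypothetical mapping: the lowest level at which each field is shared.
FIELD_LEVELS = {
    "kde_version": Granularity.MINIMAL,
    "distribution": Granularity.MINIMAL,
    "kernel_version": Granularity.STANDARD,
    "graphics_driver": Granularity.STANDARD,
    "installed_packages": Granularity.FULL,
}


def filter_scan(scan, level, manual=None):
    """Return only the fields the user agreed to share at `level`.

    `manual` holds values the user typed in; they fill in (or override)
    anything the automated scan could not determine. Fields missing from
    FIELD_LEVELS are treated as FULL, the most restrictive choice.
    """
    merged = {**scan, **(manual or {})}
    return {
        field: value
        for field, value in merged.items()
        if value is not None
        and FIELD_LEVELS.get(field, Granularity.FULL) <= level
    }


# Placeholder scan result; None marks something the scanner could not detect.
scan = {
    "kde_version": "3.5",
    "distribution": "example-distro",
    "kernel_version": "2.6",
    "graphics_driver": None,
    "installed_packages": ["pkg-a", "pkg-b"],
}

shared = filter_scan(scan, Granularity.STANDARD,
                     manual={"graphics_driver": "example-driver"})
print(shared)
```

Defaulting unknown fields to the most restrictive level means a newly added scanner field is never shared by accident.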

Browsing for bugs will hopefully be made easy through searching, live filtering, tagging, and other new trends in navigation. Since the detailed bug reports will need to be created at some point, this process is not just read-only. The interface should therefore assist the user in proper naming and categorization of the problem.

Lastly, I'll discuss possible integration with existing projects. As far as I know, there's nothing out there right now (publicly, at least) that hopes to accomplish the same thing as Knoware. However, there are a few projects that can assist in its goal. KDE has an existing bug tracking system using Bugzilla, and a graphical front-end for it called KBugBuster. While I hope to use more meta-data to make bug browsing easier, there's no sense in completely reinventing the wheel, so this is a likely project I will turn to for integration. Additionally, KDE has a crash handler that tries to make crashes a little more friendly. While crashes are only a tiny fraction of the problems I encounter, I won't rule out the possibility of submitting crash reports to Knoware through this program (much like the Windows crash handler).

Hopefully that puts to rest any questions or doubts out there. As I said, input is welcome during any phase of development.