Entries in "Knoware"
Knoware Client Update
It's getting pretty late into the summer now, and I predict that the client portion of Knoware will be done by the end of the week. So, outlook good.
Despite my prior C++ coding projects, my course experience here at Case, and even having taught it for a class, I was dreading getting back into it after having been dazzled by Python for the past year. God damn it, I thought, I'm gonna have to find all these libraries for crap like XML parsing, gonna have to re-learn sockets for like the third time, etc. Of course, no one said I had to use C++ — Python even has bindings for both the Qt and KDE libraries. But, I thought, I owe it to the users to slave over a damn solid C++ program.
The only thing I was certain of was using Qt for the GUI, obviously, just like the rest of KDE. And that's when the surprises started, and they haven't stopped yet. What? Qt does XML parsing? It does sockets? It does threads? It can communicate with external processes? I don't even have to use the classic C++ strings, because it does that, too, and the way I'd expect? Qt has definitely made getting back into C++ a pleasure. Thanks, Trolltech.
Remember those old Knoware screenshots? I prettied up a few areas with the help of KHTML, finally bringing them to life. Take a look:
If I don't write a post here about statistical results by this time next week, I'm in trouble.
Knoware, Statistics & Orange
The lack of recent Knoware updates on this blog is a good indication of what the statistical aspects of this project mean for a programmer like myself: lots of time reading and researching up front. Aside from things like database design, I've put very little actual code into Knoware for the past week due to the sheer amount of knowledge required to understand (let alone successfully apply) Random Forests and Classification & Regression Trees. This particular approach also includes learning a bit of R.
However, I just came across Orange for the second time and actually read the feature list this time. Wow! Could I have asked for a better module? Their Orange For Beginners guide includes examples of classification, bagging, regression, and other processes that make up the bulk of Random Forests. I think this could definitely speed up the process of getting a working system up before September.
Since I'm talking about the pieces I'm using to construct Knoware, I guess I'll mention that I'm working with SQLite right now for the database backend. I've used it for a few projects so far and I've become quite a fan. MySQL and friends are just too much sometimes — users, permissions, configuration, bah!
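To give a feel for how little setup SQLite needs, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (components, system_id, category, name, version) are invented for illustration and are not Knoware's actual schema.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a SQLite database; no server, users, or permissions."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per component found on a user's system.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS components ("
        " system_id INTEGER NOT NULL,"
        " category TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " version TEXT)"
    )
    return conn

def add_component(conn, system_id, category, name, version=None):
    """Record one detected component for the given system."""
    conn.execute(
        "INSERT INTO components (system_id, category, name, version)"
        " VALUES (?, ?, ?, ?)",
        (system_id, category, name, version),
    )

def systems_with(conn, name):
    """Return the sorted ids of systems that report a component with this name."""
    rows = conn.execute(
        "SELECT DISTINCT system_id FROM components"
        " WHERE name = ? ORDER BY system_id",
        (name,),
    )
    return [row[0] for row in rows]
```

The whole database lives in a single file (or in memory, as in the default above), which is most of the appeal for a prototype.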
Drafting the Knoware Interface
As suggested by my KDE mentor, one of my first tasks is to try designing an attractive interface for Knoware. I've been doing mostly web programming for the past year or two, so getting back to using a GUI toolkit has been fun, and working with Qt Designer is a pleasure.
As mentioned in a previous post, the Knoware client will be used to perform three main tasks:
- Detect and display the user's system configuration,
- Browse and report problems the user is experiencing, and
- Collaborate with other users on possible solutions.
Here's what I've come up with so far:
An announcements panel, the first thing the user sees when opening Knoware. Much more information will be displayed than is currently shown. It will be styled with CSS, so it should look very nice (think of amaroK's sidebar).
A system configuration panel. Self-explanatory. I may try to make the visualization of the user's system a bit more interesting and easier to browse.
A panel for searching and browsing problems. The search results should be updated as the user types. Problem details, instructions to reproduce the problem, and subscribed users will all be shown here. This is also where the statistics Knoware has collected should be displayed — more on how I plan to visualize this later.
A discussion panel. I do have more things in mind for collaborative problem-solving, but this is the most important and flexible, I think. Discussion 'rooms' may be thought of as one-per-problem (like comments on a Bugzilla page). But an interesting feature is that they are both live (like a chat room) and persistent (like a forum). Right now the two forms of communication people turn to for fixing their problems are forums and IRC — this unites the strong points of both. Since live chat may generate a long discussion history, users can flag individual messages to indicate importance. The flag and recycle-bin icons next to the search field are toggle buttons which show only flagged messages and show junk messages, respectively.
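The two toggle buttons could be modeled roughly like this. It is only a sketch: the Message class and its field names are hypothetical, and it assumes that junk messages stay hidden unless the recycle-bin toggle is switched on.

```python
from dataclasses import dataclass

@dataclass
class Message:
    author: str
    text: str
    flagged: bool = False  # marked as important by a user
    junk: bool = False     # moved to the recycle bin

def visible_messages(messages, only_flagged=False, show_junk=False):
    """Apply the two toggle buttons to a room's message history."""
    result = []
    for msg in messages:
        if msg.junk and not show_junk:
            continue  # junk is hidden unless the recycle-bin toggle is on
        if only_flagged and not msg.flagged:
            continue  # the flag toggle narrows the view to important messages
        result.append(msg)
    return result
```

Because the history is persistent, the same filter works for someone reading along live and for someone arriving a week later.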
Besides putting all this functionality in one place and including a few niceties, there isn't much new here. The novel part about Knoware is, after all, on the server-side: the use of statistical methods to identify patterns. The discovered patterns should be visible to users, since they are the people who have the ability to use the information and apply it. So how can I best visualize this information?
One way might be to think of any general system configuration as a tree (kinda like in the System panel above). There are always certain components in a configuration, and various areas that differ on each system. Visualizing what Knoware has discovered, then, might mean drawing such a tree in which only the likely predictors are shown. The more confident Knoware is about a predictor, the bigger or bolder it could be drawn, or maybe all the other components are shown but grayed out. The important thing is that the likely predictors of the problem stand out.
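As a rough sketch of the tree idea, suppose the configuration is a nested dictionary and the server hands back a confidence score per component. Both the data layout and the 0.5 threshold are assumptions made up for this example. Pruning the tree down to the likely predictors could then look like:

```python
def prune_tree(tree, confidence, threshold=0.5, path=()):
    """Keep only the branches that lead to a likely predictor.

    tree: nested dicts; a leaf component is an empty dict.
    confidence: maps a path tuple such as ("hardware", "video", "nvidia")
                to a score between 0 and 1 (missing paths count as 0).
    """
    pruned = {}
    for name, children in tree.items():
        here = path + (name,)
        kept_children = prune_tree(children, confidence, threshold, here)
        # Keep a node if it is a likely predictor itself or contains one.
        if kept_children or confidence.get(here, 0.0) >= threshold:
            pruned[name] = kept_children
    return pruned
```

The same scores could drive font size or boldness for the nodes that survive, or the pruned nodes could be grayed out instead of dropped.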
Another way might be to show a list much like Apple's Spotlight results. Instead of object categories and files, results would be arranged into component categories and predictors, along with some statistical information.
Some mock-ups would convey these ideas much more easily. Maybe I'll post some later.
Knoware, Part II
In my last entry, Andi Roedl helpfully pointed out LSHW, a hardware-listing tool for Linux. I gave it a try (it's even in Portage), and it works great! From what I can tell, it even retrieves information such as clock speeds, so Knoware may have the ability to identify overclocking as a problem source. I contacted Lyonel Vincent, the author of LSHW, and it looks like this is the best answer to the hardware detection part of Knoware so far.
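Here is a sketch of how the overclocking check might work, assuming the hardware listing is available as XML. The sample document below is hand-written and heavily simplified, and the element names (size for the current speed, capacity for the rated maximum) are assumptions that would need checking against real output from the tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a hardware listing (not real tool output).
SAMPLE = """
<node id="computer" class="system">
  <node id="cpu" class="processor">
    <product>Example CPU</product>
    <size units="Hz">2200000000</size>
    <capacity units="Hz">2000000000</capacity>
  </node>
  <node id="memory" class="memory">
    <size units="bytes">1073741824</size>
  </node>
</node>
"""

def find_overclocked(xml_text):
    """Return (id, product) for processors running above their rated speed."""
    root = ET.fromstring(xml_text)
    suspects = []
    for node in root.iter("node"):
        if node.get("class") != "processor":
            continue
        size = node.findtext("size")          # assumed: current clock in Hz
        capacity = node.findtext("capacity")  # assumed: rated maximum in Hz
        if size is None or capacity is None:
            continue  # not enough information to judge
        if int(size) > int(capacity):
            suspects.append((node.get("id"), node.findtext("product")))
    return suspects
```

A flag like this would be just one more column in the data Knoware analyzes, alongside the component names and versions.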
The other piece of the puzzle is installed package detection. This could end up being even more important than the hardware portion, since issues like library version compatibility seem to cause just as much distress. I haven't searched very hard for something like this yet — any ideas? Chris pointed out that I may have to find per-distribution methods for this to work reliably, which is what I was thinking. Gentoo, for example, uses equery and qpkg.
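A per-distribution approach could start as a simple lookup table of commands. The sketch below covers only dpkg-based and RPM-based systems, whose query commands accept an explicit output format; Gentoo's tools are left out because their output format is not assumed here. The dictionary keys ("debian", "rpm") are hypothetical labels.

```python
import subprocess

# Each command is asked to print one "name version" pair per line.
LIST_COMMANDS = {
    "debian": ["dpkg-query", "-W", "-f", r"${Package} ${Version}\n"],
    "rpm": ["rpm", "-qa", "--queryformat", r"%{NAME} %{VERSION}\n"],
}

def parse_packages(output):
    """Turn 'name version' lines into a dict, skipping anything malformed."""
    packages = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages

def installed_packages(family):
    """Run the listing command for a distribution family and parse its output."""
    command = LIST_COMMANDS[family]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_packages(result.stdout)
```

Keeping the parser separate from the command table means a new distribution only needs one more entry, as long as its tool can be coaxed into the same line format.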
In the statistics department, my friend Erin and her husband Ryan have been offering their educated brains for the picking. Both suggested that Bayesian networks (as I had in mind) might not provide the best results for Knoware. Instead they pointed me at Random Forests (more info here). Erin explains:
The idea is you can't make any good statistical inference from one decision tree. Random Forests works by splitting the data using majority vote decision tree processes. It does that lots of times, thousands if you want, and it gets more precise the more it does that. Then it tells you which independent variables are the most valuable predictors of your dependent variable. In Knoware's case, you don't know what the problem is or how to fix it, so the problem is the dependent variable. Also, it is a means to evaluate groups, so you can see if a certain kind of problem clumps with other factors.
The source code for Random Forests is available, but it's in Fortran and I'm unsure of licensing issues. Word on the street is there's source in another language out there somewhere. Otherwise I could just do a bit of research and try to code it from scratch. This kind of process seems like a prime candidate for rapid prototyping, so I'm planning to use Python for it. Hopefully it will provide a nice complement to the Python Bayesian Network Toolbox being developed by Elliot Cohen.
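To make the idea concrete, here is a toy prototype in Python. It is a deliberately simplified stand-in, not the real Random Forests algorithm: each "tree" is a single yes/no split (a decision stump) trained on a bootstrap sample with a random subset of features, predictions are made by majority vote, and "importance" is just a count of how often each feature was picked as the split. Features are assumed to be 0/1 values (component present or not), and all function names are made up for this sketch.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, features):
    """Pick the candidate feature whose 0/1 split misclassifies the fewest rows."""
    fallback = majority(labels)
    best = None
    for f in features:
        sides = {0: [], 1: []}
        for row, label in zip(rows, labels):
            sides[row[f]].append(label)
        # Each side predicts its own majority label (or the overall one if empty).
        table = {v: (majority(ls) if ls else fallback) for v, ls in sides.items()}
        errors = sum(table[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, table)
    return best[1], best[2]

def train_forest(rows, labels, n_trees=200, seed=0):
    """Train many stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per stump
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        features = rng.sample(range(n_features), k)
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(train_stump(sample_rows, sample_labels, features))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    return majority([table[row[f]] for f, table in forest])

def importance(forest):
    """Crude importance: how often each feature was chosen as the split."""
    return Counter(f for f, _ in forest)
```

In Knoware's terms, each row would be one system configuration, the label would be whether that user subscribed to a given problem, and the features that come out on top of the importance count would be the candidate predictors to show to users.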
Mark Dickie recently commented on my previous entry, saying:
I believe that Dell have something like this although I think it's nothing much more than an update/security announcer tailored to your hardware and software. Might be worth a look.
This raises an interesting point that I have not yet made explicit. I always imagined that utilities like the Microsoft crash reporter (and the Dell utility Mark speaks of, which is news to me) aim to accomplish something exactly like Knoware, but behind closed doors. A big distinction here is that Knoware puts the actual fixing of problems into the user's hands, and more importantly into the community's hands.
But there is also a simple yet key feature that Mark brings up, and that is providing a distribution channel for patches and other announcements. Notifying users of solutions once they are found is what will eventually cause more users to participate in Knoware. Even if a user has not taken the time to subscribe to a specific problem, Knoware will be able to determine that the user probably does experience the problem based on their system configuration.
Knoware, Part I
In my last post I mentioned that I made my decision to continue with my KDE project, Knoware, for the Summer of Code. For those who missed the full description (read it if you get confused), Knoware is a program that will use Bayesian networks to find patterns between Linux system configurations and bug report subscriptions. In this entry I'll provide a little background as well as some more details, plans, and open issues. Since the project has barely started yet, I'm open to input from any direction.
First, some background. I actually envisioned Knoware as a Windows program in high school more than five years ago. At the time I didn't even know Bayes nets existed, just that computers must have the processing power to analyze the thousands of statistics and arrive at a solution. I pictured the interface much like a chat program, since collaboratively working through problem solutions will play a prominent role in the program's success. I even came up with a business plan for it: make it free for users, but provide custom statistics reports to hardware and software companies at a cost (remember, Knoware will know which products are conflicting with one another, the distribution of models and driver versions, and such). I look back on this now and still see it as plausible, but I'm very glad to be developing for a platform I use and enjoy now. More on other aspects of my original vision, like the interface, shortly.
You may have realized that the mention of Windows brings up an interesting point: conceptually, Knoware has very little to do with KDE or even the Linux platform. It could easily be made to work with Windows or Mac OS, and focus on their respective bugs. If it turns out to work as well as expected, you can be sure versions will pop up on other platforms.
The current limitation to Linux and KDE may actually be a benefit in finding accurate statistics. Sure, simultaneously opening up the program to Windows and Mac users would result in a larger database. The problem is that the bug reports and system configurations would be much less focused. The extra input does no good if it's for a completely different problem and coming from a completely different system. If all input configurations have Linux and KDE in common, this will help the users focus on more of the same problems instead of completely unrelated ones.
The issue has been raised that it may be hard to gain enough users initially, making it difficult to draw useful conclusions. I'm pretty confident when it comes to this issue. Take for example the site KDE-Forum.org. This forum has more than 8,000 registered users, and this is surely a small fraction of KDE users — the English-speaking vocal minority. If just 2.5% of that vocal minority uses Knoware to report a specific problem, that's 200 configurations to analyze. I'm no expert on Bayes nets (yet), but to me that sounds like a fairly significant data set. And in reality, of course, I picture many more users giving Knoware a try.
One aspect that will help draw in participants is the user interface. My mentor for this project told me to concentrate on this area first, since it will ultimately be the first thing to get a user's attention. In my proposal, I scheduled the user interface toward the end of the summer, so already I've learned something new (about KDE development, at least). Earlier I mentioned that in the past I pictured the user interface much like a chat program. This has changed significantly in my mind, since collaborating with other participants is something the user will only be doing some of the time. The two other main tasks will be scanning — that is, gathering information about the user's system — and browsing for relevant bugs.
Scanning the system is an automated process, but requires the user to specify the level of granularity they will provide, since there is a chance that this information will be made available to other users. It's also possible that the user will need to input some information manually, in case the scanner can't determine an important aspect of the user's system.
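One way to picture the granularity setting: the scanner collects everything locally, and the chosen level decides which fields leave the machine. The three level names and the field layout below are hypothetical, picked only to illustrate the idea.

```python
# Hypothetical granularity levels, from least to most revealing.
LEVELS = ["category", "name", "version"]

def share_view(components, level):
    """Trim each scanned component down to the fields the user agreed to share."""
    allowed = LEVELS[: LEVELS.index(level) + 1]
    return [
        {field: comp[field] for field in allowed if field in comp}
        for comp in components
    ]
```

Components with missing fields (say, a device whose version the scanner couldn't determine) simply come out shorter, which is also where manual input from the user could fill the gap.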
Browsing for bugs will hopefully be made easy through searching, live filtering, tagging, and other new trends in navigation. Since the detailed bug reports will need to be created at some point, this process is not just read-only. The interface should therefore assist the user in proper naming and categorization of the problem.
Lastly, I'll discuss possible integration with existing projects. As far as I know, there's nothing out there right now (in the public, at least) which hopes to accomplish the same thing as Knoware. However, there are a few projects which can assist in its goal. KDE has an existing bug tracking system using Bugzilla, and a graphical front-end for it called KBugBuster. While I hope to use more meta-data to make bug browsing easier, there's no sense in completely reinventing the wheel, so this is a likely project I will turn to for integration. Additionally, KDE has a crash handler that tries to make crashes a little more friendly. While crashes are only a tiny fraction of the problems I encounter, I won't rule out the possibility of submitting crash reports to Knoware through this program (much like the Windows crash handler).
Hopefully that puts to rest any questions or doubts out there. As I said, input is welcome during any phase of development.
