Jeremy Smith's blog

Choosing Platforms

In response to Scoble's list of the top 12 reasons entrepreneurs don't build on the Windows platform (and you can substitute any large market-share IT "platform" company there: Oracle, Sun, IBM, etc.), I agree with a lot of what Sam Ruby has to say in "It Just Works". But I really have to agree with what Simon Willison has to say in "Taking charge of your own destiny":

[T]he key reason that open source development tools are so compelling: they put you in charge of your own destiny.

As developers, our lives revolve around abstractions. The key activity when programming is designing and using abstractions - we weave them together to create new ones, with our target being an abstraction that lets our users achieve their goals.

The problem with abstractions is that they leak. The taller the stack of abstractions (browsers on templates on Django on mod_python on Apache on Linux on hardware on a network) the more components there are that can go wrong. When they do go wrong, we need to work out why and find a solution.

If your development stack includes closed source components, there's a limit on how far you can dig. Run in to a bug in the underlying platform (or an undocumented feature or just something you don't understand) and you are left at the mercy of your vendor. You only control the top part of your system.

It's true. It's the only "good" way I know how to troubleshoot a truly hard problem with a system or to architect a truly flexible solution. I need the source code... all of it. Sympa. The Blog. The Wiki. The Identity Management framework. Shibboleth. ITS-Services. Our internal wiki. The source code is available. If there is a problem, I can nail it right down to the single line of code that caused it; and get this, I can change that line of code. That's great!

I've worked with systems (a lot!) where I didn't have the source. Troubleshooting them consists of a lot of guesswork and relying on past experience. For example, our mail system – when we have a cataclysmic crash, we know to run a 4-hour reconstruct over the entire mail repository before bringing the service back online. We know that doing so prevents an otherwise inevitable crash in the near future. So, even though it sucks to keep the mail system offline for another 4 hours while our users champ at the bit to have the service restored, we take the net win and do the reconstruct up front. Why do we actually have to do the reconstruct? I couldn't tell you. I've never been able to look at the source code of the mail server. I know that we've been bitten twice (once for over 20 hours of downtime) by not doing the reconstruct, but that's as much as I can tell you. It's lab-rat intellect: if we touch this lever, we get shocked; and if we touch that lever, we get food. It doesn't take much analytical thought to arrive at that, but I couldn't tell you why I am in the experiment in the first place. I can't actually explain or explore the deeper, more specific bug/problem/occurrence. Using the closed source system, I am reduced to the intellect of a lab rat.

But I am glossing over a point here. Not everyone is a Sam Ruby or a Simon Willison. Not everyone in the IT industry has the ability to crack into some source code, read it, react to it, troubleshoot it, and ultimately fix it. (Trust me.) Some people in IT are better equipped to take a standard out-of-the-box solution, have it set up by consultants as a closed box, and then keep it humming along day in and day out. If something goes wrong with it, you tweak a configuration file here and there and hope for the best. If there's a catastrophic failure, you pray your support contract is still paid in full, and you pass the buck to the vendor to repair it (and, when your manager asks, you pass the blame on to the vendor, too).

Don't get me wrong. Each solution has pros and cons. I don't mean to come off so didactic and so "it's-a-black-and-white-thing". There's a lot of grey. And there are just as many times when an off-the-shelf solution works best as there are times when an open source one does. The "managerial" or "Information Architect" skills come into play in being able to choose between the two for the sake of your users/customers in the big picture (the "big picture" referring to chains like this: ease of administration improves the sysadmin's ability to take care of the system, which leads to less downtime, which ultimately improves things for your customers/users).

For me personally, though, I like to have the source. I seek to never have to utter the statement, "well, I opened up a support ticket with the vendor to see if we can get the problem fixed – we'll have to wait until they get back to us." I don't like being at their mercy.

But... I am drawing on polarities. I painted stereotypical representations that are never entirely true. In reality it's shades of grey, a matter of degrees. While I am a control freak, and I change the oil in my own car, and I would never own a car whose oil I couldn't change, I have never forged my own pistons. I don't refine my own gasoline (or oil, for that matter). I don't tend a herd of cattle to supply the leather for my interior. It's a matter of degrees, and I am not trying to paint a simple picture. Deciding on architectures is a complex issue.

I just know that, for me, I like to be able to point to a line of code and say, "that's it!"

P.S. Though, I would jump at the chance to forge my own pistons. That would be a fun and instructive activity.


  1. I agree.

    A similar phenomenon to the 'lab rat' thing is happening to me as I transition from being a network admin (at home and at my former workplace) to being just a user (here at Case). As an admin who can control all the variables, or at least with read-only access to all of the relevant configurations, I can use my powers of deductive reasoning to figure out what's wrong and how to fix a problem. Here at Case, though, I can't just (say) look at the CUPS configuration for the Wade printers to figure out what I need to do to my machine to make printing work. Instead, I have to either poke at things until they work, or ask the help desk like a muggle.

    As you say, proprietary software restricts one's ability to examine code and configuration to find out what's wrong. With OSS, even if I can't fix the problem for some reason, I can at least attempt to find the offending line of code or whatever, see what's causing the failure, and figure out a workaround.

    OSS tends to have more transparent internals, too.
    I can almost always debug a network connectivity issue on a Linux box far faster than I can solve a similar issue on a Windows machine.

  2. "Here at Case, though, I can't just (say) look at the CUPS configuration for the Wade printers to figure out what I need to do to my machine to make printing work. Instead, I have to either poke at things until they work, or ask the help desk like a muggle."

    As a side note related to the above quote, there are a lot of intricacies to the Case network and services that go undocumented. We are hoping that the Wiki can enable our users to help us fill in some of those gaps for all of us. For example,