
Barry Brumitt - Connecting Models with the Physical World

Although I tend to work from home anyway now that I have no classes to go to, it does still make a difference that I'm now a long way from my lab and academic department. For some things, like the weekly lab meetings, there is no substitute, but other things can be got around. While I was physically at Case there were various seminars each semester with invited speakers (and also a course requirement that I go to 5 of the department's seminars each semester), and they were a resource I found useful. Fortunately Seattle has several universities and private research institutions that hold open seminars, so it's not exactly difficult to make up for this. If anything, it's going to be easier now for me to cherry-pick talks that are more relevant to my own work, because instead of a requirement to go to 5 seminars from a specific series, I can go to any that interest me.

I'm going to start writing up brief reports on talks I go to, partly to aid my own memory, partly for the edification of anyone else who might be interested, and partly because I am supposed to show that I've gone to enough here that it's a reasonable substitute for the ones I'm missing in Cleveland.

Today's was Barry Brumitt of Microsoft Research, talking at Intel's lab about Connecting Models with the Physical World. The talk was anchored with two concrete examples from his own work, and behind the cut I'll write about both.

EasyLiving

The first part of the talk was about a now-shelved MSR project called EasyLiving. The first thing that struck me about EasyLiving was quite how much it resembles an MIT project, Project Oxygen, that I remember hearing about some years ago. Both are large-scale efforts at making people's computer use less centred on a beige box that sits on the floor in their offices; or, as Brumitt put it, they are efforts at disaggregated computing, which means clustering the user interface of a computer or computer network around the user rather than around the machine. Of the two, EasyLiving is the smaller in scope, but it still combined quite a few different technologies towards this over-arching goal.

The components of the project are ideas like mapping the physical locations of everything in a building (so that a user could ask the computer where the nearest printer is, for example), tracking the whereabouts of users, working to connect arbitrary devices to each other (so that a user could choose to have the output from their computer displayed on a monitor in the room they happen to be in right now, for instance), and getting away from the keyboard and mouse as the only input devices. There is a lot of knowledge representation and inference technology behind this, but personally I found the user interface issues the most interesting.
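To make the "where is the nearest printer?" idea concrete for myself, here is a minimal sketch of that kind of query over a toy geometric model. It is not how EasyLiving worked internally (the talk didn't go into that level of detail); the `Device` record, the flat 2-D coordinates, and the `nearest` function are all my own assumptions.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical record for one device in a toy building model."""
    name: str
    kind: str   # e.g. "printer" or "display"
    x: float    # position in metres on a flat floor plan (assumed)
    y: float


def nearest(devices: Iterable[Device], kind: str,
            user_xy: Tuple[float, float]) -> Optional[Device]:
    """Return the device of the requested kind closest to the user,
    or None if the model contains no device of that kind."""
    candidates = [d for d in devices if d.kind == kind]
    return min(candidates,
               key=lambda d: math.dist((d.x, d.y), user_xy),
               default=None)
```

A real system would need to account for walls, floors and doors rather than straight-line distance, which is presumably part of why the project needed a proper model of the building rather than a list of coordinates.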

What they tried to do with EasyLiving was rather like an implementation of the computer on the Starship Enterprise, in that people interacted with their test system by speaking to it and pointing at things. Quite an important part of this was to build a vision system that allowed the computer to follow pointing gestures so that a user could do something like say "Show it to me on that screen". The group also found that the vast majority of test users looked directly at whichever object they were referring to (66% always did, and 91% usually did), which suggests another potential approach because computer eye tracking is somewhat more established than gesture following.

This reminds me of the importance of shared attention in human-human interactions, and it set me thinking about how we currently show shared attention with a computer interface. It's pretty clunky really (just a change of colour on the title bar of a window), and for me at least it's probably the most common cause of UI errors. I often find myself typing into the wrong window, and I know I'm not the only one because the IRC channel I use often gets Unix commands accidentally typed into it by other users. I think the problem is only partly that computers aren't clear enough about which window is in focus; there's also the issue that the user has no way to communicate which window was meant to be the target. The equivalents in human-human communication are usually (though certainly not always) understood with a minimum of explanation, and I wonder if human-computer interaction might be easier if computers had a camera to watch the user's eyes and infer which thing they are looking at.

Of course, it's worth remembering that not everyone reliably looked at what they were referring to in the study, and that brings me to something else I was impressed with in the presentation. Brumitt clearly understood that there isn't One True Interface which is right for all users all the time. Instead he talked about testing different modalities, and offering different interfaces to the same system, in the hope that users would choose the appropriate one for the task they were trying to accomplish. I think this is really important because I can't imagine a single interface that would be simple enough for a beginner to use and give expert users enough access to the workings of their machine, so multiple interfaces seem to be the only way that a single system can suit all.

Forza Motorsport

For the other half of the talk, Brumitt switched focus from a distant-future research project to a product that is already out on the market: Forza Motorsport. It's a racing game (a cynic might suggest that it's Microsoft's version of Gran Turismo, but I wouldn't say anything like that, would I?), and he was very proud to declare that it is one of very few games on the market with a real-time learning system. It's more or less commonplace for games to be designed with various AI techniques, but what's distinctive about this one (and Black & White apparently, but I don't remember the other examples he gave) is that the computer opponents actually learn an individual's playing style, and players can train Drivatars, which are computer players intended to drive like the player does.

There's a detail that I found interesting here, and it's one that seems to underpin the making of human-imitative AI players: the game is built around a very detailed physics model. Apparently it is typical for racing games (at this point I should note that I have very little experience playing anything more recent or realistic than Super Mario Kart) to allow the computer to control cars more directly than a human player can. In other words, the computer can pretty much place the car where it wants, while a human has to deal with an imperfect control interface to a car that has inertia, limited traction, and so on. In this game, apparently, the computer must play through a similar interface to the human players. Obviously there isn't some miniature servo inside an Xbox that moves a mini-joystick around, but the point is that the computer can only issue commands via the simulated control system of the car, which then get interpreted through the same physics engine as a human player's car. Apparently this makes the AI players a lot more realistic and interesting, and allows drivatars to resemble their human trainers.
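Here is how I picture that arrangement, as a minimal sketch rather than anything from the game itself. Everything in it is my own assumption: the `Controls` and `Car` types, the one-dimensional "physics", and the made-up acceleration constants. The point is only that the AI driver returns the same kind of control input a human would, and the single `step` function is the only thing allowed to change the car's state.

```python
from dataclasses import dataclass

MAX_ACCEL = 5.0    # m/s^2 at full throttle (made-up value)
MAX_BRAKE = 10.0   # m/s^2 at full brake (made-up value)


@dataclass
class Controls:
    """The only thing any driver, human or AI, may produce."""
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0


@dataclass
class Car:
    position: float = 0.0  # metres along the track
    speed: float = 0.0     # m/s


def clamp01(value: float) -> float:
    return min(max(value, 0.0), 1.0)


def step(car: Car, controls: Controls, dt: float) -> None:
    """Toy physics: the one place where a car's state is updated.
    Inputs are clamped, so no driver can exceed the car's limits."""
    accel = (clamp01(controls.throttle) * MAX_ACCEL
             - clamp01(controls.brake) * MAX_BRAKE)
    car.speed = max(0.0, car.speed + accel * dt)
    car.position += car.speed * dt


def ai_driver(car: Car, target_speed: float) -> Controls:
    """A trivial AI: it reads the car's state and answers with control
    inputs, exactly as a human with a gamepad would. It never writes
    to car.position or car.speed directly."""
    if car.speed < target_speed:
        return Controls(throttle=1.0, brake=0.0)
    return Controls(throttle=0.0, brake=0.5)
```

Because `step` clamps the inputs, an AI that asks for a throttle of 100 gets exactly what a human with the trigger fully down gets, which is the sense in which the two are playing the same game.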

Within this framework, there were some other details I found interesting. The control space for a real car has important discontinuities—mainly when the tyres lose traction—which is a refrain familiar from everything I've read or done with dynamical systems. The computer controllers that they have so far can't deal with these, so they avoid things like drift driving and skidding round corners, which a good human driver can deal with; this too is much like other AI work in that it's much easier to design a controller for a restricted domain than for everything in an environment.

And finally I was impressed with how they adjust the skill level of AI players. Instead of artificially putting mistakes in, the computer players' use of the physics model allows them a subtler alternative: the computer players are made weaker by mis-tuning their estimate of how their car performs. If a player over- or underestimates the traction their car has, it will lead to a consistent set of driving mistakes that a skilled opponent can learn to exploit.
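A back-of-the-envelope sketch shows why a wrong traction estimate gives consistent rather than random mistakes. This is my own illustration, not the game's model: I'm assuming a point-mass car on a flat corner, where the fastest speed that can be held is v = sqrt(mu * g * r), and the function names are invented.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def max_corner_speed(mu: float, radius_m: float) -> float:
    """Fastest speed (m/s) a point-mass car can hold on a flat corner of
    the given radius with friction coefficient mu: v = sqrt(mu * g * r)."""
    return math.sqrt(mu * G * radius_m)


def corner_entry(true_mu: float, believed_mu: float, radius_m: float):
    """Compare the speed the AI aims for (from its believed traction)
    with the speed the car can really hold (from the true traction)."""
    target = max_corner_speed(believed_mu, radius_m)
    limit = max_corner_speed(true_mu, radius_m)
    if target > limit:
        verdict = "too fast: slides wide"
    elif target < limit:
        verdict = "too slow: loses time"
    else:
        verdict = "on the limit"
    return target, limit, verdict
```

Under this toy model, a driver who believes mu is 1.2 when it is really 1.0 will overcook every corner by the same proportion, and one who believes 0.8 will be cautious everywhere. Either way the error is systematic, which fits the claim that a skilled opponent can learn to exploit it.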
