Entries for July 2006

Ways of scaffolding the task

One of my main interests with the simulations I'm running is in showing how agents can be evolved for learning tasks by using 'shaping' protocols that progressively increase the complexity of the tasks faced by the agents. This is not a revolutionary idea, and in some cases it's very obvious how to go about it (for instance in the time-learning tasks it's just a matter of increasing the range of timings as agents' performance improves), but there can be a bit of an art to getting this right. With the stimulus-learning agents I'm having partial success, in that I now have a couple of high-scoring agents in the initial version of the task but the reliability of searches is low, so it's time to start looking at ways to scaffold this task.
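As a rough illustration of the time-learning case, here is a minimal sketch of a shaping schedule that widens the range of timings once the best agent scores well enough. The class name, the threshold, the step size and the timing bounds are all made up for the example; this is not the code used in my runs.

```python
from dataclasses import dataclass


@dataclass
class TimingCurriculum:
    """Hypothetical shaping schedule for a time-learning task."""
    min_timing: float = 10.0   # shortest timing an agent can be asked to learn
    max_timing: float = 12.0   # current longest timing; grows as agents improve
    final_max: float = 50.0    # upper bound of the full, unscaffolded task
    step: float = 2.0          # how much to widen the range each time
    threshold: float = 0.9     # score the best agent must reach before widening

    def update(self, best_score: float) -> bool:
        """Widen the timing range if the best agent is good enough.

        Returns True if the range was widened this generation.
        """
        if best_score >= self.threshold and self.max_timing < self.final_max:
            self.max_timing = min(self.max_timing + self.step, self.final_max)
            return True
        return False


curriculum = TimingCurriculum()
for best_score in [0.5, 0.92, 0.95, 0.7]:   # made-up per-generation best scores
    curriculum.update(best_score)
print(curriculum.min_timing, curriculum.max_timing)
```

With these invented scores the range widens twice, from 10-12 to 10-16. The art is in choosing the threshold and step so that the task gets harder without leaving the population behind.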

Continue reading "Ways of scaffolding the task"

How to deal with evaluation noise

As I've mentioned before, I'm running into problems of evaluation noise with my runs, and the most obvious solution (give each agent more trials) is too costly to be practical. I do have a few ideas of what to do about this, which I'll be experimenting with over the next few weeks.

First, it's worth stating exactly what I mean by evaluation noise, and distinguishing between different kinds. Any evolutionary algorithm has to select between individuals in some way, in order to decide which will be reproduced. If the function for which the algorithm is being used is sufficiently trivial (matching a target string, for instance) then it's simple enough to score each individual precisely on its closeness to the goal. In practice, however, this doesn't work for any interesting problem because comprehensively evaluating each individual would simply take too long. Instead, it's necessary to find some way of approximating the real quality of each individual. In my experiments, I do this by presenting the agent with a sample of possible trials (generated randomly, but every agent in a given generation gets the same set). For many real-world applications there's an additional problem, namely that agents are tested on a simulation of the application they'll ultimately be used for, but I don't need to worry about that because my agents only exist in simulation.
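The sampling scheme described above can be sketched roughly as follows. The `Trial` and `Agent` types are placeholders invented for the example (a trial is just a number, and an agent is a function returning a score between 0 and 1 for one trial); the real simulation is considerably more involved.

```python
import random
from typing import Callable, List

Trial = float                      # stand-in for whatever describes one trial
Agent = Callable[[Trial], float]   # stand-in: returns a score in [0, 1] for one trial


def sample_trials(rng: random.Random, n_trials: int) -> List[Trial]:
    """Draw a random sample from the space of possible trials."""
    return [rng.random() for _ in range(n_trials)]


def evaluate_generation(agents: List[Agent], trials: List[Trial]) -> List[float]:
    """Score every agent on the SAME trial set, so scores are comparable."""
    return [sum(agent(t) for t in trials) / len(trials) for agent in agents]


rng = random.Random(0)
trials = sample_trials(rng, n_trials=20)

# Two toy agents: one always scores 1.0, the other scores whatever the trial value is.
agents = [lambda t: 1.0, lambda t: t]
scores = evaluate_generation(agents, trials)
```

Each score is only an estimate of the agent's real quality over all possible trials, which is where the noise comes from.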

Evaluation noise is the disconnect between the scores individuals are given during a GA run and their real overall performance. I can think of at least three types:

  1. Systematic bias. This can be from bugs in the simulator (experience suggests that if there is an exploitable bug that incorrectly assigns high fitness, a GA will find the exploit rather than the desired behaviour) or biases in the way the samples of trials are generated. I've encountered both of these problems in the past, but right now there's nothing leading me to believe I'm in this situation with the current experiments.
  2. Random noise. Even if there is no systematic bias in the generation of trial sets, any given sample is unlikely to cover the whole space of possible trials evenly (and forcing it to do so is a great way to create systematic bias). This opens up the possibility of selection being effectively random, because the individuals that happen to perform well on one particular set of trials may be rubbish at all the others, and the evaluation trials change fast.
  3. Overfitting. If the evaluation trials don't change fast enough to cause the random noise problem, agents can instead overspecialise, by evolving to exploit some regularity of the evaluation set that does not correspond to a regularity of the set of all possible trials.

Simple parameter tweaks will push me between issues 2 and 3: generate a whole new set of trials each generation and I get random noise, but if I keep too many trials the same from generation to generation I get overfitting. Below the cut I'll suggest some ideas for dealing with this issue.
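To make that parameter concrete, here is a minimal sketch of a trial set that is only partly regenerated each generation. The function name and the `refresh_fraction` knob are hypothetical, chosen for illustration: a value of 1.0 gives a whole new set every generation (the random-noise end), and a value near 0.0 keeps almost everything the same (the overfitting end).

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def next_trial_set(
    current: List[T],
    refresh_fraction: float,
    rng: random.Random,
    new_trial: Callable[[random.Random], T],
) -> List[T]:
    """Return next generation's trials, replacing a fraction of the current ones."""
    if not 0.0 <= refresh_fraction <= 1.0:
        raise ValueError("refresh_fraction must be between 0 and 1")
    n_replace = round(len(current) * refresh_fraction)
    # Pick which positions get a fresh trial; the rest carry over unchanged.
    replace_idx = set(rng.sample(range(len(current)), n_replace))
    return [new_trial(rng) if i in replace_idx else trial
            for i, trial in enumerate(current)]


rng = random.Random(42)
trials = [rng.random() for _ in range(8)]
trials = next_trial_set(trials, refresh_fraction=0.25, rng=rng,
                        new_trial=lambda r: r.random())
```

Here two of the eight trials are swapped out per generation. This only moves the run along the line between the two problems; it does not remove either of them.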

Continue reading "How to deal with evaluation noise"

Continuous-sensory experiments

As I set out almost a month ago, I'm now running experiments in which the agents receive sensory input on a continuous scale, and must respond appropriately to that in order to perform well. So far I've found one good agent among a general background of unsuccessful runs, but I'm still tinkering with parameters to try and make the process more reliable. Detailed method and a brief overview of results so far are behind the cut.

Continue reading "Continuous-sensory experiments"

How to cool a cluster?

I have a small study with a large number of computers in it (including several up on the wall). Between them, all these machines give off an awful lot of heat. Fortunately I live somewhere with mild summers, so this is rarely a problem, but on the hottest few days it can be a worry. The biggest problem is those machines up on the wall, because if there's not enough of a breeze through the building they can end up in a pocket of air 20° or more hotter than the temperature where I sit.

So far I've been dealing with this by having a fan on the windowsill that points up, forcing the air to mix more, but it's annoyingly noisy and today that fan seems to have died. I've generally found fans designed for cooling people to be quite unreliable when left on continuously, but I can't think of anything better, because the way these machines are mounted is not well suited to installing case fans. Does anyone have any ideas for a better way of cooling those machines than just buying another household fan each time the existing one gives up the ghost?

Slow, slow experiments

Since before GECCO, I've been running experiments based on both strands of my roadmap. I now have rather a lot of computing power at my disposal, which makes it all the more damning that some of these experiments require more than a week to run. The problem is that evaluation noise has the potential to swamp any meaningful differences between agents, and if there's one certainty about genetic algorithms it's that they'll exploit whatever stupid trick their designer unintentionally leaves available, instead of doing what they were intended to.

Continue reading "Slow, slow experiments"

GECCO braindump

I've spent most of the past week at GECCO, the Genetic and Evolutionary Computation Conference. It was, as expected, not as relevant to me as ALife, but at the same time it was gratifying to see how much quality content there was in the Real World Applications and Evolutionary Computation in Practice tracks. I should definitely make a point of going to the GECCO closest in time to my graduation, as it would be a very good way to find out about the sorts of jobs worth applying for. Till then, I think I should prioritise the conferences with a more scientific and less engineering emphasis.

I also prefer single-track conferences. The advantage of having multiple parallel tracks (GECCO had as many as 9 sessions going on at any given time) is that it allows greater breadth of content, but the drawback is that it splinters the conference. I get as much value from talking to many like-minded people about their work and mine as I do from the presentations themselves, and it's much easier to make conversation when everyone's been listening to the same thing and is there because of a comparatively well-defined theme.

The customary braindump follows after the cut.

Continue reading "GECCO braindump"