Co-evolution, step 1
I'm working on the to-do list I put up last week. I now have a rudimentary system for co-evolution of trials, and some experiments are running with it. Results using the most simplistic scheme possible have been terrible, but this is not surprising and I know what to try next.
The way the system works is simply that the trials themselves are scored, by recording the performance of all agents in the current generation on each trial. Obviously the lower this score the 'better' the trial is, because that indicates that agents are not able to do well on it. So when it's time to replace a trial, the trial with the highest score is replaced with a fresh randomly-generated one. Of course, the new trial is not necessarily any better than the old one, so what tends to happen in practice is that a given slot gets cycled through a few times, until the random trial generation comes up with a trial that is more difficult than one of the others in the set. This may be a problem, but for the moment I'm not concerned about it, because intuitively I think the more important thing is to keep the difficult trials so that the agents' population can only improve by getting better at those, not by chance shifts in the batch of trials they face.
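As a rough illustration of the scoring-and-replacement step described above, here is a minimal Python sketch. It is not the code used in these experiments: the names `score_trials`, `replace_easiest_trial`, `evaluate` and `new_trial` are hypothetical, and using the mean over agents (rather than, say, a sum) as the trial's score is an assumption.

```python
def score_trials(agents, trials, evaluate):
    """Score each trial by the mean performance of all agents on it.

    `evaluate(agent, trial)` is a hypothetical callback that returns a
    number, where higher means the agent did better. Averaging is an
    assumption; any aggregate over the generation would serve.
    """
    return [
        sum(evaluate(agent, trial) for agent in agents) / len(agents)
        for trial in trials
    ]


def replace_easiest_trial(trials, scores, new_trial):
    """Return a copy of `trials` with the highest-scoring trial replaced.

    A high score means agents found the trial easy, so that slot is
    refilled from `new_trial()`, a hypothetical random trial generator.
    Also returns the index of the slot that was replaced.
    """
    easiest = max(range(len(trials)), key=lambda i: scores[i])
    updated = list(trials)
    updated[easiest] = new_trial()
    return updated, easiest
```

Nothing here guarantees that the replacement is harder than the trial it displaces, which is why the same slot can be cycled through several times before a difficult trial sticks.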
Over the weekend I ran experiments in which one trial was replaced per generation, with varying numbers of trials and of agents. Some of the runs have finished now, though most are still ongoing. So far the results look hopeless, but there are two linked provisos. The experiments that have finished so far are the ones with the smallest populations of both agents and trials, and it's rather hard to tell whether or not one of these runs is getting anywhere until it's finished. The reason is that if co-evolution works, it will naturally make agents' performance look worse, because they're being measured on a harder set of trials. Until I've run more of these experiments, I won't know what good performance really looks like without taking the final agent and running a separate set of tests on it. So one thing I need to do today is take the best agent from each of the runs that has finished so far, and test them all on the same batch of 1000 fresh randomly generated trials to get a useful benchmark of their real performance.
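The benchmarking step could look something like the following Python sketch. The point is that every agent faces the identical batch of trials, so the scores are directly comparable across runs. The function name `benchmark`, the callbacks `make_trial` and `evaluate`, the fixed seed and the use of a mean score are all assumptions made for illustration.

```python
import random


def benchmark(agents_by_run, make_trial, evaluate, n_trials=1000, seed=0):
    """Score the best agent from each run on one shared batch of trials.

    `agents_by_run` maps a run label to that run's best agent.
    `make_trial(rng)` is a hypothetical generator of one random trial.
    `evaluate(agent, trial)` is a hypothetical scoring callback.
    The batch is generated once, before any agent is tested, so all
    agents are measured against exactly the same trials.
    """
    rng = random.Random(seed)
    trials = [make_trial(rng) for _ in range(n_trials)]
    return {
        label: sum(evaluate(agent, trial) for trial in trials) / n_trials
        for label, agent in agents_by_run.items()
    }
```

Because these trials are drawn fresh rather than co-evolved, the resulting scores are not dragged down by a trial set that has been getting harder during the run.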
My intuition so far is that replacing a trial every generation has made the environment too hostile for the agents to evolve. Fortunately I'm not terribly surprised by this, and do have some ideas for what to do about it - I just wanted to test the most obvious thing first. The next experiments I want to run involve adding a new trial every n generations, or only adding trials when performance is above a threshold. The latter strikes me as the approach most likely to work, not least because I've had success with it on a previous project.
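The two schedules I have in mind can be sketched as simple predicates in Python. Both function names are hypothetical, and the details are assumptions: whether generation 0 counts, and whether "performance" means the population mean or the best agent's score, are not settled yet.

```python
def due_every_n(generation, n):
    """Fixed schedule: a new trial is due every `n` generations.

    Assumes generations are numbered from 0 and that nothing is
    introduced at generation 0.
    """
    return generation > 0 and generation % n == 0


def due_above_threshold(performance, threshold):
    """Gated schedule: a new trial is due only once agents do well enough.

    `performance` is whatever summary of the current generation is
    chosen (an assumption here), compared strictly against `threshold`.
    """
    return performance > threshold
```

The gated version only makes the environment harder once the agents have shown they can cope with the current trial set, which is the property the fixed schedule lacks.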