
Entries in "results"

August 07, 2006

Looking more closely at individual agents

While I wait for experiments to finish, it's time to take a closer look at those agents I've found so far that seem to be performing well. For many reasons, the performance score during a run is an imperfect gauge of how good the agent actually is, so I like to score them all on an identical set of 1000 fresh trials, and then look closely at the trials on which they do badly, as a guide to what strategy the agents are actually implementing.
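As a rough illustration of this kind of re-testing, here is a minimal Python sketch. Everything in it is hypothetical: the trial format (a single random stimulus), the two toy agents, the scoring rule and the seed are stand-ins I made up, not the actual task or agents. The point is only the shape of the procedure: one fixed batch shared by every agent, per-trial scores kept, and the worst trials pulled out for inspection.

```python
import random

def make_trials(n_trials, seed):
    """Build a fixed, reproducible batch of trials (here just random stimuli in [0, 1))."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_trials)]

def score_agent(agent, trials):
    """Return one score per trial, so individual failures can be inspected later."""
    return [agent(trial) for trial in trials]

def worst_trials(scores, trials, k=5):
    """Return the k (score, trial) pairs on which the agent scored lowest."""
    return sorted(zip(scores, trials))[:k]

# Toy stand-in agents (assumptions for illustration only).
def threshold_agent(stimulus):
    guess = 1.0 if stimulus > 0.5 else 0.0
    return 1.0 - abs(guess - stimulus)

def constant_agent(stimulus):
    return 1.0 - abs(0.5 - stimulus)

trials = make_trials(1000, seed=42)  # the same batch is presented to every agent
for name, agent in [("threshold", threshold_agent), ("constant", constant_agent)]:
    scores = score_agent(agent, trials)
    mean_score = sum(scores) / len(scores)
    print(name, round(mean_score, 3), worst_trials(scores, trials, k=3))
```

Because the batch is generated once from a fixed seed, differences between agents cannot be blamed on one of them drawing an easier set of trials.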

I'll be going into a lot of detail on small things, so the report of results is behind the cut.

Continue reading "Looking more closely at individual agents"

Co-evolution, step 1

I'm working on the to-do list I put up last week. I now have a rudimentary system for co-evolution of trials, and some experiments are running with it. Results using the most simplistic scheme possible have been terrible, but this is not surprising and I know what to try next.

Continue reading "Co-evolution, step 1"

July 14, 2006

Slow, slow experiments

Since before GECCO, I've been running experiments based on both strands of my roadmap. I now have rather a lot of computing power at my disposal, which makes it all the more damning that some of these experiments require more than a week to run. The problem is that evaluation noise has the potential to swamp any meaningful differences between agents, and if there's one certainty about genetic algorithms it's that they'll exploit whatever stupid trick their designer unintentionally leaves available, instead of doing what they were intended to.

Continue reading "Slow, slow experiments"

March 06, 2006

A positive result!

I was only away for 8 days, but jetlag took its toll, which is why it's taken me till now to write about it. Anyway, I had a lot of data waiting for me on my return, and last Wednesday I started analysing it. I'm excited because I've actually found a positive result, and behind the cut I'll copy the email I sent my lab about it.

Continue reading "A positive result!"

February 10, 2006

More on drift

Looking at more best-agent vs last-agent-in-search comparisons today has made me realise something: the one pattern that seems to be emerging is that the last agent tends to be much more consistent in its behaviour than the agent from 1000 generations earlier. The last agents achieve this by being less affected by sensory input, though looking at their genotypes is not enlightening as regards exactly how, so it's nothing as straightforward as the setting of gains to 0 that I saw with the catcher agent.

On a given batch of trials, this often (but not consistently) causes a marginal improvement in the agent's performance, yet in the search this never registered as an increase in fitness. This seems to be because the 'best' agent in a search is actually the agent that gets luckiest in terms of the exact batch of trials it encounters, so on the particular batch that's in use in its generation it actually benefits from the noise. But then, when I compare it to other agents using a standardised batch of trials, it's very unlikely that this batch will happen to favour it like the one it encountered during evolution. Of course, the fitness improvements I'm seeing are all very small (typically in the third significant figure), because if they were bigger then they would swamp the randomness effect, and we'd have a new 'best agent'.

Anyway, the upshot of this is that I'm not sure what's happening really counts as random drift, because there seems to have still been some fitness improvement.
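The "luckiest batch" effect described above can be reproduced in a toy simulation. This is only a sketch under made-up assumptions: true fitness creeps up by a tiny fixed amount each generation, and each generation's batch of trials adds Gaussian luck to the score recorded during the search. The function name, the parameter values (0.70, `gain_per_gen`, `batch_sd`) and the Gaussian noise model are all hypothetical and not taken from my experiments.

```python
import random

def run_search(n_generations=1000, gain_per_gen=1e-5, batch_sd=0.02, seed=0):
    """Toy model: slow real improvement plus per-generation batch luck."""
    rng = random.Random(seed)
    # Hypothetical true fitness: improves very slightly every generation.
    true_fitness = [0.70 + gain_per_gen * g for g in range(n_generations)]
    # Score seen during the search = true fitness + luck of that generation's batch.
    observed = [t + rng.gauss(0.0, batch_sd) for t in true_fitness]
    # The 'best' agent is the one with the highest score recorded during the search.
    best_gen = max(range(n_generations), key=observed.__getitem__)
    return best_gen, observed[best_gen], true_fitness[best_gen], true_fitness[-1]

best_gen, observed_best, true_best, true_last = run_search()
print(f"'best' agent came from generation {best_gen}")
print(f"score recorded during the search: {observed_best:.4f}")
print(f"its score without batch luck:     {true_best:.4f}")
print(f"last agent's score without luck:  {true_last:.4f}")
```

In this toy model the recorded score of the 'best' agent overstates its true fitness, while the last agent is at least as good once the batch luck is removed, even though it was never registered as the best during the search.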

February 08, 2006

More interneurons don't help either

I tried running the same set of experiments, but giving agents 10 interneurons instead of 3. To be honest, I wasn't expecting this to help, because I suspect that the stupid threshold strategy is a local optimum that would need some change in the environment or trial structure to get around, but I had to try this anyway. If the agents with more interneurons performed significantly better or came up with qualitatively different strategies, it would tell me something useful about the difficulty of the task, and be strong evidence against my hypothesis that this version of the task doesn't support the evolution of learning because it's too easy to perform reasonably well with a non-learning strategy.

As it turns out, the average performance of 10-interneuron agents was marginally better, but again not statistically significantly so. I did the same kinds of comparisons, with fitnesses normalised in the same way, as the data I reported yesterday. On average, the best agent from a 10-interneuron run outperformed that from the equivalent 3-interneuron run by 0.0398, and the final agents were better by 0.03323. However, the standard deviations were 0.08486 and 0.08784 respectively, and a t test puts the probability of error (the p value) at 0.25. Qualitatively, the agents are all using the same basic strategy, so my interpretation of these data is that the larger number of interneurons simply supports slightly better tuning of the thresholds, and even that claim is too weakly supported by the data to be publishable without caveats.
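To show how means and standard deviations like these feed into a t test, here is a small Python sketch. It rests on two assumptions that the text above does not confirm: that the test is a paired one (a one-sample test on the per-run differences), and that the quoted standard deviations are those of the differences. The number of run pairs is not stated, so `n_pairs` is purely hypothetical and the loop simply tries a few values.

```python
import math

def one_sample_t(mean_diff, sd_diff, n_pairs):
    """t statistic for the null hypothesis that the mean paired difference is zero."""
    standard_error = sd_diff / math.sqrt(n_pairs)
    return mean_diff / standard_error

# Means and standard deviations are the ones quoted above;
# n_pairs is a hypothetical placeholder, not a reported value.
for n_pairs in (5, 10, 20):
    t_best = one_sample_t(0.0398, 0.08486, n_pairs)
    t_final = one_sample_t(0.03323, 0.08784, n_pairs)
    print(n_pairs, round(t_best, 2), round(t_final, 2))
```

Turning a t statistic into a probability needs the t distribution with `n_pairs - 1` degrees of freedom (for example via `scipy.stats.t.sf`), which is left out here to keep the sketch dependency-free. With a mean difference less than half the standard deviation, only a fairly large number of runs would push the statistic into significant territory.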

Continue reading "More interneurons don't help either"

February 07, 2006

I was wrong about trial selection

After a few hiccups due to my own errors and some hardware trouble, I now have all the results I was waiting for from the temporal-correlation experiments with 3 interneurons. They're not very impressive, but I have learned some things from them.

There's only one really surprising result, which is that my preliminary report that the 'comprehensive trials' experiments were doing better than ones with randomly generated trials was in fact wrong. It is true that the fitness scores reported during the search are much closer to the results from a more thorough re-testing if the search used the comprehensive trial generation system, but that's the only advantage. More detail, and some hand-waving about why, behind the cut.

Continue reading "I was wrong about trial selection"