
Last week's update

What follows is edited from the status email I sent my lab last Thursday. It seems like as good a starting point as any for my weekly updates.

I think it would be good for me to go back to writing weekly updates, so I'll start with that now.

When I was last in Cleveland, I took notes on various suggestions my colleagues and advisor had for things I should try with my experiments, and over the holidays I left some runs going on the cluster that implemented some of these. Specifically, I set off runs with (a) a wider range of time periods, so that agents would do less well by just evolving for the average of the range, and (b) longer sets of trials, so that agents which died in some trialsets would lose more fitness.

Unfortunately, there were some bugs. First, not all of the experiments actually ran (I think I mistyped some paths in the scripts that should have set things off in series). Second, agents that reached an energy level of 0 did not in fact die. Instead, an agent could happily keep going at energy level 0 until, at some timestep, it did something right, at which point its energy level would immediately become positive again and its fitness would increase. Clearly this screws up (b) above: instead of there being a heavier penalty for maladaptive behaviour, the penalty was effectively capped. However, I'm not convinced that it necessarily screws up (a), so while I re-run all of the experiments with mended code, I'm still analysing some of the agents that came out of this run.
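
To make the death bug concrete, here is a minimal sketch of the difference between the intended and the buggy behaviour. It is not the simulator's actual code: the function name `energy_trace`, the parameters `deltas`, `start` and `can_die`, the whole-number energy units, and the clamping of energy at 0 are all assumptions made for illustration.

```python
def energy_trace(deltas, start=3, can_die=True):
    """Return the agent's energy after each timestep (hypothetical model).

    deltas  -- per-timestep energy changes (illustrative integers)
    start   -- energy at the beginning of the trial
    can_die -- True for the intended behaviour, False to mimic the bug
    """
    energy = start
    trace = []
    for delta in deltas:
        # Assumption: energy never goes below 0.
        energy = max(0, energy + delta)
        trace.append(energy)
        if can_die and energy == 0:
            # Intended behaviour: the trial ends as soon as energy hits 0.
            break
    return trace


deltas = [-1, -1, -1, +2, +1]
intended = energy_trace(deltas, can_die=True)   # stops at the third timestep
buggy = energy_trace(deltas, can_die=False)     # the agent "comes back to life"
```

With `can_die=False` the agent sits at 0 and then recovers on the fourth timestep, which is the capped penalty described above.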

[context for next paragraph: fitness is scaled from 0 to 1, and an agent that never bites scores about 0.495, which is typically the best thing in the randomly generated first generation of a search]

The one that I've been looking at so far is not very encouraging. I picked an agent that had evolved with a time period range of 20-50, and scored a fitness of 0.817 when evaluated during evolution. I ran it on a different set of trials generated using the same parameters, and its performance dropped markedly: it scored between 0.62 and 0.71. More frustratingly, it doesn't seem to be responding to input at all. Instead, it just applies a complicated but rather arbitrary-looking preset pattern.

It's worth mentioning another change I made to the way the experiments run, because it's probably key to what happened here. When I was using relatively small ranges of time periods (e.g. 10-20), I had the agents exposed to every possible period each generation, which at least had the merit of removing all trial noise. However, once the range of time periods increases (and I did some runs with 1-50, though unfortunately none of those worked), this has the drawback of needing a lot of trials per generation. So I went back to using randomly generated trials. I used a trick that seems to have worked before to stop trial noise from being destructive (there's a vector of 20 trialsets, and only 1 set is replaced between generations n and n+1), but apparently it wasn't enough here.
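
For anyone who hasn't seen the trialset trick, a rough sketch follows. The pool of 20 and the one-replacement-per-generation rule are as described above; everything else is assumed for illustration, including the names `make_trialset`, `new_pool` and `next_generation`, the number of trials per set, the representation of a trial as a bare time period, and the choice to replace the oldest set rather than a random one.

```python
import random
from collections import deque

POOL_SIZE = 20  # the vector of 20 trialsets


def make_trialset(rng, n_trials=10, period_range=(20, 50)):
    """Draw one trialset: a list of random time periods (hypothetical format)."""
    lo, hi = period_range
    return [rng.randint(lo, hi) for _ in range(n_trials)]


def new_pool(rng):
    """Build the initial pool of POOL_SIZE trialsets."""
    return deque((make_trialset(rng) for _ in range(POOL_SIZE)), maxlen=POOL_SIZE)


def next_generation(pool, rng):
    """Replace exactly one trialset between generations n and n+1.

    Assumption: the oldest set is the one dropped. Appending to a full
    deque with maxlen set discards the leftmost (oldest) element.
    """
    pool.append(make_trialset(rng))
    return pool


rng = random.Random(0)
pool = new_pool(rng)
before = list(pool)
next_generation(pool, rng)
```

Because 19 of the 20 sets carry over, an agent's score can only shift by a small amount from one generation to the next on account of the trials alone.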

So as far as I can tell, agents have been using strange preset patterns of behaviour, and this one scored a moderately high fitness as a one-off because it got a set of trials that was a good fit to its arbitrary behaviour pattern. The frustrating thing is that the run continued to produce agents that scored >0.8 fitness for another 1700 generations after this one emerged, but because I've only been saving agents that are better than any that preceded them, and none of them did better than this one, I can't look at those to see whether the run continued to just take similar shots in the dark, or eventually produced something sensible.

So, where does this leave me? I sometimes lose sight of the fact that the point of this exercise was never only to make something that works, but also to understand which features of the agent and environment are important in making it work. Remembering that goal, I see my plan of action as this:

1 Re-run the experiments that didn't run over Christmas. (currently underway on the cluster)
2 Continue to analyse agents from the Christmas experiments (I'm doing the ones I already have while I wait for the others); there's no reason to assume that they all do the same thing as the one I've looked at so far.
3 Have the simulator save the best agent in _every_ generation, without bothering to check if it's the best ever. (trivial code change that I can do today)
4 Fix that bug that's stopping agents from dying when they are supposed to, thereby distorting fitness evaluations. (I should be able to do this today or tomorrow)
5 Run experiments with the same set of parameters as the last lot, but with the fixed code (hopefully start those tomorrow, so some will have finished by Monday morning)
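
Item 3 really is a small change. The sketch below contrasts the two saving policies; the function `agents_to_save`, its `every_generation` flag, and the list of per-generation fitness values are hypothetical, and the values after 0.817 are made up purely to illustrate a run that never beats its early record.

```python
def agents_to_save(best_fitness_per_generation, every_generation=False):
    """Return the generation indices whose best agent would be saved.

    every_generation=False -- save only agents that beat every earlier one
    every_generation=True  -- save the best agent of each generation
    """
    saved = []
    best_so_far = float("-inf")
    for generation, fitness in enumerate(best_fitness_per_generation):
        is_record = fitness > best_so_far
        if is_record:
            best_so_far = fitness
        if every_generation or is_record:
            saved.append(generation)
    return saved


# Illustrative values only: the record is set in generation 2 and never beaten.
fitnesses = [0.495, 0.60, 0.817, 0.81, 0.80]
records_only = agents_to_save(fitnesses)
all_generations = agents_to_save(fitnesses, every_generation=True)
```

Under the records-only policy nothing after generation 2 is kept, which is why I can't inspect the later agents from the Christmas run.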

Then, if this later set of experiments also doesn't work, that's when I have to start thinking about things like using more interneurons, and setting up the trial sets more intelligently with either shaping or co-evolution. Till then, at least I'm learning something from the failures.
