Incremental progress
I'm moving towards making the weekly updates a somewhat less detailed overview, overlapping with the more detailed ramblings I write more frequently but on no regular schedule. This week's is behind the cut.
1 Analysing more agents.
I'm finding the same dumb threshold strategy shows up over and over again. There are some slight variations on it; the best so far goes something like this:
1 Bite for the first 4-6 trials
2 When energy drops below ~0.965, bite once
3a If the food was good, keep biting until the stopping condition in step 4 is met
3b If the food was not good, wait until energy drops below ~0.953, then bite again, this time continuing to bite until the stopping condition in step 4 is met
4 Stop biting when either energy has been maximal for 3 consecutive trials, OR energy is ~0.96 or higher and the last bite was of bad food.
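To make the steps above concrete, here is a rough sketch of the strategy as a small state machine. This is my reading of the behaviour, not the evolved network itself. The class name, the state names, the assumption that "maximal" energy means 1.0, and the choice to return to step 2 after stopping are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAgent:
    """Hand-written sketch of the reactive threshold strategy (hypothetical names)."""
    initial_bites: int = 5           # the post says 4-6
    first_threshold: float = 0.965   # step 2
    second_threshold: float = 0.953  # step 3b
    stop_energy: float = 0.96        # step 4
    max_energy: float = 1.0          # assumption: "maximal" energy is 1.0
    state: str = "initial"
    trials_seen: int = 0
    trials_at_max: int = 0
    last_bite_good: Optional[bool] = None

    def decide(self, energy: float, last_food_good: Optional[bool] = None) -> bool:
        """Return True to bite on this trial.

        last_food_good is the outcome of the previous trial's bite,
        or None if the agent did not bite on the previous trial.
        """
        if last_food_good is not None:
            self.last_bite_good = last_food_good
        self.trials_at_max = self.trials_at_max + 1 if energy >= self.max_energy else 0
        self.trials_seen += 1

        # Step 1: bite unconditionally for the first few trials.
        if self.state == "initial":
            if self.trials_seen <= self.initial_bites:
                return True
            self.state = "waiting"

        # Steps 3a / 3b: branch on the outcome of the single probe bite.
        if self.state == "probed":
            self.state = "biting" if self.last_bite_good else "waiting_low"

        # Step 2: bite once when energy drops below the first threshold.
        if self.state == "waiting":
            if energy < self.first_threshold:
                self.state = "probed"
                return True
            return False

        # Step 3b: after bad food, wait for the lower threshold.
        if self.state == "waiting_low":
            if energy < self.second_threshold:
                self.state = "biting"
                return True
            return False

        # Step 4: keep biting until one of the stopping conditions holds.
        stop = self.trials_at_max >= 3 or (
            energy >= self.stop_energy and self.last_bite_good is False
        )
        if stop:
            self.state = "waiting"  # assumption: go back to step 2
            return False
        return True
```

Calling `decide(energy, last_food_good)` once per trial gives the bite/no-bite decision; the thresholds are the approximate values quoted above.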
This particular one was evolved on periods from 1-10. It's the only one I've yet found that does well in that range. It actually performs pretty close to optimally for
2 The effect of fixing agent death.
For a given agent, making death happen as I had originally intended in the simulation reduces fitness. This is because the broken version made no difference to those trial sets on which the agent would have performed well, but slightly cushioned the impact of screwing up on those trial sets in which the agent fails. However, agents evolved on the correct simulator tend to have a slightly higher fitness than those evolved with the broken version. Unfortunately, they don't seem any more likely to do anything clever; it just seems to be supporting somewhat better tuning of the thresholds in the dumb strategy. [more detail in Friday's post]
3 Comprehensive trial generation.
I was troubled by two issues:
- a tendency for the fitness score from an evaluation with fresh randomly-generated trials to be much lower than that from the fitness evaluation during evolution.
- evaluation noise in general.
So I decided to implement a different way of generating the trials. Instead of creating a purely random batch of trials, I now have the simulator generate a batch in which there are n trials at each time period allowed by the parameters. There's still some randomness—in the decoy sensor input—so it still replaces one trial set each generation, but now it replaces a trial set with period X with another one at the same period. That way, every agent sees every time period in its evaluations.
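As a minimal sketch of the scheme just described, assuming a trial set can be summarised by its period plus a seed for the random decoy input (the `TrialSet` type and both function names are made up for illustration, not the simulator's real interface):

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrialSet:
    period: int      # time period this trial set is generated at
    decoy_seed: int  # stands in for the random decoy sensor input


def make_batch(min_period: int, max_period: int, n_per_period: int,
               rng: random.Random) -> List[TrialSet]:
    """Build a batch with n_per_period trial sets at every allowed period."""
    return [
        TrialSet(period, rng.randrange(2**32))
        for period in range(min_period, max_period + 1)
        for _ in range(n_per_period)
    ]


def replace_one(batch: List[TrialSet], rng: random.Random) -> List[TrialSet]:
    """Swap one trial set for a fresh one at the same period."""
    index = rng.randrange(len(batch))
    new_batch = list(batch)
    new_batch[index] = TrialSet(batch[index].period, rng.randrange(2**32))
    return new_batch
```

With periods 1-10 and `n_per_period=2` this gives 20 trial sets, and calling `replace_one` once per generation keeps the number of trial sets at each period fixed while still refreshing the decoy input.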
The results from this have been another incremental improvement rather than anything radically different. These runs produced the agent I described in detail above, which I think is the best refinement I've seen of a reactive threshold strategy, but nothing more impressive than that. In general, agents evolved in these runs do somewhat better on the standardised tests I'm putting them through.
I'm also seeing the best ever agent emerge a lot later in the run—quite often in the last 1000 generations and sometimes in the last 100—so I'm wondering if I should run searches for more than the 5000 generations they currently go for.
4 Next experiments to run.
Some of the "comprehensive trials" experiments are still running on the cluster, because in some cases the new scheme presents many more trial sets to each agent (I was using 20 for all conditions, but some of these experiments have a period range of 1-50).
Next I want to run some with more interneurons, because I'm starting to feel like I've explored this setup enough to be able to say that 3-interneuron agents are very unlikely to do anything more interesting than I have seen. I think I'm going to take a subset of the parameter ranges I've been exploring, and run them with 4-10 interneurons.
5 Programming work to do.
I haven't yet put together any alternative versions of the task, as I said I would last week, because there still seems to be quite a lot to do with the temporal correlation variant. I still think I should move on while the next set of experiments is running, in case this really is a dead end. Here's what I think I need to get done this week, in the order I'm planning on doing it:
- Fix a niggling bug in the 'comprehensive' trial generation scheme I described above. There's some kind of cyclic issue that is causing one of the trial sets to get replaced wrongly every n generations, and I haven't yet figured out why. When this is done, I should re-run some of the searches to see if it mattered; I don't think it does, but I need to confirm that.
- Set up the simple shaping system that I mentioned in part 1. I'm not sure how important this is, but it should be easy to get this running and do another batch of experiments using it; then while those are running I can take on the more involved task.
- Set up an adaptive stopping criterion that will declare a search to have finished if n generations have passed without any improvement in fitness, so I can do some longer runs without wasting hours of CPU time on a search that's going nowhere.
- Implement the version of the simulator that uses sensory flags in various ways.
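The adaptive stopping criterion in the list above could look something like this. It is only a sketch: `run_search`, `step` and `patience` are hypothetical names, and `step` stands in for whatever runs one generation of the real search and returns the best fitness it found.

```python
from typing import Callable, Tuple


def run_search(step: Callable[[int], float], max_generations: int,
               patience: int) -> Tuple[float, int]:
    """Run until max_generations, or until `patience` generations pass
    with no improvement in best fitness.

    Returns (best fitness seen, number of generations actually run).
    """
    best = float("-inf")
    since_improvement = 0
    for generation in range(max_generations):
        fitness = step(generation)
        if fitness > best:
            best = fitness
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return best, generation + 1
    return best, max_generations
```

Given how late the best agents have been turning up, `patience` would probably need to be fairly large relative to the run length, or the criterion would cut off exactly the runs I'm interested in.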
Trackbacks
Trackback URL for this entry is: http://blog.case.edu/exg39/mt-tb.cgi/5413
I was wrong about trial selection
Excerpt: After a few hiccups due to my own errors and some hardware trouble, I now have all the results I was waiting for from the temporal-correlation experiments with 3 interneurons. They're not very impressive, but I have learned some things from them. There...
Weblog: Eldan Goldenberg's lab notebook
Tracked: February 7, 2006 11:33 PM

Comments
Hey, thanks for posting these -- it's been neat to read about what you're doing.
Thanks! It's nice to know that someone is reading, now that comment spam has made server logs meaningless.