This week
Here is the update I wrote on Wednesday. Last week's update may be useful for context.
I have quite a lot to say this week, and for some of it I haven't yet sorted out the wood from the trees, so I'm going to try to break it down into useful sections and put a summary up here. You may only want to read the summary, but the detail's there if you want it:
- I finished analysing the agent I described last week, and several similar ones. They are not very good, but I learned some useful things from them; mainly that my searches are often getting stuck in a local optimum, and that there is a source of noise I hadn't accounted for in the environment.
- A better agent: though I still haven't found one that does the phase locking I was hoping for, I've found one that behaves in a more useful reactive way, so it robustly does well. This is a bit of a mixed blessing, because the existence of a reactive strategy that works well probably makes it a lot less likely that a learning strategy will emerge.
- Programming done/to do: old bug fixed, a new one found & fixed, trying another way to generate the trials.
- Data to analyse: the experiments in which the agent actually dies when fitness reaches 0 have finished.
- New experiments to run: ones in which non-optimal agents are saved correctly, so I can see whether anything interesting happens between the best agent being found and the end of the run.
- Next steps: maybe time to close off the temporal-correlation avenue and start looking at runs with useful sensory input?
1 Analysing last week's agent
The agent that I was analysing when I last checked in turned out to be quite disappointing. Here is an approximate procedural program that describes its behaviour:
1) Start by not biting
2) Once energy level drops below [0.836 - 0.848], start to bite
3) Continue to bite until energy level goes above [0.956 - 0.984] (or until the end of the trialset, because this threshold may never be reached)
4) Stop biting, and goto 2
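The four steps above amount to a two-threshold (hysteresis) switch on energy level. Here is a minimal Python sketch of that reading. The class name, the `step` method and the single-point thresholds (0.84 and 0.97, picked from inside the observed ranges) are my own assumptions for illustration; the real agent is an evolved neural circuit, not this program, and its thresholds are fuzzy.

```python
from dataclasses import dataclass


@dataclass
class HysteresisBiter:
    """Two-threshold switch approximating steps 1-4 (illustrative only)."""
    low: float = 0.84     # assumed point inside the observed 0.836-0.848 band
    high: float = 0.97    # assumed point inside the observed 0.956-0.984 band
    biting: bool = False  # step 1: start by not biting

    def step(self, energy: float) -> bool:
        """Return True if the agent bites on this trial."""
        if not self.biting and energy < self.low:
            self.biting = True    # step 2: energy dropped below the lower threshold
        elif self.biting and energy > self.high:
            self.biting = False   # step 4: energy recovered, stop and wait again
        return self.biting        # step 3: otherwise keep doing the same thing


def run(energies):
    """Feed a sequence of energy levels to a fresh agent; return its bite decisions."""
    agent = HysteresisBiter()
    return [agent.step(e) for e in energies]
```

Note that nothing in this switch looks at whether the food is currently good or bad, which is why it can start biting at exactly the wrong moment.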
I'm detailing it for two reasons: I found several agents that did this, and although it's not inherently a good result it led me to realise a few things that might be useful.
This is not a bad approximate strategy for the range of environments it was evolved with (in which the period of the good food / bad food oscillation ranged from 20 to 50). Most good food periods are long enough for it to get back to the upper threshold if it starts biting at the lower threshold, and unless it starts biting early in a bad food period it will survive that period and manage to recover. However, it can at times be very damaging: if it happens to start biting at the start of a bad period, or too close to the end of a good period, the cost of biting bad food sets it on a downward spiral in which it loses more energy in the bad period than it can make up in a subsequent good period, and it fairly quickly gets down to energy level 0.
The fuzziness (thresholds being a range of values rather than a single point) seems to come from the agent's sensory neurons, which I'm feeding noise in the hope that the agents will learn to ignore those inputs. The issue is that none of the agents I've looked at have set the gains of, or the weights from, their sensory neurons to _precisely_ 0 (probably because that's quite difficult to do), so in effect the noise that I'm feeding those input neurons acts as a noise injection to the whole circuit. I'm not 100% sure that this is the cause, because I haven't investigated it enough, but it's the only source of noise I'm aware of. This isn't necessarily a problem, but it makes analysing the agents' behaviour slightly more difficult (so far the effect hasn't been big enough to obscure any patterns), and it does have one important implication for the trial generation: two trialsets can have the same oscillation period and still not be identical. That _is_ important, because it means that simply feeding agents one trialset at each oscillation period length is not the same as comprehensively testing them on every possible trial.
I looked at a couple of other agents from runs with the same range of oscillation periods but different numbers of trials in a set. They had similar strategies, but it looks like longer trialsets push the "start biting" threshold slightly higher, and vice versa. To me, this strongly indicates that this stupid strategy is a sort of local optimum that is easier for the searches to find than anything clever.
2 The better agent
Since my last update, the experiments that had failed to run over the holidays have all finished. Some of these found fitter agents than those I had been analysing last week; most notably, one search with the period ranging from 1 to 50 came up with a best fitness of 0.894, so I took a look at that.
I benchmarked it on a set of 1000 randomly generated trials, and the best fitness was still 0.89, so I can be confident that it wasn't just a fluke of the evaluations it was presented with that generation.
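For illustration, a benchmark of this kind can be sketched as below. This is not the simulator's code: `evaluate` and `make_trial` are hypothetical callables standing in for "score the agent on one trial" and "generate one random trial", and summarising with the mean and standard deviation is my assumption about what one would want to look at.

```python
import random
from statistics import mean, stdev


def benchmark(evaluate, make_trial, n_trials=1000, seed=0):
    """Score one agent on n_trials freshly generated random trials.

    evaluate(trial) -> float   (hypothetical: fitness of the agent on one trial)
    make_trial(rng) -> trial   (hypothetical: draws one random trial)
    Returns (mean score, sample standard deviation).
    """
    rng = random.Random(seed)  # fixed seed so the benchmark is repeatable
    scores = [evaluate(make_trial(rng)) for _ in range(n_trials)]
    return mean(scores), stdev(scores)
```

If the benchmark score stays close to the fitness reported during the search, the agent's score was probably not a fluke of the particular evaluations it saw.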
This one's strategy is a little more complicated, and considerably more noisy than the others I've looked at, but here's a rough description:
1) Don't bite
2) Start biting at trial 2
3) Stop biting at trial 8-12 [thus far, this must be a transient]
4) Start biting again once energy drops below ~0.93
5.1) If biting decreases energy level, stop biting for a few trials, and then bite again
5.2) If biting increases energy level, then keep biting, until either biting no longer increases energy level, or energy level is high enough [which now varies more widely than I have an explanation for]
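As a rough Python sketch, steps 4 to 5.2 look like the following. It deliberately leaves out the opening transient (steps 1 to 3), and every name and number in it is an assumption for illustration: `start_below` follows the ~0.93 figure above, while `stop_above=0.98` and `pause=3` are placeholders, since the real stopping level varies and "a few trials" is not pinned down.

```python
class ReactiveBiter:
    """Illustrative approximation of steps 4-5.2; not the evolved circuit."""

    def __init__(self, start_below=0.93, stop_above=0.98, pause=3):
        self.start_below = start_below  # step 4 threshold (~0.93 in the text)
        self.stop_above = stop_above    # assumed "high enough" level
        self.pause = pause              # assumed length of "a few trials"
        self.prev_energy = None
        self.was_biting = False
        self.pause_left = 0

    def step(self, energy):
        """Return True to bite on this trial, given the current energy level."""
        gained = self.was_biting and energy > self.prev_energy
        lost = self.was_biting and energy < self.prev_energy
        if lost:
            self.pause_left = self.pause        # step 5.1: the last bite hurt, back off
        if self.pause_left > 0:
            self.pause_left -= 1
            bite = False
        elif gained:
            bite = energy <= self.stop_above    # step 5.2: keep biting until high enough
        else:
            bite = energy < self.start_below    # step 4: (re)start when energy is low
        self.prev_energy = energy
        self.was_biting = bite
        return bite
```

The only information this uses about the food is the energy change caused by the previous bite, which is what makes it reactive rather than predictive.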
It's basically a variant on the previous strategy, but 5.1 protects it from poisoning itself to death when the food is bad. Ironically, this does better (0.914) when periods are restricted to 20-50 than the agents evolved for that condition. Either I just got lucky with this one search, or the wider period range is stopping it from falling into the local optimum I was having trouble with. I'll run a few more of these with different random seeds to find out.
3 Programming done/to do
I fixed the bug I described last week that was stopping agents from dying when they should. I've run the same set of experiments as before, with this fixed code, but not yet analysed the results.
I set the simulator to save the best agent in every generation, but noticed today that I had made a mistake: it was actually saving the best agent ever, in a new copy each generation. I've also realised that the sheer volume of data this creates is ridiculous. So today I fixed it so that it really does save the best agent in the current generation, and so that it only saves every n generations (but still saves every time there's a new best fitness ever). I think for present purposes n=100 should be a good compromise: it gives me as many snapshots as I'm likely to use, without flooding either the cluster's or my own hard drive with files I'll never look at.
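The saving rule can be stated compactly as below. This is a sketch in Python rather than the simulator's actual code; the function name and the list-of-fitnesses input are assumptions made for the example.

```python
def snapshot_generations(best_fitness_per_generation, n=100):
    """Return the generation indices at which a snapshot would be saved.

    A snapshot of the current generation's best agent is saved when either
    - the generation index is a multiple of n, or
    - that agent sets a new best-ever fitness.
    """
    saved = []
    best_ever = float("-inf")
    for generation, fitness in enumerate(best_fitness_per_generation):
        is_new_best = fitness > best_ever
        if is_new_best:
            best_ever = fitness
        if generation % n == 0 or is_new_best:
            saved.append(generation)
    return saved
```

With this rule the number of files grows with the number of improvements plus one per n generations, rather than one per generation.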
Next I want to set it up such that every possible oscillation period is presented to every agent, but there is still random replacement of trials so that the input noise varies. I think that may make searches less likely to specialise on a subset of the period range.
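One way this could look, sketched in Python under several assumptions of mine: a trialset is a list of `(food_is_good, sensory_noise)` pairs, "period" means one full good-then-bad cycle split evenly, and the noise is uniform in [-1, 1]. None of these details come from the simulator itself.

```python
import random


def make_trialset(period, n_trials, rng):
    """One trialset: food alternates good/bad, sensory noise is drawn fresh."""
    half = period / 2  # assumption: half of each cycle is good food
    trials = []
    for t in range(n_trials):
        food_is_good = (t % period) < half
        noise = rng.uniform(-1.0, 1.0)  # assumed noise range for the sensory input
        trials.append((food_is_good, noise))
    return trials


def make_evaluation(min_period, max_period, n_trials, seed=None):
    """Build one trialset for every period in [min_period, max_period].

    Every period is always present, but the noise differs between calls
    unless the same seed is passed.
    """
    rng = random.Random(seed)
    return {
        period: make_trialset(period, n_trials, rng)
        for period in range(min_period, max_period + 1)
    }
```

Calling `make_evaluation(20, 50, 100)` once per agent per generation would cover the whole period range every time while still varying the noise each agent sees.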
4 Data waiting to be analysed
I've run a set of experiments with the fixed code in which agents can actually die. I think that this should disfavour the stupid strategy I described in part 1, because the penalty for the trialsets in which that strategy is suicidal is now higher. I'll start looking at that data in the morning.
5 New experiments to run
I'm still interested in agents from the generations after the best one emerges, to see if drift produces anything interesting. I'm guessing it won't, but I can't rule this out without looking at individual agents, because there is some evaluation noise.
Then there are the runs with every possible period presented to every agent, which might have less trouble with local optima.
6 Next steps
I am beginning to feel like the temporal correlation approach is a dead end, mainly because of part 2. If it's possible for an agent to do well enough by simply reacting to the food it just ate, then it may be hopelessly unlikely that any given search will find a more sophisticated solution than that. Before I can say that for sure, there are some more things to try, however:
a) tinkering with trial parameters. Perhaps the cost of eating bad food needs to be higher, so that there's a greater penalty to the agents that only discover the food is bad by eating it.
b) using more interneurons. Perhaps these agents aren't doing anything more complex because their architecture (3 interneurons) doesn't support it?
Having said that, I think I do need to start looking at other things, to avoid the trap of merely becoming an expert in what doesn't work. I think on Monday I'm going to start working on the version of the simulator in which the agents' input neurons carry information that actually does correlate with the goodness or badness of food. I had set out a roadmap for this a few months ago; now I think I'm ready to implement it.
I don't think it's yet time to abandon the temporal-correlation experiments, but I feel like doing nothing else would be a risky move.