Bloomington debrief
Last week I went to Bloomington to catch up with my lab. It was a shorter visit than would have been ideal, because of external constraints, but we got a lot done in two long, busy days. As for my own work, we mostly talked about ways to make the agents' task easier, because at the moment it takes too many hours of CPU time to evolve one good agent.
There's a bit of a tension here between the results I need in the end and the results I need to take the next steps with my PhD. Because I'm explicitly asking about the conditions under which learning does and does not evolve, all of the negative results are actually relevant in the long run, but for now I need to chase after positives. This is so that I can defend a research proposal—the sooner the better—which will be impossible without some positive results. If I can demonstrate some conditions under which good agents reliably evolve, then it's not too hard to argue that I'll be able to find many manipulations that cause them to fail, but the reverse is a lot trickier. So I'm left looking for ways of making the task easier.
The specific ideas I got from colleagues and want to take note of are below.
Testing the single-threshold agent
We more than slightly suspect that the best agents I've seen so far are all implementing a single-threshold strategy. By this I mean that instead of selecting a range of input signals to respond to by eating or avoiding, they select a threshold, above or below which they will always eat. Depending on the offset used in a given trial (see chart), this strategy will work perfectly or quite poorly. For 0.4, for instance, it works perfectly, because the whole input range can be partitioned into one 'good' and one 'bad' zone. I think 0.7 would be the worst (take the 0.2 line and turn it upside down), because any partition would necessarily include half of the bad food. Obviously, for an individual agent it's easy enough to watch its behaviour and determine whether this is going on, but I also need to hand-code an agent that does exactly this and measure its performance. With that information, I can tell how much better an agent that really does what I want will perform than one that implements the single threshold. If this difference is too small then I may be in trouble, because there could be too little pressure in a search to come up with anything better than the single threshold.
Of course, a simple way to solve this problem would be to use a mapping from sensory flag to food goodness that always has more than one peak and more than one trough, but my intuition is that this will itself make the task harder, so I don't want to do it unless I'm sure I have a problem with single-threshold agents.
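The hand-coded comparison described in this section could be sketched roughly as follows. This is only an illustration under stated assumptions: the `goodness` function is a hypothetical stand-in (one sinusoidal cycle over a flag range of 0 to 1), not the real mapping from the chart, so its offsets do not correspond to the offsets discussed above. Performance is assumed to be the summed goodness of everything eaten, and all function names are made up for the example.

```python
import math


def goodness(flag, offset):
    """Hypothetical flag-to-goodness mapping (an assumption, not the real task).

    One sinusoidal cycle over the flag range [0, 1), shifted by `offset`.
    """
    return math.sin(2 * math.pi * (flag - offset))


def ideal_score(flags, offset):
    """Total goodness eaten by an agent that bites only positive-goodness food."""
    return sum(max(goodness(f, offset), 0.0) for f in flags)


def single_threshold_score(flags, offset, steps=100):
    """Best total goodness from eating everything on one side of one threshold.

    Tries `steps + 1` evenly spaced thresholds and both directions
    (eat above, eat below), and keeps the best result.
    """
    best = float("-inf")
    for i in range(steps + 1):
        threshold = i / steps
        eat_above = sum(goodness(f, offset) for f in flags if f >= threshold)
        eat_below = sum(goodness(f, offset) for f in flags if f < threshold)
        best = max(best, eat_above, eat_below)
    return best
```

Calling both functions on the same evenly spaced list of flags for each offset gives the gap between the two strategies. With this stand-in mapping the gap vanishes at an offset of 0 and the single threshold earns roughly half the ideal score at an offset of 0.25; the numbers for the real mapping would have to come from the real task.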
Rest periods
At present, a trial set consists of successive presentations of food to the agent with no breaks in between. At the end of a presentation of food, the output neuron is 'read' to decide whether or not the agent will bite, and the agent's energy level is changed as appropriate. The problem with this is that at the beginning of the next presentation of food the agent receives two pieces of information simultaneously: the sensory flag associated with the new food, and the change in its energy level caused by the old food. I hadn't really thought about this problem before others in the lab pointed it out, but in response I should try putting a gap between successive presentations of food, and see if that changes performance.
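One way to picture the gap is as a per-timestep schedule in which each presentation is followed by a few steps with no food shown. The sketch below is an assumption about how a trial might be laid out, not the actual simulation code; `build_schedule`, `present_steps` and `rest_steps` are hypothetical names.

```python
def build_schedule(flags, present_steps, rest_steps):
    """Return a per-timestep list of (phase, flag) pairs.

    During "rest" steps the flag is None, meaning no food is shown, so the
    energy change from the previous bite arrives before the next flag does.
    """
    schedule = []
    for flag in flags:
        schedule.extend([("food", flag)] * present_steps)
        schedule.extend([("rest", None)] * rest_steps)
    return schedule
```

Setting `rest_steps` to 0 reproduces the current back-to-back layout, which makes it easy to compare performance with and without the gap.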
Allowing partial eating
At present the agent either bites the food or ignores it: a binary decision. The trouble with this is that in order to learn about food quality it must start out by biting everything to get some feedback, but if it happens to get a few presentations of really bad food at the start of an evaluation, it will incur a substantial fitness penalty, adding unnecessary evaluation noise. So I'm going to try making the bite/no-bite decision a continuous one, allowing agents to partially eat some food (by using the actual value of the output neuron instead of just whether or not it is above a threshold). This should allow safer test-biting, which might bring about a desirable reduction in evaluation noise.
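The difference between the two schemes can be shown with a small sketch. It assumes, purely for illustration, that the output neuron lies between 0 and 1, that the binary threshold is 0.5, and that the energy change is the bite size multiplied by the food's goodness; none of these details are confirmed above, and the function names are hypothetical.

```python
def binary_bite(output, threshold=0.5):
    """Current scheme: eat the whole food item or none of it."""
    return 1.0 if output > threshold else 0.0


def partial_bite(output):
    """Proposed scheme: eat a fraction equal to the output, clamped to [0, 1]."""
    return min(1.0, max(0.0, output))


def energy_change(bite, goodness):
    """Assumed energy change: bite size times food goodness."""
    return bite * goodness
```

Under these assumptions, an output of 0.6 on food with goodness -0.5 costs the full 0.5 in the binary scheme, whereas a cautious output of 0.2 in the continuous scheme costs only 0.1 while still producing feedback.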
Starting agents' energy levels in the middle of their range
At present, an agent starts a trial with an energy level of 1, and its energy level saturates at 1. This means that if it is immediately presented with good food and eats it, it gets very little information, because its energy level cannot rise any further. To remedy this, I will try starting the agents' energy levels at 0.5. This still leaves open the possibility of saturation later on (in fact the best agents will be able to reach an energy level of 1 fairly quickly), but I'm less worried about that because it will kick in after the agent has learned the relationship between sensory flag and food goodness.
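The saturation effect is easy to see in a short sketch. It assumes a simple clamped update; the upper limit of 1 comes from the description above, while the lower limit of 0 and the function names are assumptions made for the example.

```python
def update_energy(energy, delta, low=0.0, high=1.0):
    """Apply an energy change and clamp the result to [low, high]."""
    return min(high, max(low, energy + delta))


def observed_change(start, delta):
    """Energy change the agent actually experiences after clamping."""
    return update_energy(start, delta) - start
```

Starting at 1, eating food worth +0.3 gives an observed change of 0, so good food is indistinguishable from no food. Starting at 0.5, the same bite gives an observed change of about 0.3.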
