Establishing a fitness baseline
Working out that I should normalise agents' performance against the highest fitness achievable under the same set of conditions made me realise something else. For every individual trialset there is also a maximum possible fitness, which depends on the parameters and which will feed back into the performance of an agent on that trialset. So I decided to plot this. The pattern was more complex than I had anticipated, though in retrospect its causes are straightforward.
The graph below is for all the conditions I've used so far. There are two independent variables:
n is the number of trials in a set. I've been running experiments at n=250, 500 and 1000, since deciding that n=100 is too short for agents to do anything interesting. I'm now contemplating dropping n=1000 from future experiments, because these runs take a long time to finish, and I'm not sure they're showing me anything the others don't.
t is the period of oscillation between good food and bad, so if t=7 then 7 presentations of good food will be followed by 7 presentations of bad, and the cycle will repeat for a total of n presentations. I have run searches with the following ranges [min, max] of t values: [1,10], [1,20], [1,50], [10,20], [10,50], [20,50] and [40,50]. Then to analyse an individual agent qualitatively I look at its performance and behaviour at particular t values, including those outside the range it was evolved with, so I need to establish a baseline for all values that mean anything.
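To make the two variables concrete, here is a minimal sketch of how a trialset could be laid out from n and t. The function name and the boolean representation are illustrative assumptions, not taken from the simulator; the only things drawn from the description above are that food alternates in blocks of t presentations, starts with good food, and stops after n presentations.

```python
def good_food_schedule(n, t):
    """Return a list of n booleans: True for good food, False for bad.

    Hypothetical helper: food alternates in blocks of t presentations,
    starting with a block of good food, and is cut off after n trials.
    """
    if n < 0 or t < 1:
        raise ValueError("n must be >= 0 and t must be >= 1")
    # Block index i // t is even for good-food blocks, odd for bad-food blocks.
    return [(i // t) % 2 == 0 for i in range(n)]


schedule = good_food_schedule(n=20, t=7)
print("".join("G" if good else "b" for good in schedule))
# prints GGGGGGGbbbbbbbGGGGGG
```

Note that the last block is truncated whenever n is not a multiple of t, which is what produces the partial epochs discussed further down.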
I can account for all of the features on this graph. I don't think the explanation is in itself very interesting, but it helps me interpret other data, so I'll spell it out here for reference. This will also force me to note down some quirks of the simulation.
First of all, the 3 plots scale to each other horizontally. That's to say that if the horizontal axis were changed to t/n, all 3 plots would have their peaks and troughs in line with each other. This is because the simulation scales the energy rewards and penalties relative to n, so the maximal amount of energy the agent can ever have varies (I'll come to why later). In effect, the 3 values of n are sampling the same space at different rates, and the vertical displacement between n values is a result of this, because the cost of 1 trial without good food is dependent on n.
Because of the above, I'll try to describe everything else in relative terms, so that statements are true of all 3 plots.
The easiest feature to explain is the r-shaped upward curve in the right hand half of the plot. At t=n/2, a trialset consists of precisely n/2 presentations of good food, followed by n/2 presentations of bad food. The perfect agent keeps its energy maximal by biting for n/2 consecutive trials, and then doing nothing for the rest of the set. Therefore the best fitness attainable at this point is the outcome of n/2 trials at maximal fitness averaged with n/2 trials during which the agent's energy is slowly decreasing because there is an energy cost to existing. For t>n/2, the initial block of good-food trials becomes longer than half of the set, which makes the block of bad-food trials necessarily shorter than half of the set, causing the best attainable fitness to increase as t increases beyond n/2.
For t=n, the best attainable fitness is exactly the same as the best possible fitness in any given trial, because every trial in the set is a presentation of good food. This is not equal to 1 because my simulator applies the energy increase for eating food before applying the costs for existing and for biting. It is not invariant with respect to n because these costs are scaled by n.
The final feature of this upward curve that needs accounting for is the stretch between t=0.474n and t=0.5n. For these trialsets, the proportion of trials that contain good food is increasing as t decreases, because after t presentations of good food and t presentations of bad food, there is time for n-2t more presentations of good food. However, attainable fitness is still decreasing with decreasing t, for a subtle reason. At the beginning of this second epoch of good food, the agent has very low energy, and although its energy starts to increase with each presentation of good food, it takes several consecutive presentations of good food for the agent to get back to maximal energy. The range 0.474n<t<0.5n is the range in which the fitness increase from the second epoch of good food presentations is outweighed by the fitness cost of shortening the first epoch of good food.
t=0.474n is the equilibrium point, at which the agent's mean energy level during that second epoch of good food is equal to its mean energy through all the trials before that. This is why the maximum attainable fitness starts to increase as t decreases below 0.474n.
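The counting part of that argument is easy to check on its own. The sketch below (function name is an illustrative assumption) counts good-food presentations in closed form for any n and t, assuming only that the set starts with good food and alternates in blocks of t. For n/3 <= t <= n/2 it reduces to t + (n-2t) = n-t, so the proportion of good food does rise as t falls, even though attainable fitness does not rise until t drops below the equilibrium point.

```python
def good_food_count(n, t):
    """Number of good-food presentations in a set of n trials with period t.

    Hypothetical helper: each full cycle of 2*t trials contributes t
    good-food trials; a leftover partial cycle starts with good food,
    so it contributes at most t more.
    """
    full_cycles, remainder = divmod(n, 2 * t)
    return full_cycles * t + min(remainder, t)


n = 1000
for t in (500, 490, 480, 474):
    # In this range the count equals n - t.
    print(t, good_food_count(n, t), good_food_count(n, t) / n)
```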
t=n/3 is the next important fixed point, as this represents 2 epochs of n/3 consecutive presentations of good food, with 1 epoch of n/3 presentations of bad food sandwiched in between. However, the next peak in attainable fitness is at a slightly lower value of t, for a reason complementary to the explanation for the deepest trough being at t<n/2. At slightly smaller values of t, there is a second epoch of bad food, but it is too brief to undo the positive contribution made to the mean by starting the epoch at maximal energy. t=0.293n is the equilibrium point this time. The precise numbers are an artefact of the particular values I chose for the energy increments and decrements in the simulation; they are only important for drawing relative comparisons.
The remaining peaks and troughs at t<0.293n are effectively harmonics of the fundamental pattern described above. I knew all that time messing around with rock bands was going to teach me something useful.
The reason the troughs are successively deeper is that the agent starts with energy level 1, and its energy level is never allowed to be >1. This has the side-effect of compressing the benefit of longer good-food epochs, because the agent's energy can only increase so far. Meanwhile, under these conditions the agent's energy never has to reach its lower bound (which is 0), so the cost of longer bad-food epochs is not compressed in the same way.
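The mechanics described above can be pulled together into a rough sketch of the baseline calculation. This is not the simulator's code: the three energy constants are made-up placeholders, and "scaled by n" is assumed to mean "divided by n". Fitness is assumed to be the mean of the energy recorded at the end of each trial. What is taken from the description is the order of operations (food gain first, then costs), the starting energy of 1, the upper bound of 1 and the lower bound of 0, and a perfect agent that bites only when the food is good. With different constants the peaks and troughs will land at different values of t/n, so do not expect this to reproduce 0.474n or 0.293n.

```python
# Hypothetical placeholders, NOT the values used in the real simulator.
# Each is divided by n to get the per-trial amount (an assumed scaling rule).
FOOD_GAIN = 4.0        # energy gained by biting good food
EXISTENCE_COST = 1.0   # energy lost on every trial just for existing
BITE_COST = 0.5        # extra energy lost on trials where the agent bites


def best_fitness(n, t):
    """Mean end-of-trial energy of an agent that bites only good food."""
    energy = 1.0  # the agent starts at the upper bound
    total = 0.0
    for i in range(n):
        good = (i // t) % 2 == 0  # blocks of t trials, good food first
        if good:
            # The gain is applied (and capped at 1) before any costs.
            energy = min(1.0, energy + FOOD_GAIN / n)
            energy -= BITE_COST / n
        energy -= EXISTENCE_COST / n
        energy = max(0.0, energy)  # lower bound
        total += energy
    return total / n


n = 1000
for t in (n // 4, n // 2, 3 * n // 4, n):
    print(f"t/n = {t / n:.2f}  best fitness = {best_fitness(n, t):.4f}")
```

Under these assumptions, t=n gives 1 - (EXISTENCE_COST + BITE_COST)/n rather than 1, which matches the explanation of why the ceiling is below 1 and depends on n. Sweeping t from 1 to n and plotting against t/n would be the way to compare the shape for different n.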

