
Ways of scaffolding the task

One of my main interests with the simulations I'm running is in showing how agents can be evolved for learning tasks by using 'shaping' protocols that progressively increase the complexity of the tasks the agents face. This is not a revolutionary idea, and in some cases it's very obvious how to go about it (for instance, in the time-learning tasks it's just a matter of increasing the range of timings as agents' performance improves), but there can be a bit of an art to getting it right. With the stimulus-learning agents I'm having partial success: I now have a couple of high-scoring agents in the initial version of the task, but the reliability of searches is low. So it's time to start looking at ways to scaffold this task.

There are really two separate, but linked, needs for scaffolding here. On the one hand, if I can find ways to build up to the current task, that should help make searches more reliable. On the other hand, now that I have at least some agents performing well, it's time to start thinking about how to build up from this task to more complex ones. Here are some ideas:

Steps up to the current task

  1. At present, the input to the agent is randomly determined as any value in the range 0 to 1, with the value of food determined according to a formula (which produces the graph at the end of this post): food quality = sin[2π(input + offset)]. This means that there are regions of input-space in which food has a strongly negative or positive value, but also regions in which food has a value close to zero. I suspect that initially avoiding giving the agent input in the close-to-zero regions would make the task easier, because the agent would only be receiving inputs that have a strong reinforcement value, while the in-between values present at best ambiguous reinforcement. As performance increases, the agent could gradually be exposed to more complete sets of inputs.
  2. In a similar vein, I could try multiplying the effect of all food by a constant factor. This would not completely remove the 'crossover' values (after all, with any offset and multiplier there will still be 2 input values for which that equation returns precisely 0), but it would reduce the proportion of input space for which the reinforcement is weak enough to be ambiguous. I like this less than the previous idea because it seems cruder, and increases the risk that a small number of early mistakes would kill off an agent.
  3. I could easily add a constant term to the value of food, shifting the curve up and thereby making a larger proportion of possible inputs signal good food. As performance increases, this constant could diminish to zero. This would make the task easier, but might also run the risk of making it too easy for agents to perform well without learning, because the cost of biting indiscriminately has been lowered. I think this is worth exploring though, especially because this one, unlike the other ideas I've had, lends itself well to continuing to make the task more difficult.
  4. I could fix the generation of trials such that an exemplar of the highest-value food is presented early on. Obviously guaranteeing it to appear in the very first trial gives agents an easy exploit that would probably stop them from developing any other way to learn, but guaranteeing its appearance in the first n trials would at least leave individual trials less likely to kill off agents by just happening to have a long period before the presentation of any non-toxic food.
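
To make the four ideas above concrete, here is a minimal Python sketch of how they might be parameterised. It is not the actual simulation code: the function names, the `gain`, `bias`, `min_abs_value` and `exemplar_within` parameters, and the linear `anneal` schedule are all assumptions made up for illustration. Only the sine formula itself comes from the post.

```python
import math
import random


def food_quality(signal, offset, gain=1.0, bias=0.0):
    """Return gain * sin(2*pi*(signal + offset)) + bias.

    With gain=1 and bias=0 this is the formula from the post. gain (idea 2)
    and bias (idea 3) are hypothetical shaping knobs.
    """
    return gain * math.sin(2 * math.pi * (signal + offset)) + bias


def sample_signal(offset, min_abs_value=0.0, rng=random):
    """Idea 1: draw an input in [0, 1), rejecting weakly reinforced inputs.

    An input is kept only if the unscaled food value has magnitude of at
    least min_abs_value; min_abs_value=0 gives the unrestricted task.
    """
    if not 0.0 <= min_abs_value < 1.0:
        raise ValueError("min_abs_value must be in [0, 1)")
    while True:
        signal = rng.random()
        if abs(food_quality(signal, offset)) >= min_abs_value:
            return signal


def make_trials(offset, n_trials, exemplar_within=0, min_abs_value=0.0, rng=random):
    """Idea 4: optionally put the best food somewhere in the first few trials."""
    signals = [sample_signal(offset, min_abs_value, rng) for _ in range(n_trials)]
    window = min(exemplar_within, n_trials)
    if window > 0:
        best_signal = (0.25 - offset) % 1.0  # input at which the sine peaks
        signals[rng.randrange(window)] = best_signal
    return signals


def anneal(start, progress):
    """Shrink a shaping parameter linearly from start to 0 as progress goes 0 -> 1."""
    return start * (1.0 - min(max(progress, 0.0), 1.0))
```

In a scheme like this, `progress` would be some measure of how well the population is doing, so that for example `bias=anneal(0.4, progress)` or `min_abs_value=anneal(0.5, progress)` fades the scaffold out as performance improves. Placing the exemplar at a random position within the first n trials, rather than always first, is one way to avoid the first-trial exploit mentioned in idea 4.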

Ways to increase task complexity further

  1. Following on from #3 above, subtracting a constant term would make the task harder, and it may be interesting to gradually increase the size of the penalty, in a rough equivalent to environments becoming more hostile over time.
  2. I could either increase the multiplier term in the equation (currently 2π) or use a different function, so that the input:food goodness relationship changes from having only one peak and one trough to more than one. Increasing the multiplier appeals to me because it's easy to do this gradually and see how far I can take it before losing the agents. It would also be interesting to see what happens once there is more than one peak: would agents pick up on additional peaks, or would they simply latch on to whichever one they happen to discover first?
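
As a rough illustration of the multiplier idea, the sketch below scales the 2π term by a hypothetical `cycles` factor and counts how many peaks fall inside the input range. The names `food_quality_cycles` and `count_peaks` are assumptions for illustration, and the peak count is a simple grid-based estimate rather than anything taken from the simulation itself.

```python
import math


def food_quality_cycles(signal, offset, cycles=1.0):
    """sin(cycles * 2*pi * (signal + offset)); cycles=1 is the current task."""
    return math.sin(2 * math.pi * cycles * (signal + offset))


def count_peaks(offset, cycles, samples=1000):
    """Count interior local maxima of food value over inputs in [0, 1].

    Evaluates the function on an evenly spaced grid and counts grid points
    that are higher than their left neighbour and at least as high as their
    right neighbour. Maxima sitting exactly on an endpoint are not counted.
    """
    values = [
        food_quality_cycles(i / samples, offset, cycles)
        for i in range(samples + 1)
    ]
    return sum(
        1
        for i in range(1, samples)
        if values[i - 1] < values[i] >= values[i + 1]
    )
```

Because `cycles` does not have to be an integer, it can be raised in small steps, which fits the idea of increasing the multiplier gradually and seeing how far it can go before agents are lost.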

Footnote

I realise that I keep referring to a formula without any illustration of its output, and I always find it easier to visualise what manipulations to a formula will do if I have some kind of illustration. So here is a graph of food value (y axis) against input signal (x axis) for 5 sample offset values (the different series):

[Graph: shifted sine waves]

Hopefully with that it's a little easier to see what I'm talking about when I describe additive or multiplicative changes to the relationship between input and food goodness.

Trackbacks

Trackback URL for this entry is: http://blog.case.edu/exg39/mt-tb.cgi/8934

Looking more closely at individual agents
Excerpt: While I wait for experiments to finish, it's time to take a closer look at those agents I've found so far that seem to be performing well. For many reasons, the performance score during a run is an imperfect gauge of how good the agent actually is, so ...
Weblog: Eldan Goldenberg's lab notebook
Tracked: August 7, 2006 06:26 PM
