Next steps
Prior to ALife I was drifting a little, having spent too long figuring out what experiments I should be doing. The conference helped with motivation, and talking to various people helped me re-focus, particularly Eduardo Izquierdo-Torres, whose research focus is very close to mine and who is visiting my lab for the summer. I've decided that what I need to do is run more experiments for two of my existing setups, and set up two modifications that make the conditions more continuous.
Overall, I'm looking at two specific questions. The first is whether I can evolve generalist time-sensitive agents, or (as I suspect) will keep getting stuck with specialists that are tuned to some subset of the possible timescales they could be exposed to. The second (and more interesting, I think) question is under what conditions the ability to learn and re-learn a stimulus-response association can evolve. I'll go into a bit more detail below:
Time-learning agents
I'm putting this one first because it's of quite limited scope, and I already have the simulator I need. At the start of the year I was working on evolving agents in an environment in which food switched between nourishing and toxic at regular intervals. The agents received no information about food goodness until eating it, so the only way they could perform above chance was to do some sort of synchronisation to the timing of the change between good and bad food. I was hoping to evolve agents that could use the first couple of changes to set some kind of internal timer and thereby cope with a wide range of intervals, as long as the interval stayed constant within one test run.
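As a rough illustration of this setup (not the actual simulator code; the function names, the +1/-1 payoffs and the choice to start with good food are all assumptions made for the sketch), the switching-food environment could be written in Python like this:

```python
def food_is_good(t: float, interval: float, starts_good: bool = True) -> bool:
    """Return whether food eaten at time t is nourishing.

    Food quality flips every `interval` time units. The agent never
    sees this value directly; it only finds out by eating.
    """
    phase = int(t // interval) % 2  # 0 during the first interval, 1 during the second, ...
    return (phase == 0) == starts_good


def score_agent(eat_times, interval, reward=1.0, penalty=-1.0):
    """Sum the payoffs for an agent that eats at each time in `eat_times`.

    `reward` and `penalty` are hypothetical payoff values.
    """
    return sum(reward if food_is_good(t, interval) else penalty
               for t in eat_times)
```

An agent that eats at every time step scores about zero over a whole number of good/bad cycles, which is the chance level the evolved agents have to beat by synchronising with the switches.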
I didn't get what I was looking for, but I did find some interesting things. Agents evolved with only one interval became hardwired to that interval. This is unsurprising, because CTRNNs lend themselves well to the generation of fixed-period oscillations, and in such an environment there's no benefit to doing anything more sophisticated. Agents evolved on a not-too-wide range of intervals often developed a somewhat more flexible but still reactive strategy, which copes well with some variation in interval, but is defeated by intervals more than slightly outside the range the agents were evolved with. I came up against hard limits on how broad a range of intervals I could evolve agents for (roughly speaking, the maximum had to be less than double the minimum).
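For readers unfamiliar with CTRNNs, the standard state equation is tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j + theta_j) + I_i, where sigma is the logistic function. Below is a minimal forward-Euler sketch of one update step. The step size, the list-of-lists weight layout and the function names are assumptions for illustration, not details of the simulator used in these experiments.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, weights, biases, taus, inputs, dt=0.01):
    """Advance the CTRNN state `y` by one forward-Euler step of size `dt`.

    weights[j][i] is the connection strength from neuron j to neuron i.
    """
    n = len(y)
    outputs = [sigmoid(y[j] + biases[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        synaptic = sum(weights[j][i] * outputs[j] for j in range(n))
        dy = (-y[i] + synaptic + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y
```

With all weights and inputs at zero, each neuron's state simply decays towards zero at a rate set by its time constant; sustained oscillations depend on recurrent connections.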
I moved on from these experiments without fully exploring the parameter space. I had good reasons to: I expect that the best I'll be able to do is define the maximum range of intervals a little better. However, if that really is all I can do, it's still useful to be able to say so authoritatively, so I think it's worth exploring further. The first thing to try is increasing the number of neurons the agents can use. The most plausible reason I can think of for learning agents failing to evolve in an environment that can support them is that the experiments I've run so far just haven't had enough interneurons to implement a variable timer.
I can set off some of these experiments immediately, but once I get the experiments I'll describe below working, these will move to the back burner, and I'll just run them at times when I have computers free.
Stimulus-learning agents
The experiments that I think are more interesting, largely because I've already had more success with them than I expect from the time-learning ones, involve giving agents a sensory signal with which to determine food goodness, but varying what the signal will be from one run to the next. This way, an agent has to be able to learn which stimulus should trigger eating in each given run, in order to perform well.
I've been running experiments along these lines, but the things I've done so far suffer from involving too discrete a space of sensory inputs. The trouble with this is that for any given input space, it's possible for agents to perform well by instantiating a finite state machine (as Phattanasri, Beer and Chiel found), but this will be inherently incapable of scaling to an arbitrarily large input space. I initially wasn't concerned enough about this problem, because in a sense the real-world task of associating odours with edible or inedible things appears discontinuous: smells are composed of a number of discrete chemicals, and a given organism has to respond to a finite set of combinations of them. However, that finite set is a subset of a continuous n-dimensional space, where n is the number of individual chemicals the nose can detect. It is true that the smells of individual items occupy specific regions in this space, and it may even be true that they don't overlap at all, but I am now convinced that the continuity is still crucial for both the evolution and the development of odour discrimination.
So what I need to do is run similar experiments with a continuous space of possible sensory inputs. It's a simple enough change in itself (I just need to start generating signals as n random real numbers instead of n randomly assigned bits), but it will necessitate some other design changes. I think that if I make the input space continuous, I'll also need to do something with the reward space, because if the agent gets maximal punishment for biting when the input it receives is very close to the correct one but not quite the same, it will probably never be able to learn the right discrimination. I have a feeling the agents will also need more neurons, because a given interneuron can only be sensitive to a fairly small range of input values, but I can tinker with that by running experiments and seeing what succeeds. None of this requires a great deal of programming work, so I should be running the first of these experiments over the weekend.
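To make the change concrete, here is a small Python sketch of the two signal schemes, along with one possible graded reward. The Gaussian fall-off, the `width` parameter, the [0, 1] signal range and all function names are assumptions chosen for illustration; the post does not settle on a particular reward shape.

```python
import math
import random


def random_binary_signal(n: int, rng: random.Random):
    """Discrete scheme: n randomly assigned bits."""
    return [rng.randint(0, 1) for _ in range(n)]


def random_continuous_signal(n: int, rng: random.Random):
    """Continuous scheme: n random real numbers, here uniform in [0, 1]."""
    return [rng.random() for _ in range(n)]


def graded_reward(signal, target, width: float = 0.2) -> float:
    """Payoff for biting when `signal` is presented and `target` marks good food.

    Returns +1 for an exact match and falls smoothly towards -1 as the
    signal moves away from the target, so a near miss is only mildly
    punished. `width` (hypothetical) sets how quickly the reward drops.
    """
    dist_sq = sum((s - t) ** 2 for s, t in zip(signal, target))
    return 2.0 * math.exp(-dist_sq / (2.0 * width ** 2)) - 1.0
```

With a reward like this, biting at a signal close to the target still earns most of the payoff, which should give evolution and learning a gradient to follow instead of an all-or-nothing cliff.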
If I can get these to work with an arbitrary set of inputs, then the next step is to make them able to learn new discriminations when [probably easier] additional food types appear during a run, or [harder] the signals arbitrarily change. I'll worry about this once I have the basic setup working and understand it reasonably well, though; that should keep me busy for a while.