Catching up
I'm in a hurry because I'm going on holiday tomorrow, it's past midnight, and I've only just finished tying up some loose ends and setting a large number of experiments running on various computers, so I'm just going to copy-and-paste the update I sent to my lab recently.
I have a frustrating combination of news: I've got quite a lot done since you last heard from me, but I still don't have any positive results. I'm going to Kyiv tomorrow, and I'm going to be leaving many computers busy in my absence, but unless some of the current set of experiments are successful, I don't think I'm going to have anything worth writing a paper about in time for the SAB deadline.
I'll summarise what I've done, what is in progress, and what comes next below. But first some practicalities: don't wait for me for this week's lab meeting, because I won't be able to phone in, but I should be back home and working as normal the week after, so I'll call in for the meeting on Wednesday March 1st.
What I've done:
Temporal correlation experiments:
I feel like I have pretty much exhausted this set. With the simple shaping algorithm I've been testing, I can get agents to perform well across a fairly wide range of time periods, without ever needing to do anything more sophisticated than the stupid threshold strategies I've described before. Not a problem in itself - I need this null result to make my overall argument, once I finally do find learning in some more complex condition.
Sensory flag experiments:
Success in these has so far depended on the separation of the flags.
If there's no overlap at all between the allowed flags for good food and the allowed flags for bad food, then searches easily find good agents, but these agents don't do any learning. They just evolve fixed behaviours that work on the basis of simple distinctions like "the higher input activation always means bite".
If I try to run searches with complete overlap (i.e. any given flag could signal good or bad food with equal probability in a given trialset), the searches just don't do very well.
So I've been running some experiments with a similar shaping system to the temporal-correlation ones. At the beginning, the world has a sharp partition between flags that signal good food, and flags that signal bad. Each time the best agent's performance is greater than or equal to a threshold, the ranges of values allowed for each flag are shifted slightly, so that they overlap progressively more.
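For anyone who wants the shaping step spelled out, here is a minimal sketch in Python. It is not my actual code: the names (FlagRanges, shape, overlap), the representation of each range as a (low, high) pair, and the choice to widen each range towards the other rather than slide it are all assumptions made for illustration. It also assumes the good range starts above the bad one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlagRanges:
    """Allowed flag values for each food type (hypothetical representation)."""
    good: Tuple[float, float]  # (low, high) for flags signalling good food
    bad: Tuple[float, float]   # (low, high) for flags signalling bad food


def overlap(r: FlagRanges) -> float:
    """Width of the interval shared by the two ranges (0.0 if disjoint)."""
    lo = max(r.good[0], r.bad[0])
    hi = min(r.good[1], r.bad[1])
    return max(0.0, hi - lo)


def shape(r: FlagRanges, best_score: float, threshold: float, step: float) -> FlagRanges:
    """Increase the overlap by `step` if the best agent met the threshold.

    Assumption: the good range extends downwards and the bad range extends
    upwards, and both are clamped so they never leave the full span.
    """
    if best_score < threshold:
        return r  # not good enough yet: leave the world unchanged
    full_lo = min(r.good[0], r.bad[0])
    full_hi = max(r.good[1], r.bad[1])
    good = (max(full_lo, r.good[0] - step), r.good[1])
    bad = (r.bad[0], min(full_hi, r.bad[1] + step))
    return FlagRanges(good, bad)


# Start from a sharp partition and apply one shaping step.
world = FlagRanges(good=(0.5, 1.0), bad=(0.0, 0.5))
world = shape(world, best_score=0.9, threshold=0.8, step=0.25)
print(world, overlap(world))
```

Called once per generation with the best agent's score, this moves the world from a sharp partition towards complete overlap, and stops changing once the two ranges coincide.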
Using this shaping algorithm, I get what look numerically like quite impressive results, but a qualitative look at the agents produced is disappointing. It seems the agents get by with pretty simple adaptations of the initial "good is higher than bad" type strategies, and chance factors allow them to score highly in spite of this.
I'm currently running a load more shaping experiments. Partly I'm playing with the parameters (I'm far from finished exploring the parameter space here), but I've also made a slight change to the way the flags are set. I realised today that it was possible for a trial to have good and bad flags that amounted to effectively the same pattern of input neuron activations, because the difference between them was small enough to be overwhelmed by the compression at either end of the sigmoid activation function. I've added a check to the flag generation to avoid such overlap. I don't know how big an effect the problem has had, but in the trials it affected it may have rendered the flags basically useless, leaving the agent unable to do better than biting at random.
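The kind of check I mean can be sketched roughly as follows, again in Python and again as an illustration rather than my real code. The function names, the number of input neurons, the range that raw flag values are drawn from, and the min_gap threshold are all hypothetical. The idea is just to reject a good/bad pair unless at least one input neuron ends up with a noticeably different activation after the sigmoid.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Standard logistic function, which compresses large |x| towards 0 or 1."""
    return 1.0 / (1.0 + math.exp(-x))


def distinguishable(good, bad, min_gap=0.05):
    """True if some input neuron's activation differs by at least min_gap.

    min_gap is an assumed tolerance, not a value taken from the experiments.
    """
    return any(abs(sigmoid(g) - sigmoid(b)) >= min_gap for g, b in zip(good, bad))


def draw_flag_pair(rng, n_inputs=2, lo=-10.0, hi=10.0, min_gap=0.05, max_tries=1000):
    """Rejection-sample a (good, bad) flag pair that survives the compression."""
    for _ in range(max_tries):
        good = [rng.uniform(lo, hi) for _ in range(n_inputs)]
        bad = [rng.uniform(lo, hi) for _ in range(n_inputs)]
        if distinguishable(good, bad, min_gap):
            return good, bad
    raise RuntimeError("could not draw a distinguishable flag pair")


# Raw values that differ, but look the same once squashed by the sigmoid:
print(distinguishable([8.0, 9.0], [9.5, 8.5]))   # both saturate near 1
print(distinguishable([-1.0], [1.0]))            # clearly different activations
```

The comparison is made on the activations rather than the raw flag values, because two raw values can be far apart and still saturate to almost the same output.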
What's next?
I have quite a few ideas of what to try after this if these experiments don't do anything very impressive. I'm just frustrated by the passage of time...
A few things that seem worth trying out:
- a cleverer shaping algorithm, along the lines of what we were using for the catcher agents.
- fixing the trials so that they include every possible combination of inputs (reduced to binary because of the sigmoid function's compression) in every generation. I'd still want to randomise the order of good food and bad food, but I'm not sure there's any benefit to randomising the flags themselves.
- using only one input neuron. I've been shying away from this because it makes the task into more of a toy problem, but if I don't have any success with harder versions I should try the simplest thing possible.
- increasing the cost of eating bad food, so that a randomly-behaving agent pays a larger penalty.
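On the last of these, a bit of arithmetic shows why the cost matters. The sketch below is only an illustration under assumed numbers: I take good and bad food to be equally likely, eating good food to be worth `reward`, eating bad food to cost `cost`, and not biting to score nothing. None of these values come from my actual setup.

```python
def expected_random_score(p_bite: float, reward: float, cost: float, p_good: float = 0.5) -> float:
    """Expected score per trial for an agent that bites with probability p_bite,
    regardless of the flags it sees (all parameters are illustrative)."""
    return p_bite * (p_good * reward - (1.0 - p_good) * cost)


# With equal reward and cost, random biting breaks even on average...
print(expected_random_score(p_bite=0.5, reward=1.0, cost=1.0))
# ...whereas a larger cost makes it a losing strategy.
print(expected_random_score(p_bite=0.5, reward=1.0, cost=3.0))
```

Under these assumptions, a random biter's expected score drops below zero as soon as the cost exceeds the reward, which should widen the gap between chance-level agents and ones that actually use the flags.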
That's all for now; I'll be in touch next week.
