More working agents

This week I only have fairly general things to say, but what I do have is encouraging.

Since my last update, I've focussed more on looking at the most recent runs as a group than on examining any one individual in great detail. This means that I still don't have answers to the questions I was asking last week about why the individual agent I had singled out behaves as it does, but I have found that this agent is not as rare as I initially thought, which is reassuring.

From a total of 30 shaping runs, I've found 5 agents that meet the following criteria:

When tested on a freshly generated batch of 500 trialsets in which the values of both the good-food and bad-food flags are allowed to vary over the whole [-1,+1] range, the average performance is >0.8, and when individual runs from that batch are examined, the agent appears to be behaving in a way that I can reasonably explain.
Qualitatively, these 5 all do appear to be learning the associations with the flags freshly each run.
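
The quantitative half of that test can be sketched roughly as follows. This is an illustration only, not the code behind these runs: the names `sample_trialset`, `evaluate` and `placeholder_score` are hypothetical, and it assumes that each flag value is drawn uniformly and independently from [-1,+1] and that each trialset produces a single score.

```python
import random
from statistics import mean


def sample_trialset(rng):
    """Draw one hypothetical trialset: a value in [-1, +1] for each flag."""
    return {
        "good_flag": rng.uniform(-1.0, 1.0),
        "bad_flag": rng.uniform(-1.0, 1.0),
    }


def evaluate(score_fn, n_trialsets=500, threshold=0.8, seed=0):
    """Score an agent on a fresh batch and compare the mean to the threshold.

    score_fn is assumed to take one trialset and return that trialset's score.
    Returns (average score, whether the average is strictly above threshold).
    """
    rng = random.Random(seed)
    scores = [score_fn(sample_trialset(rng)) for _ in range(n_trialsets)]
    avg = mean(scores)
    return avg, avg > threshold


def placeholder_score(trialset):
    """Stand-in for running a real agent; it just returns a constant score."""
    return 0.9


if __name__ == "__main__":
    avg, passed = evaluate(placeholder_score)
    print(avg, passed)
```

A real check would replace `placeholder_score` with whatever runs an evolved agent on the trialset. The qualitative half of the test, whether the behaviour can be explained, still needs a person looking at individual runs.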

Meanwhile, none of the non-shaping runs have yielded anything of interest. On a personal level, I'm satisfied with the evidence that shaping is crucial for this task, but I think I would want to increase n before making that claim in a paper.

I'm not yet in a position to point to a detailed combination of parameters that allows good agents to evolve, but the one thing I'm seeing a consistent effect with is the starting range of the flags. The only starting range that works is non-overlapping but touching, in other words one flag starts out as 0<=flag<=1 and the other as -1<=flag<=0. Starting with a gap between the flags (e.g. 0.5<=flag<=1 and -1<=flag<=-0.5) very quickly yields agents that do well on the starting conditions, but they then fail to adapt once the flags start to overlap. On the other hand, starting with some overlap simply fails to produce agents that score highly at any point.
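
To make the three starting conditions concrete, here is a small sketch that labels a pair of flag ranges as having a gap, touching, or overlapping, and draws one starting value for each flag. The function names are hypothetical, and uniform sampling within each range is an assumption rather than something taken from the experiments.

```python
import random


def classify_ranges(good_range, bad_range):
    """Label two closed (low, high) ranges as 'gap', 'touching' or 'overlap'."""
    lo = max(good_range[0], bad_range[0])  # lower edge of any shared region
    hi = min(good_range[1], bad_range[1])  # upper edge of any shared region
    if lo > hi:
        return "gap"       # no value belongs to both ranges
    if lo == hi:
        return "touching"  # the ranges share exactly one boundary value
    return "overlap"       # the ranges share a whole interval


def sample_flags(good_range, bad_range, rng):
    """Draw one starting value per flag, uniformly within its range (assumed)."""
    return rng.uniform(*good_range), rng.uniform(*bad_range)


if __name__ == "__main__":
    rng = random.Random(0)
    for good, bad in [((0.0, 1.0), (-1.0, 0.0)),
                      ((0.5, 1.0), (-1.0, -0.5)),
                      ((-0.2, 1.0), (-1.0, 0.2))]:
        print(good, bad, classify_ranges(good, bad), sample_flags(good, bad, rng))
```

The first pair in the loop is the touching case that has worked so far, the second is the gap example from the text, and the third is a made-up example of an overlapping start.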

I do need a larger n before I can say any of this with any authority, however, and to that end I'm repeating all the experiments in the last batch, with fresh random seeds. Meanwhile I need to start looking at individual agents in more detail. They all seem to show the pattern of doing well on almost all trials, and failing catastrophically on a few, so if I can find regularities in which trials they fail on that should give me a lead into understanding exactly what they are doing.
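
One simple way to start looking for regularities in the failures is to bin trials by some property of the flags and compare failure rates across bins. The sketch below bins by the difference between the good-food and bad-food flag values. Everything here is an assumption for illustration: the `(good_flag, bad_flag, score)` record layout, the `fail_below` cutoff for a catastrophic failure, and the choice of the flag difference as the variable to bin on.

```python
import math
from collections import Counter


def failure_rate_by_bin(trials, fail_below=0.2, bin_width=0.5):
    """Group trials by (good_flag - bad_flag) and return the failure rate per bin.

    trials: iterable of (good_flag, bad_flag, score) tuples (hypothetical layout).
    A trial counts as a failure when its score is below fail_below.
    Returns a dict mapping each bin's lower edge to its failure rate.
    """
    totals, fails = Counter(), Counter()
    for good_flag, bad_flag, score in trials:
        diff = good_flag - bad_flag        # lies in [-2, +2] for flags in [-1, +1]
        b = math.floor(diff / bin_width)   # integer bin index
        totals[b] += 1
        if score < fail_below:
            fails[b] += 1
    return {b * bin_width: fails[b] / totals[b] for b in sorted(totals)}
```

If the failures cluster in bins where the difference is near zero or negative, that would suggest the agents struggle when the flags are hard to tell apart or reversed. The same function could be rerun with a different binning variable if the flag difference turns out not to be informative.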

I'm now in Canada, and will be on vacation for this week, but next week I'll be back at work, analysing data, and I won't be taking any more vacation from then till September.
