Timing problems
The trip to Bloomington was very fruitful in terms of getting to meet a few people I'd previously only heard from over email, going to interesting talks, and getting feedback from my advisor. Unfortunately, the most important single thing I found out represented a substantial setback, and this is why it's taken me a while to get around to blogging about where I'm at with research.
The problem was that my advisor noticed a potentially very important mistake I had made a few months ago. Or rather—I should be more specific here—the problem is not that he noticed it, but that I hadn't noticed earlier (and it could still be much worse - at least he noticed now, and not after a paper was submitted or something).
Our simulations implement continuous-time controllers on discrete-time hardware [ordinary digital computers] by numerically approximating their behaviour. We [I use code supplied by my advisor for this part] use the Euler Method to do this, largely because it is the computationally cheapest way. The problem is that there's always a trade-off in choosing the size of integration steps: smaller steps make the simulation slower, but larger steps increase the amount of noise in the approximation. There is also a time constant for each individual unit of a CTRNN, and really it's the size of the integration steps relative to the time constants that matter. It's a fairly intuitive concept, and I discovered today that it's called the Courant-Friedrichs-Lewy Condition.
Anyway, over the past few months I've been continually tinkering with simulation parameters in an effort to get the blasted things to work reliably without taking impractical amounts of time. I thought that I had finally found some parameters that led to successful searches often enough to be useful, and while the searches were still annoyingly long (at about 5 days each) they were not so long as to be useless. However, in the course of the parameter tinkering, I had managed to set the lower bound on time constants to the same value (0.1) as the Euler integration step size; a violation of the Courant-Friedrichs-Lewy Condition which potentially allows controllers to do weird things by exploiting the integration noise. Given the generally remarkable ability of evolutionary algorithms to exploit bugs in simulators, this is a serious concern.
Since the discovery of this error, all of my research activity has been directed towards trying to understand if and how much it really matters. Thus far, I have contradictory evidence:
The case for the prosecution
- Running a search with an integration step size of 0.1 does not yield the same results as running a search with the same parameters except for a step size of 0.01 and the relevant time periods being multiplied by 10 to compensate.
- The searches I have run with step sizes an order of magnitude smaller than the smallest allowed time constant have so far been utterly fruitless, suggesting that exploitation of the integration noise has been necessary in some way for the searches to succeed.
- The best agent found by searches with a step size of 0.1—which is also the one I've spent the most time analysing, for obvious reasons—has two CTRNN units with a time constant of exactly 0.1, so its behaviour is liable to be affected by integration noise. To make things worse, one of those two units is the interneuron with the most obvious role in its behaviour.
The case for the defence
- In spite of the above, that controller's performance is barely changed—the fitness score differs only in the third decimal place—when I run it with integration step sizes even two orders of magnitude smaller. This suggests that it's not doing anything that would depend on integration noise to work.
- The same controller's behaviour, while qualitatively somewhat different with different step sizes, is not dramatically changed. The changes in units' outputs are a little more gradual, but the agent makes almost exactly the same decisions about which food to eat and which to ignore, again suggesting that the difference doesn't matter.
Conclusion?
The jury's still out on this one. Clearly I do have a problem, if the outcome of evolutionary searches is dependent on a parameter being set outside the range it should be in. But on the other hand, if this is important then why does the best agent work just fine without that being the case? One possibilty is that there's a necessary intermediate stage in which agents do 'abuse' the integration noise; this is going to take a lot of work to find.
To do
There are three things I'm working on in order to figure out what's going on here:
- Run more searches with appropriate step sizes - because these are even slower than those with inappropriately large integration steps, I don't have enough yet to rule out the possibility that only bad luck has stopped me from yet seeing a successful search this way.
- Analyse more agents from the old batch, to see if I do find any that abuse the integration noise.
- Look at many of the agents that were saved during the search that found the best agent so far, to see if the search arrived at that good agent via abuse of integration noise.

Comments
#3 is obviously the most difficult of the three items on your "to do" list, but it's probably the best way to make lemonade out of these lemons.
Setting the time constant equal to a neuron's tau causes the neuron's state to follow exactly its inputs when using Euler. Not that you don't know this, but
t/tau = 1 (here, I'm using t as the timestep)
s[i+1] = s[i] + 1*(I[i] - s[i]) = I[i]
(here, I'm using i to represent an arbitrary sampling instant, not the index of a neuron in an array, due to my unfortunate choice of "t" as the timestep)
The Z-transform of this indicates that it has one pole at z = 0, which is stable (actually, it's critically damped), but for numerical integration purposes, is (as you know) cutting it really close.
The possible saving grace is that the state of a neuron is not what is observed; rather, the output is seen by other neurons. The state is compressed into the range (0,1), which can limit the impact that this problem will have. Also, the neurons are all integrators, making them low-pass filters, where the frequency response is related to their time constant. If you have an interneuron looking like it's doing PWM because t = tau, then essentially, it will function like PWM at the input of a neuron with a more reasonable time constant. The signal gets integrated, or smoothed, by the other neurons.
The problems Randy's group has had in the past (myself included) with integration step size usually involves some *visible* part of the system (e.g., a robot simulation) where the effective time constant (e.g., inertia) is also very small, so the errors don't get integrated out.
This isn't to say that everything's hunky-dory. In fact, t = tau is the absolute biggest you can make t before things start getting screwy regardless of the rest of the system (the impulse response of the neuron alone when 2*tau > t > tau is convergent but underdamped, which means that the output will oscillate for a while even if all the inputs are forced to zero; as a curiosity, the system goes unstable when t > 2*tau). Depending on the entire system (network plus simulation), t = tau may still be too big, but doing that kind of analysis is over my head. ;)
As you've mentioned, the fact that you've had so much trouble evolving agents with a smaller t indicates that there is some sort of anomaly going on when t = tau. The evolutionary trajectory may pass through areas where the output neurons have their taus set to t, instead of just the interneurons, where the integrative and saturating effects of the neurons don't mitigate the problem. If that's the case, then yes, you have a serious problem, but at least it's an *interesting* one. :)
Thank you for the input! I'm still clarifying what this means and how big a problem it really is, and this working through is definitely helpful at such a stage.
Depending on the entire system (network plus simulation), t = tau may still be too big, but doing that kind of analysis is over my head.
Unfortunately I fear it's also over my head, but I think it's something I need to figure out how to do....
Well, it gets into the nonlinearity of CTRNNs and requires techniques that go beyond basic stability tests for linear systems. That's probably why nobody ever bothers doing that sort of analysis, because you can just make t small enough and never have to worry about whether there are artifacts being introduced or not (i.e., solve the problem, er, empirically).
In most cases when someone is picking an integration method for their simulation, they decide how "stiff" the system is. CTRNNs aren't particularly stiff, being big nonlinear integrators, which is why Euler works. On the other hand, the cricket robot sim required something like RK4 (as many physical simulations do), but it was still dicey at times, because the contact of the tarsus with the floor was modeled as two very stiff springs in the x and y directions (whence the ever-amusing #defined constants GROUNDSTIFFX and GROUNDSTIFFY).