
About my work

UPDATE: This post is now obsolete. I'm in a transitional period right now, figuring out what to do next, and when I have a more detailed research statement I'll replace this post with it. Till then, here's a more recent post that brings things up to date.

Before I start in with the weekly updates, I realise they need some context to make any sense, so here is a brief description of the overall project that I hope will form the bulk of my PhD work, and of some of the techniques that I (and my lab) use; none of these techniques were invented by me. That is what follows; at this point it's fairly poorly structured, but within a month or two there will be a better-organised version serving as the introduction to a paper.

The motivating interest behind this project is to understand the evolution of learning, and the interaction between an environment and the agents that exist in it. Specifically, I'm interested in the conditions that caused organisms to evolve the capacity to learn. I should probably start with a few suggestions as to why this might be a problem worth understanding (beyond intellectual curiosity, which is an important reason but could just as easily have motivated me to do something like these projects).

There are three things I'm hoping to get out of this: it may shed some light on the evolutionary history of learning (major caveat: all a simulation can ever teach us about evolution is that a certain path was possible), the process should teach me something about how learning actually works, and ultimately an understanding of how to produce artificial agents that learn may be applicable to many engineering problems. I'll pick up on that last point, because it's the clearest and least speculative one.

At the moment, there's a lot of AI work on one-time learning: agents that start out with the ability to learn something, learn over an epoch, and are then frozen in their learned state. I'm caricaturing somewhat here (there are plenty of artificial systems that continue to adapt to environmental fluctuations over time), but I do think that most of what we call artificial learning systems do something that biologists would call "development" rather than "learning". The important distinction is the ability to unlearn and relearn things throughout the lifespan, rather than having a sort of critical period during which all the learning has to happen.

There must be many other ways to do this, but I'm using a combination of techniques that I have some experience with, and that I have some reason to expect will be well suited to the problem at hand. I'm working entirely in simulations, which helps to keep the data analysable, to keep noise under control, and to run many experiments in a limited time. Within the simulation, I'm using a genetic algorithm (GA) to set the parameters for agents that are controlled by continuous-time recurrent neural networks (CTRNNs).

The concept behind a GA is simple enough: it's a kind of artificial evolution, in which a population of individuals that can be described with something analogous to genes is repeatedly tested on some task, then some of its members are chosen to reproduce (with higher-scoring individuals preferred), and the descendants are produced by mutating these chosen individuals. Over time the GA searches the space of possible individuals, finding ones that score progressively higher on the fitness test.
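
To make that loop concrete, here is a minimal Python sketch of the kind of generational GA I have in mind. This isn't my actual code: the genome length, population size, mutation scale, and number of generations are all placeholder values, and the fitness function is left abstract.

```python
import random

GENOME_LENGTH = 20   # number of real-valued "genes" (placeholder value)
POP_SIZE = 50        # placeholder population size
MUTATION_STD = 0.1   # standard deviation of Gaussian mutation (placeholder)

def random_genome():
    return [random.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)]

def mutate(genome):
    # Each gene is perturbed by a small Gaussian amount.
    return [g + random.gauss(0.0, MUTATION_STD) for g in genome]

def evolve(fitness, generations=100):
    """fitness: a function mapping a genome to a non-negative score."""
    population = [random_genome() for _ in range(POP_SIZE)]
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        # Fitness-proportionate selection of parents, then mutation.
        parents = random.choices(population, weights=scores, k=POP_SIZE)
        population = [mutate(p) for p in parents]
    return max(population, key=fitness)
```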

CTRNNs are a specialised class of neural networks, which in turn are a computational model inspired by and loosely based on biological neurons. Each unit, or 'neuron', in the network is described by a few internal parameters and a list of connection weights, which determine how strongly it is connected to the other units in the network. It has an activation value (loosely analogous to a biological neuron's firing rate), which is modified over time by the combined effects of inputs (from sensors and/or other neurons) and decay (in the absence of input the activation value tends towards 0). Some neurons are designated as output units, and their activation values are the output of the network. For present purposes, the important things about CTRNNs are that they can be entirely specified by a set of numbers (which can be used as the genotype in a GA), and that in principle they can emulate any system with smooth dynamics.
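
For readers who like equations in code form, the standard CTRNN update, integrated with a simple Euler step, looks roughly like the sketch below. This is the textbook form rather than necessarily the exact implementation I use; the time constants, biases, and weights are the numbers that end up on the genotype.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, weights, taus, biases, inputs, dt=0.01):
    """One Euler-integration step of a fully connected CTRNN.

    y       : list of neuron states
    weights : weights[j][i] is the connection strength from neuron j to i
    taus    : per-neuron time constants (these set the decay rate)
    biases  : per-neuron biases
    inputs  : external input to each neuron (e.g. from sensors)
    """
    n = len(y)
    new_y = []
    for i in range(n):
        # Weighted sum of the other neurons' firing rates plus external input.
        total = inputs[i] + sum(weights[j][i] * sigmoid(y[j] + biases[j])
                                for j in range(n))
        # Without any input, the -y[i] term pulls the state back towards zero.
        dy = (-y[i] + total) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y
```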

The simulations I am using for my experiment are extremely simple. I'm looking for the simplest possible systems that will exhibit something I would recognise as learning, partly to make this analytically tractable, and partly because that simple system must be a first step on the way to anything more complex. This also ties in to the point I made above about using simulations because they are faster than physical experiments: that's only true if the simulation is a significant simplification of the real world, and the simpler the simulation the more times I can run it in a given period on a given computer.

In my simulations, an agent is a point body to which food is repeatedly presented. In each iteration (one trial), food may or may not be present, and may or may not be good, and the agent can either bite or not bite, based on the output of its controller. The agent has an energy level, which is increased by biting when good food is present, decreased by biting at any other time, and gradually decreases if the agent does nothing. Agents are scored by taking the average of their energy level across all the trials to which they are exposed, and this average is their fitness.
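
As a rough sketch of how a single set of trials might be scored (the specific energy increments and decrements here are made-up placeholders, not my real parameters, and in the real setup the bite decision comes from the CTRNN rather than a bare function):

```python
def run_trials(agent_bites, trials):
    """agent_bites: a function deciding whether to bite, given the agent's
    sensor values; trials: a list of (food_present, food_is_good) pairs.
    Returns the agent's fitness: its mean energy level over the trials.
    """
    energy = 0.0
    total = 0.0
    for food_present, food_is_good in trials:
        if agent_bites(food_present, energy):
            # Biting only pays off when good food is present.
            energy += 1.0 if (food_present and food_is_good) else -1.0
        else:
            # Doing nothing still costs a little energy.
            energy -= 0.1
        total += energy
    return total / len(trials)
```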

The whole population of agents is presented with the same set of trials, after which the chance of each agent being a parent is proportional to its fitness relative to the rest of the population. In effect, agents are selected for their success at biting when, and only when, good food is present.

The learning comes in through the information that is available to the agent to guide its behaviour. My research consists mainly of manipulating that information as the independent variable, and analysing the behaviour of the agents that evolve as the dependent variable. There are some trivial cases that I know do not support any sort of learning: if there's no reliable information, there is nothing to learn, and if the information is constant between sets of trials, it's much easier and more reliable for agents to be hardwired than to learn. I'm trying to find the interesting cases in between those extremes. My thesis is that learning evolves in response to environments that fall within a particular intermediate range of available information: enough regularity that there is something to learn, but not so much constancy that hardwiring is the better strategy.

I have an overall roadmap of different sorts of information I want to try, but I'll go into that later. For now I'll just explain how my current set of experiments works.

Right now I am testing a version of the general setup I described above, in which the information available to the agent takes the form of temporal correlations. The agent's only meaningful inputs are its own energy level and a binary sensor that simply signals whether or not food is present (regardless of food type). There are also three other sensory neurons (included so that I can use the same controller architecture in other experiments), but in these particular experiments they are fed random noise, so they convey no information at all.
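
Concretely, the input vector handed to the controller might be assembled something like this; the ordering of the channels and the number of noise neurons are just illustrative, not a description of my exact architecture.

```python
def sensor_inputs(energy, food_present, noise_values):
    """Build the controller's input vector: the agent's own energy level,
    a binary food-present signal, and pre-recorded noise channels that
    carry no information in these experiments.
    """
    return [energy, 1.0 if food_present else 0.0] + list(noise_values)
```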

Each set of trials is described by a small number of parameters: the probability that food will be present in any given trial (I've been leaving this at 1 for the time being), the noise vector to present to the agent's input neurons (equivalent to pre-recording the noise, so that every agent in the same generation gets an identical set of stimuli), and a time period. The food is always good in the first trial of a set; the period then determines how many trials elapse before the food switches to bad, how many before it switches back to good, and so on. I was hoping that agents would use this information to tune their own behaviour to an oscillation between "bite" and "don't bite" that matches the time period.
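
A schedule of good and bad food could be generated from those parameters along the following lines (again a hypothetical sketch; my real trial generator may differ in detail):

```python
import random

def food_schedule(n_trials, period, p_food=1.0):
    """Return a list of (food_present, food_is_good) pairs.

    Food is good for the first `period` trials, bad for the next `period`,
    and so on; each trial has food present with probability p_food
    (which I've been leaving at 1 in the current experiments).
    """
    schedule = []
    for t in range(n_trials):
        food_present = random.random() < p_food
        food_is_good = (t // period) % 2 == 0  # alternates every `period` trials
        schedule.append((food_present, food_is_good))
    return schedule
```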

So far, I haven't found an agent that does what I was hoping for, but this is an ongoing project; there wouldn't have been any point in trying if I were certain it would work, would there?
