Interactions between learning and evolution
- Ran simulated evolution and learning on a population of agents over ~100k lifetimes.
- Each agent can live for several hundred timesteps in a gridworld-like environment.
- The gridworld has plants (food), trees (shelter), carnivores, and other agents (for mating).
- Agent behavior is parameterized by an action network and an evaluation network.
- The action network transforms sensory input into actions
- The evaluation network sets the valence (positive or negative) of the sensory signals
- The evaluation network modifies the weights of the action network through a gradient-based RL algorithm called CRBP (complementary reinforcement back-propagation). CRBP reinforces the action just taken based on the temporal derivative of the evaluation, and trains toward the complement (negative) of the action when it does not increase reward, with some epsilon-greedy exploration.
- It's not perfect, but as they astutely say, any reinforcement learning algorithm involves some search, so generally heuristics are required to select new actions in the face of uncertainty.
- Observe that it seems easier to make a good evaluation network than a good action network (the evaluation network is lower dimensional -- one output!)
- Networks are implemented as one-layer perceptrons (boring, but they had limited computational resources back then)
- Showed (roughly) that in winner populations you get:
- When learning is an option, the population will learn, and with time this will grow to anticipation / avoidance
- This will transition to the Baldwin effect; learned behavior becomes instinctive
- But, interestingly, only when the problem is incompletely solved!
- If it's completely solved by learning (e.g. super fast), then innate and learned behavior are equally fit, so there is no selective leverage on innate behavior over many generations.
- Likewise, the survival problem to be solved needs to be stationary and consistent for long enough for the Baldwin effect to occur.
- Avoidance is a form of shielding: once the agent instinctively avoids a situation, learning no longer matters for this behavior.
- Even longer term, shielding leads to goal regression: avoidance instincts allow the evaluation network to do something else, set new goals.
- In their study this included goals such as approaching predators (!).
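
A minimal sketch of the action / evaluation split and the CRBP-style update described above, to make the mechanics concrete. This is not the authors' code: the class name `CRBPAgent`, the helper `lifetime_step`, the learning rates, the epsilon value, and the use of a plain delta rule on a single sigmoid layer are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CRBPAgent:
    """Hypothetical one-layer agent: learned action net, fixed (inherited) evaluation net."""

    def __init__(self, n_in, n_act, rng, lr_pos=0.3, lr_neg=0.1, eps=0.05):
        self.rng = rng
        # Action weights change within a lifetime; evaluation weights do not.
        self.W_act = rng.normal(0.0, 0.1, size=(n_act, n_in))
        self.w_eval = rng.normal(0.0, 0.1, size=n_in)
        # Assumed hyperparameters (not taken from the notes).
        self.lr_pos, self.lr_neg, self.eps = lr_pos, lr_neg, eps

    def evaluate(self, x):
        # Scalar valence of the current sensory input.
        return float(self.w_eval @ x)

    def act(self, x):
        # Greedy binary action per output unit, with epsilon-greedy exploration.
        p = sigmoid(self.W_act @ x)
        a = (p > 0.5).astype(float)
        explore = self.rng.random(a.shape) < self.eps
        random_bits = self.rng.integers(0, 2, size=a.shape).astype(float)
        return p, np.where(explore, random_bits, a)

    def learn(self, x, p, a, r):
        # Positive reinforcement: train toward the action taken.
        # Otherwise: train toward its complement.
        if r > 0:
            target, lr = a, self.lr_pos
        else:
            target, lr = 1.0 - a, self.lr_neg
        self.W_act += lr * np.outer(target - p, x)

def lifetime_step(agent, x_prev, x_next, p, a):
    # Reinforcement is the change in evaluation between consecutive timesteps.
    r = agent.evaluate(x_next) - agent.evaluate(x_prev)
    agent.learn(x_prev, p, a, r)
    return r
```

In this sketch evolution would act on both `W_act` (initial instincts) and `w_eval` (goals), while learning only touches `W_act`.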
Altogether (historically) interesting, but some of these ideas might well have been anticipated by some simple hand calculations.
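
As one example of such a hand calculation, here is a toy model (my own, not from the paper) of the "no leverage when learning is fast" point. Assume an agent pays a fixed fitness cost per timestep until it has the behavior; an innate agent has it from birth, a learner acquires it after `learn_steps`. The function name and all parameters are hypothetical.

```python
def selection_leverage(cost_per_step, learn_steps, lifetime, base_fitness=1.0):
    """Relative fitness edge of an innate agent over a learner (toy model).

    The learner is exposed to the cost only while it has not yet learned,
    and never for longer than its lifetime.
    """
    exposed_steps = min(learn_steps, lifetime)
    return cost_per_step * exposed_steps / base_fitness
```

With `learn_steps` near zero the edge vanishes, so selection cannot distinguish innate from learned behavior; the edge grows with learning time up to the full lifetime, which is the incompletely solved regime where the Baldwin effect has something to work with.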