m8ta
ref: -1992 tags: evolution baldwin effect ackley artificial life date: 03-21-2022 23:20 gmt revision:0 [head]

Interactions between learning and evolution

  • Ran simulated evolution and learning on a population of agents over ~100k lifetimes.
  • Each agent can last several hundred timesteps within a gridworld-like environment.
  • The gridworld contains plants (food), trees (shelter), carnivores, and other agents (for mating).
  • Agent behavior is parameterized by an action network and an evaluation network.
    • The action network transforms sensory input into actions
    • The evaluation network sets the valence (positive or negative) of the sensory signals
      • The evaluation network modifies the weights of the action network using a gradient-based RL algorithm called CRBP (complementary reinforcement back-propagation). CRBP reinforces an action based on the temporal derivative of the evaluation, and trains toward the complement (negative) of the action when it does not increase reward, with some e-greedy exploration.
        • It's not perfect, but as they astutely say, any reinforcement learning algorithm involves some search, so generally heuristics are required to select new actions in the face of uncertainty.
      • Observe that it seems easier to make a good evaluation network than a good action network (the evaluation network is lower dimensional -- one output!)
    • Networks are implemented as one-layer perceptrons (boring, but they had limited computational resources back then)
  • Showed (roughly) that in winner populations you get:
    • When learning is an option, the population will learn, and with time this will grow to anticipation / avoidance
    • This will transition to the Baldwin effect; learned behavior becomes instinctive
      • But, interestingly, only when the problem is incompletely solved!
      • If it's completely solved by learning (e.g. learning is super fast), then there is no selective leverage on innate behavior over many generations.
      • Likewise, the survival problem to be solved needs to be stationary and consistent for long enough for the Baldwin effect to occur.
    • Avoidance is a form of shielding: once the agent avoids the hazard, learning no longer matters for this behavior.
    • Even longer term, shielding leads to goal regression: avoidance instincts free the evaluation network to do something else, such as setting new goals.
      • In their study this included goals such as approaching predators (!).
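
To make the CRBP bullet above concrete, here is a minimal sketch of the update as described in these notes: a one-layer action network samples binary actions, and the weights are pushed toward the sampled action when the evaluation went up, or toward its complement when it did not. Everything here is an assumption for illustration, not the paper's code: the names `act`, `crbp_update` and `train_toy`, the learning rates `lr_pos` and `lr_neg`, and the toy task. Exploration comes from sampling the stochastic output unit, which stands in for the e-greedy exploration mentioned above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def act(weights, x, rng):
    """Sample binary actions from a one-layer stochastic action network."""
    probs = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return probs, actions


def crbp_update(weights, x, probs, actions, improved, lr_pos=0.3, lr_neg=0.05):
    """Move output probabilities toward the sampled actions if the evaluation
    rose, otherwise toward their complement (hypothetical learning rates)."""
    lr = lr_pos if improved else lr_neg
    for row, p, a in zip(weights, probs, actions):
        target = a if improved else 1 - a
        for i, xi in enumerate(x):
            row[i] += lr * (target - p) * xi  # delta rule on a single layer


def train_toy(steps=500, seed=0):
    """Toy task (an assumption): taking action 1 raises the evaluation."""
    rng = random.Random(seed)
    x = [1.0, 1.0]          # one sensory input plus a bias term
    weights = [[0.0, 0.0]]  # a single output unit
    for _ in range(steps):
        probs, actions = act(weights, x, rng)
        eval_before = 0.0
        eval_after = 1.0 if actions[0] == 1 else -1.0
        improved = (eval_after - eval_before) > 0  # temporal derivative sign
        crbp_update(weights, x, probs, actions, improved)
    return act(weights, x, rng)[0][0]  # final probability of action 1


if __name__ == "__main__":
    print(train_toy())
```

In this toy setup both branches of the update raise the probability of the rewarded action, so the printed probability should end up close to 1. In the agents described above, the `improved` signal would come from the change in the evaluation network's output between timesteps rather than from a hard-coded rule.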

Altogether (historically) interesting, but some of these ideas might well have been anticipated by some simple hand calculations.
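
As one example of such a hand calculation, the "incompletely solved" condition for the Baldwin effect can be put in a few lines. This is a toy model of my own, not something from the paper: assume an agent's fitness is the fraction of its lifetime during which it performs the useful behavior, so an agent born with the behavior scores 1 and a learner loses the timesteps it spends learning. The function name `innate_advantage` and the 500-timestep lifetime are hypothetical, the latter standing in for "several hundred timesteps".

```python
def innate_advantage(learning_time, lifetime):
    """Fitness edge of an innately competent agent over a learner, under the
    toy assumption that fitness = fraction of life spent behaving correctly."""
    if lifetime <= 0:
        raise ValueError("lifetime must be positive")
    wasted = min(max(learning_time, 0), lifetime)  # timesteps lost to learning
    return wasted / lifetime


if __name__ == "__main__":
    for t in (0, 5, 50, 250):
        print(t, innate_advantage(t, 500))
```

Under this assumption the edge shrinks to zero as learning time goes to zero, which matches the observation that a problem solved instantly by learning leaves no selective pressure to make the behavior instinctive.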