Deep Neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning* Uber AI labs; Jeff Clune.
- In this paper, they used a (fairly generic) genetic algorithm to tune the weights of a relatively large (4M parameters) convolutional neural net to play 13 atari games.Â
- The GA used truncation selection, population of ~ 1k individuals, no crossover, and gaussian mutation.
- To speed up and streamline this algo, they encoded the weights not directly but as an initialization seed to the RNG (log2 of the number of parameters, approximately), plus seeds to generate the per-generation mutation (~ 28 bits). This substantially decreased the required storage space and communication costs when running the GA in parallel on their cluster; they only had to transmit the rng seed sequence.Â
- Quite surprisingly, the GA was good at typically 'hard' games like frostbite and skiing, whereas it fared poorly on games like atlantis (which is a fixed-gun shooter game) and assault.Â
- Performance was compared to Deep-Q-networks (DQN), Evolutionary search (which used stochastic gradient approximates), Asynchronous Advantage Actor-critic (A3C), and random search (RS)
-
- They surmise that some games were thought to be hard, but are actually fairly easy, albeit with many local minima. This is why search around the origin (near the initialization of the networks, which was via the Xavier method) is sufficient to solve the tasks.
- Also noted that frequently the GA would find individuals with good performance in ~10 generations, further supporting the point above.Â
- The GA provide very consistent performance across the entirety of a trial, which, they suggest, may offer a cleaner signal to selection as to the quality of each of the individuals (debatable!).
- Of course, for some tasks, the GA fails woefully; it was not able to quickly learn to control a humanoid robot, which involves mapping a ~370-dimensional vector into ~17 joint torques. Evolutionary search was able to perform this task, which is not surprising as the gradient here should be smooth.
The result is indeed surprising, but it also feels lazy -- the total effort or information that they put into writing the actual algorithm is small; as mentioned in the introduction, this is a case of old algorithms with modern levels of compute. Analogously, compare Go-Explore, also by Uber AI labs, vs Agent57 by DeepMind; the Agent57 paper blithely dismisses the otherwise breathless Go-Explore result as feature engineering and unrealistic free backtracking / game-resetting (which is true..) It's strange that they did not incorporate crossover aka recombination, as David MacKay clearly shows that recombination allows for much higher mutation rates and much better transmission of information through a population. (Chapter 'Why have sex'). They also perhaps more reasonably omit developmental encoding, where network weights are tied or controlled through development, again in an analogy to biology.Â
A better solution, as they point out, would be some sort of hybrid GA / ES / A3C system which used both gradient-based tuning, random stochastic gradient-based exploration, and straight genetic optimization, possibly all in parallel, with global selection as the umbrella. They mention this, but to my current knowledge this has not been done. |