{1567} revision 5 modified: 04-21-2022 18:22 gmt

Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution
- Central hypothesis is that simplicity and symmetry arise not through natural selection, but because these forms are overwhelmingly represented in the genotype-phenotype map.
- Experimental example here was "polyominoes": there are N=16 tiles, each with 4 edge numbers (encoded as e.g. 6-bit binary numbers). The edge numbers determine how the tiles irreversibly bind, e.g. 1 <-> 2, 3 <-> 4, etc., with 0 and 2^6-1 binding to nothing.
- These tiles are allowed to 'randomly' self-assemble. Some assemblies don't terminate (e.g. they form unbounded polymers) and are discarded; others do terminate (no more available binding sites).
- They assessed the complexity both of polyominoes selected for a particular size (e.g. 16 tiles) and of those not selected at all, other than for terminating.
- In both cases, complexity was assessed by how many interactions were actually needed to make the observed structure. That is, they removed tile edge numbers and kept an edge number only if removing it changed the n-mer that formed.
- The result was a nice log-log plot.
- They showed that the same trend holds for protein-protein complexes (a weaker result, imho)
- As well as for RNA secondary structure
- And for metabolic time series in an ODE model of yeast metabolism (an even weaker result...)
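The assembly and edge-removal steps above can be sketched as a toy model. This is a minimal sketch under several simplifying assumptions, not the authors' code: tiles cannot rotate, assembly is deterministic (the first matching tile in list order fills each open site, ignoring clashes with other neighbours), and any assembly that exceeds a hypothetical `max_size` cap is treated as a non-terminating polymer. The binding rule (1 <-> 2, 3 <-> 4, ...; 0 is neutral and 2^6-1 has no partner) is an assumed reading of the description above, and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical toy model, not the paper's code. Tiles cannot rotate here.
Tile = tuple[int, int, int, int]   # edge colours in order: N, E, S, W
MAX_COLOR = 2**6 - 1               # 6-bit colours 0..63

STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # grid step for N, E, S, W
OPPOSITE = [2, 3, 0, 1]                     # facing side on the neighbour


def binds(a: int, b: int) -> bool:
    """Assumed rule: odd colour c binds c + 1 (1<->2, 3<->4, ...); 0 and 63 bind nothing."""
    if 0 in (a, b) or MAX_COLOR in (a, b):
        return False
    lo, hi = min(a, b), max(a, b)
    return lo % 2 == 1 and hi == lo + 1


def assemble(tiles: list[Tile], max_size: int = 64) -> Optional[frozenset]:
    """Grow from tiles[0] at the origin; return the translation-normalised shape,
    or None if growth passes max_size (treated as a non-terminating polymer)."""
    grid: dict[tuple[int, int], Tile] = {(0, 0): tiles[0]}
    while True:
        placed = False
        for (x, y), tile in list(grid.items()):
            for side, (dx, dy) in enumerate(STEPS):
                pos = (x + dx, y + dy)
                if pos in grid:
                    continue
                # Deterministic choice: first tile whose facing edge binds.
                for cand in tiles:
                    if binds(tile[side], cand[OPPOSITE[side]]):
                        grid[pos] = cand
                        placed = True
                        break
        if not placed:  # no open binding sites left: assembly terminated
            break
        if len(grid) > max_size:
            return None
    min_x = min(x for x, _ in grid)
    min_y = min(y for _, y in grid)
    return frozenset((x - min_x, y - min_y) for x, y in grid)


def minimal_edge_count(tiles: list[Tile]) -> int:
    """Greedily neutralise each edge; restore it only if removal changes the shape."""
    target = assemble(tiles)
    if target is None:
        raise ValueError("genotype does not terminate")
    current = [list(t) for t in tiles]
    for tile in current:
        for side in range(4):
            if tile[side] == 0:
                continue
            saved, tile[side] = tile[side], 0
            if assemble([tuple(t) for t in current]) != target:
                tile[side] = saved  # this edge is needed for the observed shape
    return sum(1 for t in current for c in t if c != 0)


# Usage: a dimer whose first tile carries a redundant N edge (colour 5 has no partner).
pair = [(5, 1, 0, 0), (0, 0, 0, 2)]
print(sorted(assemble(pair)))       # [(0, 0), (1, 0)]
print(minimal_edge_count(pair))     # 2
print(assemble([(0, 1, 0, 2)]))     # None: the tile binds itself into an endless chain
```

The remaining non-neutral edge count is only a crude proxy for "interactions needed", and because removal is greedy the result can depend on the order in which edges are tried.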
The paper features an excellent set of references, including:
- Deep learning generalizes because the parameter-function map is biased towards simple functions
- Input–output maps are strongly biased towards simple outputs
- This is the source of their formula 1:
- $P(x) \leq 2^{-a \tilde{K}(x) - b}$ where $x$ is a phenotype (like polyomino topology or protein heteromer), $P(x)$ is the probability of finding that phenotype with random sampling, $\tilde{K}(x)$ is the (approximate) Kolmogorov complexity of that output, and $a$ and $b$ are constants for a particular genotype-phenotype map.
- (Note: same last author, A.A. Louis, on all these papers.)
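Formula 1 can be read numerically: taking $\log_2$ of both sides gives $\log_2 P(x) \leq -a \tilde{K}(x) - b$, so the bound falls linearly in $\tilde{K}$ on a log-probability axis. A minimal sketch, assuming zlib-compressed length as a crude, swapped-in proxy for $\tilde{K}(x)$ (not the estimator used in the papers) and arbitrary hypothetical values for $a$ and $b$:

```python
import zlib


def approx_complexity_bits(x: str) -> int:
    """Crude stand-in for K~(x): length in bits of the zlib-compressed string.
    (An assumption for illustration; not the estimator used in the papers.)"""
    return 8 * len(zlib.compress(x.encode("ascii"), 9))


def probability_upper_bound(k_tilde: float, a: float, b: float) -> float:
    """Formula 1: P(x) <= 2^(-a * K~(x) - b)."""
    return 2.0 ** (-a * k_tilde - b)


# Hypothetical constants; a and b are specific to each genotype-phenotype map.
a, b = 0.1, 0.0
simple = "0" * 64
irregular = "0110001011110100100111000101101110010100011010111100001001101101"
for x in (simple, irregular):
    k = approx_complexity_bits(x)
    print(k, probability_upper_bound(k, a, b))
```

Because the compressor's fixed overhead inflates $\tilde{K}$ for short strings, only the ordering (simpler output, higher bound) is meaningful here; the constants $a$ and $b$ would absorb such offsets when fitted to a real map.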
Letter to a friend following her article Machine learning in evolutionary studies comes of age

Read your PNAS article last night; super interesting that you can get statistical purchase on long-lost evolutionary 'sweeps' via GANs and other neural network models. I feel like there is some sort of statistical power issue there? DNNs are almost always over-parameterized... slightly suspicious.

This morning I was sleepily mulling things over and thought about a walking conversation we had a long time ago in the woods of NC: Why is evolution so effective? Why does it seem to evolve to evolve? Thinking more, and with years more perspective, it seems almost obvious in retrospect: it's a consequence of Bayes' rule. Evolution finds solutions in spaces that have an overwhelming prevalence of working solutions. The prior has an extremely strong effect. These representational / structural spaces by definition have many nearby and associated solutions, hence appear post hoc 'evolvable'. (You probably already know this.)

I think proteins very much fall into this category: amino acids were added to the translation machinery because they happened to solve a particular problem... but because of the 'generalization prior' (to use NN parlance), they turned out to be useful for many other things. This does not explain the human-engineering-like modularity of mature evolved systems, but maybe that is due to the strong simplicity prior [1].

Very, very interesting to me is how the science of evolution and the science of neural networks are drawing together, vis-à-vis the lottery ticket hypothesis. Both evince a continuum of representational spaces, too, from high-dimensional vectorial (how all modern deep learning systems work) to low-dimensional, modular, specific, and general (phenomenological human cognition). I suspect that evolution uses a form of this continuum, as seen in the human high-dimensional, long-range gene regulatory / enhancer network (= a structure designed to evolve).
Not sure how selection works here, though; it's hard to search a high-dimensional space. The brain has an almost identical problem: it's hard to do 'credit assignment' in a deep, recurrent network with billions of synapses. Finding which set of synapses caused a good / bad behavior takes a lot of bits.
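The Bayes'-rule argument in the letter can be made concrete with a toy calculation. The two phenotype classes and all numbers below are hypothetical, chosen only to show how a strongly skewed prior P(phenotype), set by the genotype-phenotype map, can dominate which working solutions unbiased sampling turns up, even when the common phenotype is less likely to work:

```python
def posterior(prior: dict[str, float], p_works: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: P(phenotype | works) is proportional to P(works | phenotype) * P(phenotype)."""
    joint = {k: prior[k] * p_works[k] for k in prior}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


# Hypothetical numbers: the simple phenotype is ~1000x more common under random
# genotypes, but only half as likely as the complex one to solve the problem.
prior = {"simple": 0.999, "complex": 0.001}
p_works = {"simple": 0.5, "complex": 1.0}
print(posterior(prior, p_works))
```

With these made-up numbers the simple phenotype still accounts for about 99.8% of the working solutions (0.4995 / 0.5005), despite its 2x disadvantage in the likelihood term.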