[Epistemic status: waiting for jobs to finish on the cluster, so I am writing down random ideas. Take it as speculation.]
Genetic algorithms are a pretty cool idea: when you have an objective function, you can optimize it by starting with some random guesses and then using mutation and cross-over to improve them. At each generation, you keep the most promising candidates, and eventually you converge on a good solution. Just like evolution does.
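That loop can be sketched in a few lines of Python. This is a toy version (all names and parameter values are invented for illustration), optimizing the classic "one-max" objective of maximizing the number of 1s in a bit string:

```python
import random

def genetic_algorithm(objective, n_bits=20, pop_size=50, generations=100,
                      mutation_rate=0.02):
    """Minimal GA: random init, then selection, cross-over, and mutation."""
    # Start with random guesses.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the most promising half of the population.
        pop.sort(key=objective, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_bits)  # single cross-over point
            child = a[:cut] + b[cut:]
            # Mutate each bit with a small probability.
            child = [bit ^ (random.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=objective)

# Toy objective: count of 1s ("one-max"); the optimum is all ones.
best = genetic_algorithm(sum)
print(sum(best))
```

Even this crude version reliably finds the all-ones string, which is exactly the trap: one-max has no epistasis, so cross-over and selection work beautifully. Real objective functions are rarely so kind.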
Unfortunately, in practice, this idea does not pan out: genetic algorithms are not that effective. In fact, I am not aware of any general problem area where they are considered the best option. For example, in machine learning, the best methods tend to be variations of stochastic gradient descent.
And, yet, evolution uses genetic algorithms. Why doesn’t evolution use stochastic gradient descent or something better than genetic algorithms?
What would evolutionary gradient descent even look like?
First of all, let’s assure ourselves that we are not just using vague analogies to pretend we have deep thoughts. Evolutionary gradient descent is at least conceptually possible.
To be able to do gradient descent, a reproducing bacterium would need two pieces of information to compare itself to its mother cell: (1) how its genotype differs and (2) how much better it is doing than its parent. Here is one possible implementation of this idea: (1) tag the genome where it differs from the mother cell (epigenetics!) and (2) keep some memory of how fast it could grow. When reproducing, if we are performing better than our mother, introduce more mutations in the regions where we differ from her.
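A toy sketch of that hypothetical mechanism, in code (every class name, parameter, and rate here is invented; this is a thought experiment, not a model of any real cell):

```python
import random

class Bacterium:
    """Hypothetical 'evolutionary gradient descent' cell.

    Each cell remembers (1) which genome positions differ from its mother
    (the epigenetic 'tags') and (2) how fast its mother could grow.
    """

    def __init__(self, genome, tagged, mother_growth):
        self.genome = genome              # list of bases, e.g. ['A', 'C', ...]
        self.tagged = tagged              # positions that differ from mother
        self.mother_growth = mother_growth

    def reproduce(self, my_growth, base_rate=0.01, boost=10.0):
        child_genome = list(self.genome)
        improved = my_growth > self.mother_growth
        new_tags = set()
        for i in range(len(child_genome)):
            rate = base_rate
            # If we outgrew our mother, the regions where we differ from her
            # look promising: mutate them at a boosted rate.
            if improved and i in self.tagged:
                rate *= boost
            if random.random() < rate:
                child_genome[i] = random.choice("ACGT")
                new_tags.add(i)
        # The child tags its own fresh mutations and remembers our growth rate.
        return Bacterium(child_genome, new_tags, my_growth)
```

The "gradient" here is extremely coarse: a single bit of fitness comparison steering where mutations land, rather than a per-coordinate derivative. But it is directional information flowing across generations, which is more than a plain genetic algorithm uses.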
Why don’t we see something like this in nature? Here are some possible answers.
Almost all mutations are deleterious
Almost always (viruses might be an exception), higher mutation rates are bad. Even in a controlled way (mutating just the regions that seem to matter), adding more mutations will make things worse rather than better.
The signal is very noisy
Survival or growth is a very noisy signal of how good a genome is. Maybe we just got luckier than our parent by being born at a time of glucose plenty. If the environment is not stable, over-reacting to a single observation may be the wrong thing to do.
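A quick simulation shows how weak that one-observation signal is. Assuming (purely for illustration) that observed growth is true fitness plus Gaussian "environmental luck", a child that is genuinely worse than its mother still looks better in a large fraction of single comparisons:

```python
import random

def noisy_growth(true_fitness, noise=1.0):
    """One observed growth rate: true fitness plus environmental luck."""
    return true_fitness + random.gauss(0, noise)

random.seed(0)
mother, child = 1.0, 0.9   # the child is genuinely slightly worse
trials = 10_000
fooled = sum(noisy_growth(child) > noisy_growth(mother) for _ in range(trials))
print(fooled / trials)     # close to 1/2: the single comparison is nearly a coin flip
```

With a fitness gap of 0.1 and noise of this magnitude, the comparison is right barely more than half the time, so a cell that boosts its mutation rate on the strength of one such comparison is mostly reacting to luck.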
The relationship between phenotype and genotype is very tenuous
What we’d really like to do is something like “well, it seems that in this environment, it is a good idea if membranes are more flexible, so I will mutate membrane-stiffness more”. However, the relationship between membrane stiffness and the genotype is complex. There is no simple “mutate membrane-stiffness” option for a bacterium. Epistatic effects are killers for simple ideas like this one.
On the other hand, the relationship between any particular weight in a deep CNN and the output is very weak. Yet, gradient descent still works there.
The cost of maintaining the information for gradient descent is too high
Perhaps it’s just not worth keeping all this accounting information, especially because it would be yet another layer of noise.
Maybe there are examples of natural gradient descent, we just haven’t found them yet
There are areas of genomes that are more recombination-prone than others (and somatic mutation in the immune system is certainly a mechanism of controlled chaos). Viruses may be another entity in which some sort of gradient descent could be found. Maybe plants, with their weird genomes, are using all those extra copies to transmit information across generations like this.
As I said, this is a post of random speculation while I wait for my jobs to finish…