Why does natural evolution use genetic algorithms when it’s not a very good optimization method?

[Epistemic status: waiting for jobs to finish on the cluster, so I am writing down random ideas. Take it as speculation.]

Genetic algorithms are a pretty cool idea: when you have an objective function, you can optimize it by starting with some random guesses and then using mutations and cross-over to improve them. At each generation, you keep the most promising candidates and eventually you converge on a good solution. Just like evolution does.
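
For concreteness, here is a minimal sketch of the idea in Python (purely illustrative: the bit-string objective, population size, and mutation rate are arbitrary choices, not a recommendation):

    import random

    def fitness(x):
        # toy objective: count the number of 1s in a bit string
        return sum(x)

    def mutate(x, rate=0.05):
        # flip each bit with a small probability
        return [bit ^ (random.random() < rate) for bit in x]

    def crossover(a, b):
        # single-point cross-over between two parents
        cut = random.randrange(len(a))
        return a[:cut] + b[cut:]

    # start with random guesses...
    population = [[random.randint(0, 1) for _ in range(50)] for _ in range(100)]
    for generation in range(200):
        # ...keep the most promising candidates at each generation...
        population.sort(key=fitness, reverse=True)
        parents = population[:20]
        # ...and refill the population with mutated cross-overs of the survivors
        population = parents + [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(80)
        ]

    print(max(population, key=fitness))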

Unfortunately, in practice, this idea does not pan out: genetic algorithms are not that effective. In fact, I am not aware of any general problem area where they are considered the best option. For example, in machine learning, the best methods tend to be variations of stochastic gradient descent.

And, yet, evolution uses genetic algorithms. Why doesn’t evolution use stochastic gradient descent or something better than genetic algorithms?

What would evolutionary gradient descent even look like?

First of all, let’s assure ourselves that we are not just using vague analogies to pretend we have deep thoughts. Evolutionary gradient descent is at least conceptually possible.

To be able to do gradient descent, a reproducing bacterium would need two pieces of information to compare itself to its mother cell: (1) how it differs in genotype and (2) how much better it is doing than its mother. Here is one possible implementation of this idea: (1) tag the genome where it differs from the mother cell (epigenetics!) and (2) keep some memory of how fast the mother could grow. When reproducing, if we are performing better than our mother, then introduce more mutations in the regions where we differ from her.
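
As a toy sketch of what this mechanism could look like (purely speculative, to match the speculation above: the genome representation, the growth model, and the mutation rates are all made up):

    import random

    GENOME_LEN = 100
    BASE_RATE = 0.01      # baseline per-position mutation probability (made up)
    BOOSTED_RATE = 0.05   # rate used in tagged regions when doing better (made up)

    def growth_rate(genome):
        # stand-in for "how fast can this genome grow"; deliberately noisy
        return sum(genome) + random.gauss(0, 1)

    def divide(genome, tagged, mother_growth):
        """One cell division: mutate more in the tagged regions if we are
        growing faster than our mother did."""
        doing_better = growth_rate(genome) > mother_growth
        daughter, new_tags = [], set()
        for i, base in enumerate(genome):
            rate = BOOSTED_RATE if (doing_better and i in tagged) else BASE_RATE
            if random.random() < rate:
                daughter.append(1 - base)   # mutate this position
                new_tags.add(i)             # "epigenetic" tag: this is where I differ
            else:
                daughter.append(base)
        return daughter, new_tags

    # usage: start from a random genome with no tags
    mother = [random.randint(0, 1) for _ in range(GENOME_LEN)]
    daughter, tags = divide(mother, set(), growth_rate(mother))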

Why don’t we see something like this in nature? Here are some possible answers.

Almost all mutations are deleterious

Almost always (viruses might be an exception), higher mutation rates are bad. Even applied in a controlled way (only to the regions that seem to matter), adding more mutations will make things worse rather than better.

The signal is very noisy

Survival or growth is a very noisy signal of how good a genome is. Maybe we just got luckier than our parents in being born at a time of glucose plenty. If the environment is not stable, overreacting to a single observation may be the wrong thing to do.

The relationship between phenotype and genotype is very tenuous

What we’d really like to do is something like “well, it seems that in this environment, it is a good idea if membranes are more flexible, so I will mutate membrane-stiffness more”. However, the relationship between membrane stiffness and the genotype is complex. There is no simple “mutate membrane-stiffness” option for a bacterium. Epistatic effects are killers for simple ideas like this one.

On the other hand, the relationship between any particular weight in a deep CNN and the output is very weak. Yet, gradient descent still works there.

The cost of maintaining the information for gradient descent is too high

Perhaps it’s just not worth keeping all this accounting information, especially since it would add yet another layer of noise.

Maybe there are examples of natural gradient descent, we just haven’t found them yet

There are areas of genomes that are more recombination-prone than others (and somatic mutation in the immune system is certainly a mechanism of controlled chaos). Viruses may be another entity where some sort of gradient descent could be found. Maybe plants, with their weird genomes, are using all those extra copies to transmit information across generations like this.

As I said, this is a post of random speculation while I wait for my jobs to finish….


The Scientific Paper of the Future is Probably a PDF

I do not mean to say the scientific paper of the future should be a PDF, I just mean that it will most likely be a PDF or some PDF-derived format. By future, I mean around 2040 (so, in 20-25 years).

I just read James Somers in the Atlantic, arguing that The Scientific Paper Is Obsolete (Here’s what’s next). In that article, he touts Mathematica notebooks as a model of what should be done and Jupyter as the current embodiment of this concept.

I will note that Mathematica came out in 1988 (a good 5 years before the PDF format) and has so far failed to take the world by storm (the article claims that “the program soon became as ubiquitous as Microsoft Word”, a claim which is really hard to reconcile with reality). Perhaps Mathematica was held back because it’s expensive and closed source (but so is Microsoft Word, and Word has taken the world by storm).

How long did it take to get to HTML papers?

For a very long time, the future of the scientific paper was going to be some smart version of HTML. We did eventually get to the point where most journals have decent HTML versions of their papers, but it’s mostly dumb HTML.

As far as I can tell, none of the ideas for having a semantically annotated paper panned out. About 10 years ago, the semantic web was going to revolutionize science. That didn’t happen, and it’s even been a while since I heard anyone argue that it would be the future of the scientific paper.

Tools like Read Cube or Paperpile still parse the PDFs and try to infer what’s going on instead of relying on fancy semantic annotations.

What about future proofing the system?

About a week ago, I tweeted:

This is about a paper which is now in press. It’s embargoed, but I’ll post about it when it comes out in 2 weeks.

I have complained before about the lack of backwards compatibility in the Python ecosystem. I can open and print a PDF from 20 years ago (or a PostScript file from the early 1980s) without any issues, but I have trouble running a notebook from last year.

At this point, someone will say docker! and, yes, I can build a docker image (or virtual machine) with all my dependencies and freeze that, but who can commit to hosting/running these over a long period? What about the fact that even tech-savvy people struggle to keep all these things properly organized? I can barely get co-authors to move beyond the “let’s email Word files back and forth.”

With less technical co-authors, can you really imagine them downloading a docker container and properly mounting all the filesystems with OverlayFS to send me back edits? Sure, there are a bunch of cool startups with nicer interfaces, but will they be here in 2 years (let alone 20)?

Is it even a good idea to have the presentation of the results mixed with their computation?

I do see the value in companion Jupyter notebooks for many cases, but as a replacement for the main paper, I am not even sure it is a good idea.

There is a lot of accidental complexity in code. A script that generates a publication plot may easily have 50 lines that do nothing more than set up the plot just right: (1) set up the subplots, (2) set x- and y-labels, (3) fix colours, (4) scale the points, (5) reset the font sizes, &c. What value is there in keeping all of this in the main presentation of the results?
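
To make this concrete, here is roughly what that boilerplate looks like (a made-up example; the data, labels, and colours are all invented for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    # toy data standing in for real results
    rng = np.random.default_rng(0)
    data = {'sample A': (rng.random(50), rng.random(50)),
            'sample B': (rng.random(50), rng.random(50))}

    fig, axes = plt.subplots(1, 2, figsize=(7, 3), sharey=True)  # (1) set up the subplots
    for ax, (name, (xs, ys)) in zip(axes, data.items()):
        ax.scatter(xs, ys, s=12, color='#3366aa')                # (3) fix colours, (4) scale the points
        ax.set_title(name, fontsize=9)
        ax.set_xlabel('coverage')                                # (2) set x-label
        ax.tick_params(labelsize=8)                              # (5) reset the font sizes
    axes[0].set_ylabel('relative abundance')                     # (2) set y-label
    fig.tight_layout()
    fig.savefig('figure1.pdf')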

Similarly, all the file paths and the 5 arguments you need to pass to pandas.read_table to read the data correctly: why should we care when we are just trying to get the gist of the results? One of our goals in NGLess is to try to separate some of this accidental complexity from the main processing pipeline, but this also limits what we can do with it (this is the trade-off of a domain-specific tool; it’s hard to achieve the same with a general-purpose tool like Jupyter/Python).
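
The kind of call I have in mind looks something like this (the path, column names, and particular arguments are invented for illustration):

    import pandas as pd

    # hypothetical path and column names, just to show the boilerplate involved
    counts = pd.read_table(
        'data/gene_counts.tsv.gz',
        sep='\t',
        comment='#',
        index_col=0,
        usecols=['gene', 'sampleA', 'sampleB'],
    )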

I do really like Jupyter for tutorials, as the mix of code and text is a good fit. I will work to make sure I have something good for the students, but I don’t particularly enjoy working with the notebook interface, so I need to be convinced before I jump on the bandwagon more generally.

§

I actually do think that the future will eventually be some sort of smarter thing than simulated paper, but I also think that (1) it will take much longer than 20 years and (2) it probably won’t be Jupyter getting us there. It’s a neat tool for many things, but it’s not a PDF killer.

How NGLess uses its version declaration

NGLess is my metagenomics tool, which is based on a domain specific language. So, NGLess is both a language and a tool (which implements the language).

Since the beginning, ngless has had a focus on reproducibility, and one of the small ways in which this is implemented is that every ngless script is required to start with a version declaration:

    ngless "0.5"

This was always intended to enable the language to change while keeping perfect reproducibility of past scripts. Until recently, though, this was just hypothetical.

In October, I taught a course on NGLess and it became clear that one of the minor inconsistencies in the previous version of the language (at the time, version “0.0”) was indeed confusing. In the previous version of the language, the preprocess function modified its arguments. No other function did this.

In version “0.5” (which was released on November 1st), preprocess is now a pure function, so you must assign its output to a variable.

However, and this is where the version declaration comes into play, the newer executable still accepts scripts with the version declaration ngless "0.0". Furthermore, if you declare your script as using ngless 0.0, then the old behaviour is used. Thus, we fixed the language, but nobody needs to update their scripts.

Implementation note (which shouldn’t concern the user, but may be interesting to others): before interpretation, ngless will transform the input script, adding checks and optimizing it. A new pass (which is only enabled if the user requested version “0.0”) simply transforms the older code into its newer counterpart. Then, the rest of the process proceeds as if the user had typed in the newer version.
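
To make the idea concrete, here is a toy sketch in Python of how such a version-gated pass might work (ngless itself is not implemented like this; the regex-based rewrite and the function below are hypothetical simplifications of the real pass, which operates on the transformed script, not on raw text):

    import re

    def load_script(script_text):
        """Toy sketch: dispatch on the declared version and, for "0.0" scripts,
        rewrite the old preprocess form into the newer pure-function form."""
        match = re.match(r'\s*ngless\s+"([^"]+)"', script_text)
        if match is None:
            raise ValueError('every script must start with a version declaration')
        version = match.group(1)
        if version == "0.0":
            # hypothetical rewrite rule: `preprocess(input) using ...` becomes
            # `input = preprocess(input) using ...`
            script_text = re.sub(r'^(\s*)preprocess\((\w+)\)',
                                 r'\1\2 = preprocess(\2)',
                                 script_text, flags=re.M)
        # from here on, the script is handled exactly as a newer script would be
        return version, script_text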

When you say you are pro-science, what do you mean you are in favor of?

In the last few weeks, with the March for Science coming up, there have been a few discussions of what being pro-science implies. I just want to ask:

When you say you are pro-science, what do you mean you are in favor of?

Below, I present a few different answers.

Science as a set of empirically validated statements

Science can mean facts such as:

  • the world is steadily warming and will continue to do so as CO2 concentrations go up
  • nuclear power is safe
  • vaccines are safe
  • GMOs are safe

This is the idea behind the “there is no alternative to facts” rhetoric. The four statements above can be quibbled with (there are some risks to some vaccines; GMO refers to a technique, not a product, so even calling it safe or unsafe is not the right discussion; nuclear accidents have happened; and there is a lot of uncertainty about both the amount of warming and its downstream effects), but, when understood in general terms, they are facts, and those who deny them deny reality.

When people say that science is not political, they mean that these facts are independent of one’s values. I’d add the Theory of Evolution to the above four, but evolution (like Quantum Mechanics or Relativity) is even more undeniable.

Science and technology as a positive force

The above were “value-less facts”; let’s slowly get into values.

The facts above do not have any consequences for policy or behaviour on their own. They do constrain the set of possible outcomes, but for a decision, you need a set of values on top.

It is still perfectly consistent with the facts to claim the following: Vaccines are safe, but a person’s bodily autonomy cannot be breached in the name of utilitarianism. In the case of children, the parents’ autonomy should be paramount. This is a perfectly intellectually consistent libertarian position. As long as you are willing to accept that children will die as a consequence, then I cannot really say you are denying the scientific evidence. This may seem a shocking trade-off when said out loud, but it also happens to be the de facto policy of the Western world for the past 10-20 years: vaccines are recommended, but most jurisdictions will not enforce them anymore.

Similar statements can be made about all of the above:

  • The world is getting warmer, but fossil fuels bring human beings wealth and so are worth the price to the natural environment. The rest should be dealt with through mitigation and geo-engineering. What is important is finding the lowest-cost solution for people.
  • Nuclear power is safe, but storing nuclear waste destroys pristine environments and that is a cost not worth paying.
  • GMOs are safe, but messing with Nature/God’s work is immoral.

Empirical facts can provide us with the set of alternatives that are possible, but they do not help us weigh those alternatives against each other (note how often cost/benefit shows up in the above, but the costs are not all material costs). Still, being pro-science is often understood as being pro technological progress, and thus anti-GMO or anti-nuclear activism as anti-science.

Science as a community and set of practices

This meaning of “being pro-Science”, science as the community of scientists, is also what leads to views such as being pro-Science means being pro-inclusive Science. Or, on the other side, bringing up Dr. Mengele.

Although it is true that empirically validated facts are shared across humanity, there are areas of knowledge that impact certain people more than others. If there is no effort to uncover the mechanisms underlying a particular disease that affects people in poorer parts of the world, then the efforts of scientists will have a differential impact on the world.

Progress in war is fueled by science as much as progress in any other area and scientists have certainly played (and continue to play) important roles in figuring out ways of killing more people faster and cheaper.

The scientific enterprise is embedded in the societies around it and has, indeed, in the past resorted to using slaves or prisoners. Even in the modern enlightened world, the scientific community has had its share of unethical behaviours, in ways both big and small.

To drive home the point: does supporting science mean supporting animal experiments? Obviously, yes, if you mean supporting the scientific enterprise as it exists. And, obviously, no, if it means supporting empirically validated statements!

The cluster of values that scientists typically share

Scientists tend to share a particular set of values (at least openly). We are pro-progress (in the technological and social sense), socially liberal, cosmopolitan, and egalitarian. This is the view behind “science is international” and people sharing photos of their foreign colleagues on social media.

There is no empirical grounding for why these values would be better than others, except that they seem to be statistically more abundant in the minds of professional scientists. Some of this may really be explained by the fact that open-minded people will both like science and share this type of values, but a lot of it is more arbitrary. Some of it is selection: given that the career mandates travel and the English language, there is little appeal for individuals who prefer a more rooted life. Some of it is socialization (spend enough time in a community where these values are shared and you’ll start to share them). Some of it is preference falsification (in response to political correctness, people are afraid to come out and say what they really believe).

In any case, we must recognize that there is no objective sense in which these values are better than the alternatives. Note that I do share them. If anything, their arbitrariness is often salient to me because I am even more cosmopolitan than the average scientist, so I see how the barrier between the “healthy nationalism” that is accepted and the toxic variety is a pretty arbitrary line in the sand.

What is also funny is that science is often funded for exactly the opposite reasons: it’s a prestige project for countries to show themselves superior to others, like funding the arts or the Olympic team. (This is not the only reason to fund science, but it is certainly one of the reasons.) You also hear it in “Science is what made America great.”

Science as an interest group

Science can be an interest group like any other: we want more subsidies & lower taxes (although there is little room for improvement there: most R&D is already tax-exempt). We want to get rid of pesky regulation, and the right to self-regulate (even though there is little evidence that self-regulation works). Science is an interest group.

Being pro-science

All these views of “what do I mean when I say I am pro-science?” interact and blend into each other: a lot of the resistance to things like GMOs does come from an empirically wrong view of the world, and correcting this view thus assuages concerns about GMOs. Similarly, if you accept that science generally results in good things, you will be more in favor of funding it.

Sometimes, though, they diverge. The libertarian view that mixes a strong empiricism and defense of empirically validated facts with an opposition to public funding of science is a minority overall, but over-represented in certain intellectual circles.

On the other hand, I have met many people who support science as a force for progress and as an interest group, but who end up defending all sorts of pseudo-scientific nonsense and rejecting the consensus on the safety of nuclear power or GMOs. This is how I can work at a major science institution whose health insurance covers homeopathy: the non-scientific staff will say they are pro-science, but will cherish their homeopathic “remedies”. I also suspect that many people declare themselves pro-science because they see it as their side versus the religious views they disagree with, even though you can perfectly well be religious and pro-science in the sense of accepting the scientific facts. I would never claim that Amish people are pro-progress, and I hazard no guess about their views on public science funding, but many are happy to grow GMOs as they accept the empirical fact of their safety. In that sense, they are more pro-science than your typical Brooklyn hipster.

Sometimes, these meanings of being pro-science blend into each other through motivated reasoning. So, instead of saying that vaccines are so overwhelmingly safe and that herd immunity is so important that I support mandating them (my view), I can be tempted to say “there is zero risk from vaccines” (which is not true for every vaccine, but I sure wish it were). I can be tempted to downplay the uncertainty in the harder-to-disentangle areas of economic policy, cite the empirical studies that agree with my ideology, and call those who disagree “anti-scientific.” I might deny that values even come into play at all. We like to pretend there are no trade-offs. This is why anti-GMO groups often start by discussing intellectual property and land-use issues and end up trying to shut down high-school biology classes.

In an ideal world, we’d reserve the opprobrium of “being anti-science” for those who deny empirical facts and well-validated theories, while discussing all the other issues as part of the traditional political debates (is it worth investing public money in science, or should we invest more in education and new public housing? or lowering taxes?). In the real world, we often borrow credibility from empiricism to support other values. The risk, however, is that what we borrow, we often have to pay back with interest.

Haven’t They Suffered Enough?

Every time I read about a plan to have more women and minorities in science careers, I think of that famous New Yorker quip about gays getting married: “Gays getting married? Haven’t they suffered enough?”

Women in tenure-track positions? Haven’t they suffered enough?

§

I read this lament yesterday:

I was the lucky kid who never had to study for tests. I always scored in the 99% percentile on the annual state assessments.

[… Now I don’t make that much money.]

The national average at the time was that for every one faculty position, there were 200 applications. For our department, there were 300 applications for every one faculty position

[…]

Science will fail because the System is running the scientists out of it.

This is like “nobody goes there anymore, it’s too crowded.” In one sense it expresses a truth, but it is actually nonsensical.

The problem with science cannot simultaneously be that scientists are not sufficiently paid and that there are too many of them for the same position. And, if you argue that too many scientists are leaving academia, you also need to explain how this fits with all the other complaints about academia that focus on how hard it is to get a job.

§

If you want to make the argument that there should be more science funding, go ahead; I’ll support you 100%.

If you want to make the argument that postdoc salaries are so low that it’s hard to get a qualified candidate, go ahead; I’ll mostly disagree.

If you want to make the argument that the current system leads to sub-optimal science, go ahead; I might support or disagree depending on the details. In the comments to that article, someone points out that in the current system PIs are incentivized to be overly conservative and focused on the short term, unlike the private sector, which has a longer time horizon (and perhaps more tolerance for failure). This sort of argument is much more interesting, as it implies that there could be better mechanisms for funding.

§

But, reading these poor-me laments, I actually conclude that the taxpayer is getting a great deal: it gets very smart people working 80-hour days for so little money that they cannot afford to go to the movies[1], and they even produce a lot of nice results. Man, your tax dollars are hard at work!

The goal of public science funding is to get as much science as possible. Scientists are a cost to the public to be minimized. It seems that this is working pretty well.

Can we structure the rest of the public sector to be like this? [2] We’d get excellent public services for much lower taxes (we could surely lower the Council Tax, which seems to take such a big chunk of this poor fellow’s salary).

[1] I have to say I don’t fully believe that this guy has it this bad.
[2] Joking aside, I actually think that science funding is, in general, better than other types of public funding at getting bang for the public buck. Tenure comes late in your career (and it is not enough to sit on your ass and not get fired for 2 years), the grant system is competitive, &c. In spite of the fact that public funding dominates, very few people would argue that there is no competition in science.

Friday Links

1. On Schekman’s pledge to not publish in high-profile journals. I almost called this a balanced view, but then realized that I probably used that phrase to refer to Derek Lowe’s work at least twice in the past. The man is smart and balanced, what can I say?

2. An interesting meeting report (closed access, sorry). Two highlights:

While discussing mutations that predispose to cancer, Nazneen Rahman (Institute of Cancer Research, UK) rightly reminded us that people make big decisions and have parts of their anatomy removed based on their genotype.

[…]

Jeanne Lawrence (University of Massachusetts Medical School, USA) convincingly showed that her lab was able to silence one entire copy of chromosome 21 in stem cells in vitro. Trisomy 21 or Down’s syndrome is caused by an extra copy of chromosome 21. […] Lawrence and colleagues inserted XIST (human X-inactivation gene) into chromosome 21 in stem cells with trisomy 21. They then showed using eight different methods that a single copy of the chromosome had indeed been silenced.

3. A good explanation of Bitcoin, the protocol

4. Interesting article about wine & technology in The Economist (which is one of the few mainstream magazines whose science coverage is worth reading [1]).

[1] Actually, I think it’s the only one that can be consistently trusted, but I enjoy anything by Ed Yong wherever he publishes and have been reading some excellent articles by Carl Zimmer in The Atlantic.

Seeing is Believing. Which is Dangerous.

One of the nice things about being at EMBL is that, if you just wait, eventually you can hear the important people in your field speak. Today, I’m quite excited about the Seeing is Believing conference.

But ever since I saw it advertised, I have disliked the name Seeing is Believing.

[Image: the grey square (checker shadow) optical illusion]

  1. Seeing is believing. This is unquestionable.
  2. But seeing is not always justified believing. Our visual apparatus will often lead us astray. This is especially true for images which do not look like the ones we evolved for (and grew up looking at).
  3. The fact that seeing is believing is actually often a cognitive problem which needs to be overcome!

§

I can no longer find who said it at BOSC, but someone pointed out, insightfully, that a visualization is already an interpretation of the data, and it may be wrong.

More often than not, when I show you a picture of a cell, it is not the raw data. The raw data is a big pixel array. By the time I’m showing it to you, I’ve done the following:

  1. Chosen an example to show.
  2. Often projected the data from 3D to a 2D representation.
  3. Tweaked contrast.

Point 1 is the biggest culprit here: the selection of which cell to image and show can be an incredibly biased process (even unconsciously biased, of course).

However, even tweaks to the way that the projection is performed and to the contrast can highlight or hide important details (as someone with a lot of experience playing with images, I can tell you that there is a lot of space for “highlighting what you want to show”). In the newer methods (super-resolution type methods), this is even worse: the “picture” you see is already the output of a big processing pipeline.
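
As a toy illustration of how much latitude these choices leave (made-up random data; real microscopy data is subtler, but the same kind of decision is being made):

    import numpy as np

    # toy 3D stack standing in for a microscopy image (z, y, x)
    rng = np.random.default_rng(0)
    stack = rng.poisson(lam=5, size=(32, 512, 512)).astype(float)

    max_proj = stack.max(axis=0)     # maximum-intensity projection
    mean_proj = stack.mean(axis=0)   # average projection: same data, different look

    def stretch(img, low_pct=1, high_pct=99):
        # contrast stretch between two percentiles; moving these numbers around
        # is exactly the kind of "highlighting what you want to show"
        lo, hi = np.percentile(img, [low_pct, high_pct])
        return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

    gentle = stretch(max_proj, 1, 99)
    aggressive = stretch(max_proj, 20, 80)   # same projection, very different emphasis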

§

I’m not even considering the effects of the tagging protocols, which introduce their own artifacts. But we humans often make the mistake of saying things like “this is an image of protein A in cell type B” instead of “this is an image of a chimeric protein which includes the sequence of A, with a strong promoter, in cell type B”.

§

We know that these artifacts and biases are there, of course. But we believe the images. And this can be a problem because humans are not actually all that great at image analysis.

Seeing is believing, which too often means that we suspend our disbelief (or, as we scientists like to say, our skepticism). This is not a recipe for good science.

Update: On twitter, Jim Procter (@foreveremain) points out a great example: the story of the salmon fMRI. We can see it, but we shouldn’t believe it.