How NGLess uses its version declaration

NGLess is my metagenomics tool, which is based on a domain specific language. So, NGLess is both a language and a tool (which implements the language).

Since the beginning, ngless has had a focus on reproducibility and one the small ways in which this was implemented was that ngless requires a version declaration. Every ngless script is required to start with a version declaration:

    ngless "0.5"

This was always intended to enable the language to change while keeping perfect reproducibility of past scripts. Until recently, though, this was just hypothetical.

In October, I taught a course on NGLess and it became clear that one of the minor inconsistencies in the previous version of the language (at the time, version “0.0”) was indeed confusing. In the previous version of the language, the preprocess function modified its arguments. No other function did this.

In version “0.5” (which was released on November 1st), preprocess is now a pure function, so that you must assign its output to a value.

However, and this is where the version declaration comes into play, the newer executable still accepts scripts with the version declaration ngless "0.0". Furthermore, if you declare your script as using ngless 0.0, then the old behaviour is used. Thus, we fixed the language, but nobody needs to update their scripts.

Implementation note (which shouldn’t concern the user, but may be interesting to others): before interpretation, ngless will transform the input script, adding checks and optimizing it. A new pass (which is only enabled is the user requested version “0.0”), simply transforms the older code into its newer counterpart. Then, the rest of the process proceeds as if the user had typed in the newer version.


When you say you are pro-science, what do you mean you are in favor of?

In the last few weeks, with the March for Science coming up, there have been a few discussion of what being pro-science implies. I just want to ask

When you say you are pro-science, what do you mean you are in favor of?

Below, I present a few different answers.

Science as a set of empirically validated statements

Science can mean facts such as:

  • the world is steadily warming and will continue to do as CO2 concentrations go up
  • nuclear power is safe
  • vaccines are safe
  • GMOs are safe

This is the idea behind the there is no alternative to facts rhetoric. The four statements above can be quibbled with (there are some risks to some vaccines, GMO refers to a technique not a product so even calling it safe or unsafe is not the right discussion, nuclear accidents have happened, and there is a lot of uncertainty on both the amount of warming and its downstream effects), but, when understood in general terms, they are facts and those who deny them, deny reality.

When people say that science is not political, they mean that these facts are independent of one’s values. I’d add the Theory of Evolution to the above four, but evolution (like Quantum Mechanics or Relativity) is even more undeniable.

Science and technology as a positive force

The above were “value-less facts”; let’s slowly get into values.

The facts above do not have any consequences for policy or behaviour on their own. They do constrain the set of possible outcomes, but for a decision, you need a set of values on top.

It’s still perfectly consistent with the facts and claim the following: Vaccines are safe, but a person’s bodily autonomy cannot be breached in the name of utilitarianism. In the case of children, the parents’ autonomy should be paramount. This is a perfectly intellectually consistent libertarian position. As long as you are willing to accept that children will die as a consequence, then I cannot really say you are denying the scientific evidence. This may seem a shocking trade-off when said out loud but it also happens to be the de facto policy of the Western world for the 10-20 past years: vaccines are recommended, but most jurisdictions will not enforce them anymore.

Similar statements can be made about all of the above:

  • The world is getting warmer, but fossil fuels bring human beings wealth and so, are worth the price to the natural environment. The rest should be dealt with mitigation and geo-engineering. What is important is finding the lowest cost solution for people.
  • Nuclear power is safe, but storing nuclear waste destroys pristine environments and that is a cost not worth paying.
  • GMOs are safe, but messing with Nature/God’s work is immoral.

Empirical facts can provide us with the set of alternatives that are possible, but do not help us weigh alternatives against each other (note how often cost/benefit shows up in the above, but the costs are not all material costs). Still, often being pro-science is understood as being pro technological progress and, thus, anti-GMO or anti-nuclear activism is anti-science.

Science as a community and set of practices

This meaning of “being pro-Science”, science as the community of scientists, is also what leads to views such as being pro-Science means being pro-inclusive Science. Or, on the other side, bringing up Dr. Mengele.

Although it is true that empirically validated facts are shared across humanity, there are areas of knowledge that impact certain people more than others. If there is no effort to uncover the mechanisms underlying a particular disease that affect people in poorer parts of the world, then the efforts of scientists will have a differential impact in the world.

Progress in war is fueled by science as much as progress in any other area and scientists have certainly played (and continue to play) important roles in figuring out ways of killing more people faster and cheaper.

The scientific enterprise is embedded in the societies around it and has, indeed, in the past resorted to using slaves or prisoners. Even in the modern enlightened world, the scientific community has had its share of unethical behaviours, in ways both big and small.

To drive home the point: does supporting science mean supporting animal experiments? Obviously, yes, if you mean supporting the scientific enterprise as it exists. And, obviously, no, if it means supporting empirically validated statements!

The cluster of values that scientists typically share

Scientists tend to share a particular set of values (at least openly). We are pro-progress (in technological and social sense), socially liberal, cosmopolitan, and egalitarian. This is the view behind science is international and people sharing photos of their foreign colleagues on social media.

There is nothing empirically grounded of why these values would be better than others, except that they seem to be statistically more abundant in the minds of professional scientists. Some of this may really be explained by the fact that open minded people will both like science and share this type of values, but a lot of it is more arbitrary. Some of it is selection: given the fact that the career mandates travel and the English language, there is little appeal to individuals who prefer a more rooted life. Some of it is socialization (spend enough time in a community where these values are shared and you’ll start to share them). Some of it is preference falsification (in response to PC, people are afraid to come out and say what they really believe).

In any case, we must recognition that there is no objective sense in which these values are better than the alternative. Note that I do share them. If anything, their arbitrariness is often salient to me because I am even more cosmopolitan than the average scientist, so I see how the barrier between the “healthy nationalism” that is accepted and the toxic variety is a pretty arbitrary line in the sand.

What is funny too is that science is often funded exactly for the opposite reasons: It’s a prestige project for countries to show themselves superior to others, like funding the arts, or the Olympics team. (This is not the only reason to fund science, but it is certainly one of the reasons). You also hear it in Science is what made America great.

Science as an interest group

Science can be an interest group like any other: we want more subsidies & lower taxes (although there is little room for improvement there: most R&D is already tax-exempt). We want to get rid of pesky regulation, and the right to self-regulate (even though there is little evidence that self-regulation works). Science is an interest group.

Being pro-science

All these views of “What do I mean when I am pro-science?” interact and blend into each other: a lot of the resistance to things like GMOs does come from an empirically wrong view of the world and correcting this view thus assuage concerns about GMOs. Similarly, if you accept that science generally results in good things, you will be more in favor of funding it.

Sometimes, though, they diverge. The libertarian view that mixes a strong empiricism and defense of empirically validated facts with an opposition to public funding of science is a minority overall, but over-represented in certain intellectual circles.

On the other hand, I have met many people who support science as a force for progress and as an interest group, but who end up defending all sorts of pseudo-scientific nonsense and rejecting the consensus on the safety of nuclear power or GMOs. This is why I work at a major science institution whose health insurance covers homeopathy: the non-scientific staff will say they are pro-science, but will cherish their homeopathic “remedies”. I also suspect that many people declare themselves as pro-science because they see it as their side versus the religious views they disagree with, even though you can perfectly well be religious and pro-science in accepting the scientific facts.  I would never claim that Amish people are pro-progress and I hazard no guess on their views on public-science funding, but many are happy to grow GMOs as they accept the empirical fact of their safety. In that sense, they are more pro-science than your typical Brooklyn hipster.

Sometimes, these meanings of being pro-science blend into each other by motivated reasoning. So, instead of saying that vaccines are so overwhelmingly safe and that herd immunity is so important that I support mandating them (my view), I can be tempted to say “there is zero risk from vaccines” (which is not true for every vaccine, but I sure wish it were). I can be tempted to downplay the uncertainty about the harder-to-disentangle areas of economic policy and cite the empirical studies that agree with my ideology, and to call those who disagree “anti-scientific.” I might deny that values even come into play at all. We like to pretend there are no trade-offs. This is why anti-GMO groups often start by discussing intellectual property and land-use issues and end up trying to shut down high-school science biology classes.

In an ideal world, we’d reserve the opprobrium of “being anti-science” for those who deny empirical facts and well-validated theories, while discussing all the other issues as part of the traditional political debates (is it worth investing public money in science or should we invest more in education and new public housing? or lowering taxes?). In the real world, we often borrow credibility from empiricism to support other values. The risk, however, is that, what we borrow, we often have to pay back with interest.

Haven’t They Suffered Enough?

Haven’t They Suffered Enough?

Every time I read about a plan to have more women and minority in science careers, I think of that famous New Yorker quip about gays getting marriedGays getting married? Haven’t they suffered enough?

Women in tenure-track positions? Haven’t they suffered enough?


I read this lament yesterday:

I was the lucky kid who never had to study for tests. I always scored in the 99% percentile on the annual state assessments.

[… Now I don’t make that much money.]

The national average at the time was that for every one faculty position, there were 200 applications. For our department, there were 300 applications for every one faculty position


Science will fail because the System is running the scientists out of it.

This is like nobody goes there anymore, it’s too crowded. In one sense it expresses a truth, but it is actually non-sensical.

The problem with science cannot simultaneously be that scientists are not sufficiently paid and that there are too many of them for the same position. And, if you argue that too many scientists are leaving academia, you also need to explain how this fits in with all the other complains about academia that focus on how hard it is to get a job.


If you want to make the argument that there should be more science funding, go ahead; I’ll support you 100%.

If you want to make the argument that postdoc salaries are so low that it’s hard to get a qualified candidate, go ahead; I’ll mostly disagree.

If you want to make the argument that the current system leads to sub-optimal science, go ahead, I might support or disagree depending on the details. In the comments to that article someone points out that in the current system PIs are incentivized to be overly conservative and focused on the short-term unlike the private sector which has a longer time-horizon (and perhaps more tolerance for failure). This sort of argument is much more interesting as it implies that there could be better mechanisms for funding.


But, reading these poor me laments, I actually conclude that the taxpayer is getting a great deal: it gets very smart people working 80 hour days for so little money that they cannot afford to go to the movies[1] and they even produce a lot of nice results. Man, your tax dollars are hard at work!

The goal of public science funding is to get as much science as possible. Scientists are a cost to the public to be minimized. It seems that this is working pretty well.

Can we structure the rest of the public sector to be like this? [2] We’d get excellent public services for much lower taxes (we could surely lower the Council Tax which seems to take such a big chunk of this poor fellow’s salary).

[1] I have to say I don’t fully believe that this guy has it this bad.
[2] Joking aside, I actually think that science funding is, in general, better than other types of funding at getting bang-for-public-buck. Tenure comes late in your career (and it is not enough to sit on your ass and not get fired for 2 years), the grant system is competitive, &c In spite of the fact that public funding dominates, very few people would argue that there is no competition in science.

Friday Links

1. On Schekman’s pledge to not publish high-profile. I almost called this a balanced view, but then realized that I probably used that phrase to refer to Derek Lowe’s work at least twice in the past. The man is smart and balanced, what can I say?

2. An interesting meeting report (closed access, sorry). Two highlights:

While discussing mutations that predispose to cancer, Nazneen Rahman (Institute of Cancer Research, UK) rightly reminded us that people make big decisions and have parts of their anatomy removed based on their genotype.


Jeanne Lawrence (University of Massachusetts Medical School, USA) convincingly showed that her lab was able to silence one entire copy of chromosome 21 in stem cells in vitro. Trisomy 21 or Down’s syndrome is caused by an extra copy of chromosome 21. […] Lawrence and colleagues inserted XIST (human X-inactivation gene) into chromosome 21 in stem cells with trisomy 21. They then showed using eight different methods that a single copy of the chromosome had indeed been silenced.

3. A good explanation of Bitcoin, the protocol

4. Interesting article about wine & technology in The Economist (which is one of the few mainstream magazines whose science coverage is worth reading [1]).

[1] Actually, I think it’s the only one who can be consistently trusted, but I enjoy anything by Ed Yong wherever he publishes and been reading some excellent articles by Carl Zimmer in The Atlantic.

Seeing is Believing. Which is Dangerous.

One of the nice things about being at EMBL is that, if you just wait, eventually you can hear the important people in your field speak. Today, I’m quite excited about the Seeing is Believing conference

But ever since I saw this advertised, I dislike the name Seeing is Believing.


  1. Seeing is believing. This is unquestionable.
  2. But seeing is not always justified believing. Our seeing apparatus will often lead us astray. This is especially true on images which do not look like the ones we evolved for (and grew up looking at).
  3. The fact that seeing is believing is actually often a cognitive problem which needs to be overcome!


I can no longer find who said it a BOSC, but someone pointed out, insightfully, that a visualization is already an interpretation of the data, it may be wrong.

More often than not, I show you a picture of a cell, this is rarely raw data. The raw data is a big pixel array. By the time I’m showing it to you I’ve done the following:

  1. Chosen an example to show.
  2. Often projected the data from 3D to a 2D representation
  3. Tweaked contrast.

Point 1 is the biggest culprit here: the selection of which cell to image and show can be an incredibly biased process (even unconsciously biased, of course).

However, even tweaks to the way that the projection is performed and to the contrast can highlight or hide important details (as someone with a lot of experience playing with images, I can tell you that there is a lot of space for “highlighting what you want to show”). In the newer methods (super-resolution type methods), this is even worse: the “picture” you see is already the output of a big processing pipeline.


I’m not even thinking about the effects of the tagging protocols, which introduces their own artifacts. But we, humans, often make the mistake of saying things like “this is an image of protein A in cell type B” instead of “this is an image of a chimeric protein which includes the sequence of A, with a strong promoter in cell type B”.


We know that these artifacts and biases are there, of course. But we believe the images. And this can be a problem because humans are not actually all that great at image analysis.

Seeing is believing, which too often means that we suspend our disbelief (or, as we scientists, like to say: we suspend our skepticism). This is not a recipe for good science.

Update: On twitter, Jim Procter (@foreveremain), points out a great example: the story of the salmon fMRI: we can see it, but we shouldn’t believe it.

Why Science is a Third World Economy

Because people are cheap and things are expensive.


To a large extent, it is easier to get money to pay for people (salaries [1]) than to pay for things. Other times, people show up who are willing to work without being paid (they are self-funded). But then you need to get them materials to work with. For that, you need to actually spend some money.  And sometimes you actually have money, but it can only pay for things of type X, but not of type Y, which is what you wanted.

So, it often feel very much like the third-world: a lot of people standing around a few physical resources, and replacement of capital by labour.


A while back I read a review which was comparing several technologies for the same measurement task [2]. There were two high-quality methods in terms of the output. One was very automated but required you buy some kit (~$400), the other was artisanal.

The authors wrote that the first one was good because it was very fast, but expensive. The other one took a long time, but was cheap. They didn’t even price in the cost of labour! They didn’t even ask how many hours of graduate student time you can get for $400.

Which, of course, makes some sense in the public-funded bureaucratic world where money is not fungible. You cannot often reallocate money from stipends to materials.


And then there is that expensive piece of equipment that is not really used because there was a specific half-a-million grant to buy it, but then enthusiasm petered out and the person who was going to use it had gotten a different job by the time the thing was delivered that nobody here really cared to pick it up.

Yep, that’s a third world thing too.

[1] or stipends which are exactly like a salary, except for tax purposes.
[2] I could probably find it now if I looked, but I don’t actually want to lose track of the main point.

Is Cell Segmentation Needed for Cell Analysis?

Having just spent some posts discussing a paper on nuclear segmentation (all tagged posts), let me ask the question:

Is cell segmentation needed? Is this a necessary step in an analysis pipeline dealing with fluorescent cell images?

This is a common FAQ whenever I give a talk on my work which does not use segmentation, for example, using local features for classification (see the video). It is a FAQ because, for many people, it seems obvious that the answer is that Yes, you need cell segmentation. So, when they see me skip that step, they ask: shouldn’t you have segmented the cell regions?

Here is my answer:

Remember Vapnik‘s dictum [1]do not solve, as an intermediate step, a harder problem than the problem you really need to solve.

Thus the question becomes: is your scientific problem dependent on cell segmentation? In the case, for example, of subcellular location determination, it is not: all the cells in the same field display the same phenotype, your goal being the find out what it is. Therefore, you do not need to have an answer for each cell, only for the whole field.

In other problems, you may need to have a per-cell answer: for example in some kinds of RNAi experiment only a fraction of the cells in a field display the RNAi phenotype and the others did not take up the RNAi. Therefore, segmentation may be necessary. Similarly, if a measurement such as distance of fluorescent bodies to cell membrane is meaningful, by itself (as opposed to being used as a feature for classification), then you need segmentation.

However, sometimes you can get away without segmentation.


An important point to note is the following: while it may be good to have access to perfect classification, imperfect classification (i.e., the type you actually get), may not help as much as the perfect kind.


Just to be sure, I was not the first person to notice that you do not need segmentation for subcellular location determination. I think this is the first reference:

Huang, Kai, and Robert F. Murphy. “Automated classification of subcellular patterns in multicell images without segmentation into single cells.” Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on. IEEE, 2004. [Google scholar link]

[1] I’m quoting from memory. It may a bit off. It sounds obvious when you put it this way, but it is still often not respected in practice.