1Thing #1

I started to write down in a little notebook some notes from podcasts that I listen to and decided to share them with you. Sometimes, it’s the main point of the episode, often it’s a side note that I found interesting. They will be tagged with 1thing.

#1/ A Chinese movie about Wu Zetian (the only female Empress of China, reigning 690-705 CE) featured historically accurate costumes, but had to be edited, as these were thought too revealing by modern standards. The History of China


New Paper: Microbial abundance, activity and population genomic profiling with mOTUs2

Fresh off the press in Nature Communications, an update on the mOTUs (marker gene-based operational taxonomic units) concept.

Basic summary of the mOTUs concept

This concept was introduced in the first mOTUs paper, which itself built upon the specI concept:

  • Use single-copy marker genes to identify and quantify species in metagenomic samples. Single-copy marker genes are gene families such that (1) every[1] organism has a gene in that family and (2) each organism has only one copy.
  • Use marker genes from both genomes and metagenomes to characterize species. Co-abundance can identify species in metagenomes even if there is no reference genome available for them.
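
As a toy illustration of why single-copy marker genes are useful for quantification (this is my own sketch, not the actual mOTUs code): if every species carries exactly one copy of each marker gene, the average read coverage over a species' markers tracks that species' abundance directly.

```python
# Toy illustration of the single-copy marker-gene idea (NOT the actual
# mOTUs implementation): with exactly one copy per genome, the mean read
# coverage over a species' marker genes is a direct proxy for abundance.

def species_abundances(marker_coverage):
    """marker_coverage maps species -> list of per-marker read coverages.

    Returns relative abundances, normalized to sum to 1."""
    raw = {sp: sum(covs) / len(covs) for sp, covs in marker_coverage.items()}
    total = sum(raw.values())
    return {sp: cov / total for sp, cov in raw.items()}

# Species A has ~3x the marker coverage of species B (made-up numbers):
profile = species_abundances({
    'speciesA': [30.0, 28.0, 32.0],
    'speciesB': [10.0, 11.0, 9.0],
})
```

Because the genes are single-copy, there is no need to correct for gene copy number, which is what makes this simple averaging defensible.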

What is new in mOTUs2?

First of all, it updates the database with newer data, which is valuable for the tool, but this version includes a few conceptual improvements as well:

Fig 1b from the manuscript: Ref-mOTUs are mOTUs for which a reference genome is available; meta-mOTUs are those inferred only from metagenomic data
  1. There is a better integration between the reference-derived and the metagenomic-identified mOTUs.
  2. The first version was only relevant for human gut samples, this version includes mOTUs for multiple environments (see Fig 1b above).
Fig 4a from the manuscript: how well do metatranscriptomic-derived species abundances correlate with metagenomic ones? mOTUs does much better than Kraken or MetaPhlAn2 (which, to be fair, were never designed for this usage).
  3. We show that mOTUs works well with metatranscriptomic data to estimate species abundances. As the marker genes are housekeeping genes, their expression is constant enough that one can use mOTUs as a good proxy for species abundance (this was benchmarked using samples for which both metagenomics and metatranscriptomics are available; see Fig 4a above).
  4. SNV profiles on the marker genes are a good proxy for SNV profiles on the whole genome, so that marker genes can be used for subspecies identification, whereas before we had used the whole genome.
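
The Fig 4a-style benchmark boils down to computing a rank correlation between two abundance profiles of the same sample. A self-contained sketch (the numbers are invented; the real analysis is in the paper):

```python
# Spearman correlation between metagenomic and metatranscriptomic
# abundance profiles, in pure Python: rank both vectors, then take the
# Pearson correlation of the ranks. The example abundances are made up.

def _ranks(values):
    # average ranks (ties share the mean of their positions)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

metagenomic    = [0.50, 0.25, 0.15, 0.10]   # species abundances from DNA
metatranscript = [0.45, 0.30, 0.15, 0.10]   # same species, from RNA
rho = spearman(metagenomic, metatranscript)
```

A rank correlation is the right tool here because abundance estimates from DNA and RNA differ in scale, and we only care whether the tools agree on which species are more abundant than which.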

You can get the tool from https://motu-tool.org/, through bioconda, and there is a module for running it through NGLess.


  1. Every is used in the biological, not the logical, sense; that is, it holds >90% of the time for the organisms we know about, but there are exceptions; also, we know essentially nothing about 99% of the organisms out there.

NIXML: nix + YAML for easy reproducible environments

The rise and fall of bioconda

A year ago, I remember a conversation which went basically like this:

Them: So, to distribute my package, what do you think I should use?

Me: You should use bioconda.

Them: OK, that’s interesting, but what about …?

Me: No, you should use bioconda.

Them: I will definitely look into it, but maybe it doesn’t fit my package and maybe I will just …

Me: No, you should use bioconda.

That was a year ago. Mind you, I knew there were some issues with conda, but it had a decent UX (user-experience), and more importantly, the community was growing.

Since then, conda has hit scalability problems, which means that running it is increasingly frustrating: it is slow (an acknowledged issue, and I have had multiple instances of waiting 20 minutes only to get an error message that didn't even help me solve the problem), and mysterious errors are not uncommon: things that used to work now fail (I have run into this more and more recently).

Thus, I no longer recommend bioconda so enthusiastically. What before seemed like some esoteric concerns about guaranteed correctness are now biting us.

The nix model

nix is a Linux distribution with a focus on declarative, reproducible builds.

You write a little file (often called default.nix) which describes exactly what you want and the environment is generated from this, exactly the same each time. It has a lot going for it in terms of potential for science:

  1. Can be reproducible to a fault (Byte-for-Byte reproducibility, almost).
  2. Being declarative means that the best practice of storing your environment for later use is very easy to implement[1]

Unfortunately, the UX of nix is not great, and making the environments reproducible, although possible, is not trivial (though it is now much easier than it used to be). Nix is very powerful, but it uses a complicated domain-specific language and a semi-documented, ever-evolving set of build conventions, which make it hard even for experienced users to use it directly. There is no way that I can recommend it for general use.

The stack model

Stack is a tool for Haskell which uses the following concept for reproducible environments:

  1. The user specifies a list of packages that they want to use
  2. The user specifies a snapshot of the package directory.

The snapshot determines the versions of all of the packages, which automated testing has revealed to work together (at least up to the limits of unit testing). Furthermore, there is no need to say “version X.Y.Z of package A; version Q.R.S of package B,…”: you specify a single, globally encompassing version (note that this is one of the principles we adopted in NGLess, as we describe in the manuscript).
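
The snapshot idea can be sketched in a few lines (a toy model with made-up package names and versions, not how stack is actually implemented):

```python
# Toy model of the snapshot concept: the user lists *what* they want; a
# single snapshot identifier pins *which versions* they get, for every
# package at once. Snapshot names and version numbers are made up.

SNAPSHOTS = {
    'lts-13.0': {'aeson': '1.4.2.0', 'text': '1.2.3.1', 'lens': '4.17'},
    'lts-14.0': {'aeson': '1.4.4.0', 'text': '1.2.3.1', 'lens': '4.17.1'},
}

def resolve(snapshot, wanted):
    """Map each requested package to the version pinned by the snapshot."""
    pinned = SNAPSHOTS[snapshot]
    return {pkg: pinned[pkg] for pkg in wanted}

# Updating (or rolling back) *all* packages means changing one string:
env = resolve('lts-13.0', ['aeson', 'lens'])
```

Note how there is no per-package version pinning in the user's input: the consistency of the whole set is the snapshot's job, not the user's.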

I really like this UX:

  • Want to update all your packages? Just change this one number.
  • Didn't work? Just change it back: you are back where you started. This is the big advantage of declarative approaches: what you did before does not matter, only the current state of the project.
  • Want to recreate an environment? Just use this easy-to-read text file (for technical reasons, two files, but you get the drift).

Enter NIXML

https://github.com/luispedro/nixml

This is an afternoon hack, but the idea is to combine nix's power with stack's UX by allowing you to specify a set of packages in nix, using YAML.

For example, start with this env.nlm file:

nixml: v0.0
snapshot: stable-19.03
packages:
  - lang: python
    version: 2
    modules:
      - numpy
      - scipy
      - matplotlib
      - mahotas
      - jupyter
      - scikitlearn
  - lang: nix
    modules:
      - vim

Now, running

nixml shell

returns a shell with the packages listed. Running

nixml shell --pure

returns a shell with only the packages listed, so you can be sure to not rely on external packages.

Internally, this just creates a nix file and runs it, but it adds the stack-like interface:

  1. it is always automatically pinned: see the stable-19.03 above? It means the version of these packages that was available in the stable branch in March 2019.
  2. the syntax is simple: no need to know about python2.withPackages or any other nix internals like that. This means a loss of power for the user, but it is a better trade-off 99% of the time.
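
For the curious, here is a hand-written sketch of the kind of default.nix that the env.nlm above could be translated into (my guess at the shape of the generated file, not NIXML's actual output); the fetchTarball URL is what does the pinning:

```nix
# Sketch of a generated default.nix; pinning nixpkgs to a fixed
# channel snapshot is what makes the environment reproducible.
let
  pkgs = import (fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/19.03.tar.gz") {};
in
pkgs.mkShell {
  buildInputs = [
    (pkgs.python2.withPackages (ps: with ps; [
      numpy scipy matplotlib mahotas jupyter scikitlearn
    ]))
    pkgs.vim
  ];
}
```

The YAML is strictly less expressive than this, which is exactly the point: the common case needs none of nix's power.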

Very much a work in progress right now, but I am putting it out there, as it is already usable for Python-based projects.


  1. There are two types of best-practices advice: the type that most people, once they try it out, adopt; and the type that you need to keep hammering into people's heads. The second type should be seen as a failure of the tool: "best practices" are a user-experience smell.

Meta-ethics as an empirical question

The fundamental question of meta-ethics, namely "what is the nature of ethical judgements? Are they, in some sense, real?", is an empirical question. That is, if independently derived intelligences converge on a set of ethical statements, then this will be evidence that those statements are real.

A good expression of this view comes from Iain M. Banks's Culture series, namely The Hydrogen Sonata, in what the author calls the Argument of Increasing Decency:

There was also the Argument of Increasing Decency, which basically held that cruelty was linked to stupidity and that the link between intelligence, imagination, empathy and good-behavior-as-it-was-generally-understood — i.e., not being cruel to others — was as profound as these matters ever got.

In fact, it is not this particular quote that best illustrates the argument, but the Culture series as a whole, which can be seen as a sci-fi rendition of Fukuyama's End of History[1].

A not-so-uncommon knee-jerk reaction to the fear of super-intelligent AI is to quip "well, if the AI does get so intelligent, then it will not be so aggressive." Frankly, I admire your dedication to realist meta-ethics, but I am not sure we should bet our whole civilization on it.

I think most people are strong moral realists. Many who call themselves relativists turn out, with only modest probing, to be rigid realists, who don’t even understand the question and think that the only dimensions along which variation is reasonable are relatively shallow cultural practices. Very few would go as far as to say that “paper-clipping the universe is a goal just as valid as trying to achieve a more egalitarian society where multiple people flourish”.

If super-intelligent AIs, when they inevitably appear[2], do share some moral intuitions with human scripture, then I think we can say that meta-ethics is empirically solved and the realist side will have won. Otherwise, we'll all be dead and the whole question will be a bit irrelevant.


  1. The author of the series would disagree, partially because Fukuyama includes a lightly regulated free market as part of his End of History, while the Culture series attempts a more Marxist view of history, with communism being the pinnacle of civilization. However, not only should we not trust authors too much when they discuss their own work (given that they are so likely to be biased), but, implicitly, the series agrees with E. O. Wilson's comment about Communism: Great system, wrong species (he meant that communism is great for ants, not for humans), as the Culture is run by powerful super-computers (Minds) and humans are, basically, pets (see this quote from Surface Detail: "Though drones, avatars and even humans are one thing; the loss of any is not without moral and diplomatic import, of course, but might be dismissed as merely unfortunate and regrettable, something to be smoothed over through the usual channels. Attacking a ship, on the other hand, is an unambiguous act of war."). The books are also at their most insightful when they argue forcefully that the West (the Culture) will tremble a bit if faced with some violent religious fanatics, but is actually more militaristic and less decadent than these religious lunatics believe (and than it itself thinks), so that, in the end, the Islamic State (represented by the Idirans) doesn't really stand a chance.
  2. I am not predicting that this will happen any time soon. But I don't see why it shouldn't happen in the next few centuries: comparing our knowledge and technological abilities today with those of a millennium ago, it seems more reasonable to posit super-human AI by the year 3000 than to deny its possibility.

Thoughts on “Revisiting authorship, and JOSS software publications”

This is a direct response to Titus’ post: Revisiting authorship, and JOSS software publications. In fact, I started writing it as a comment there and it became so long that I decided it was better as its own post.

I appreciate the tone of Titus' post, more asking questions than answering them, so here are my two or three cents:

There is nothing special about software papers. Different fields have different criteria for what constitutes authorship. I believe that particle physics has an approach close to "everyone who committed to the git repo gets to be an author", which leads to papers with >1000 authors (note that I am not myself a particle physicist or anything close to it). At that point, the currency of authorship and citations is diluted to the point where I seriously don't know what the point is. What is maybe special about software papers is that, because they are newer, there hasn't been time for a rough consensus to emerge on what the criteria should be (I guess this is done in part in discussions like these). Having said that, even in well-established fields, you still have issues that are never resolved (in molecular biology, the question of whether technicians should be listed as authors brings out strong opinions on both sides).

My position is against "every positive contribution deserves authorship". Some positive contributions are significant enough that they deserve authorship; others, acknowledgements; others can even go unmentioned. Yes, it's a judgement call what is "significant" and what is not, but I think someone who, for example, reports a bug that leads to a bugfix is a clear NO (even if it's a good bug report). Even small code contributions should not lead to authorship (an acknowledgement is perhaps where my decision boundary sits at that point). People who proofread a manuscript also don't get authorship, even if they do find a few typos or suggest a few wording changes.

Of course, contributions need not be code. Tutorials, design, &c., all count. But they should be significant. I also would not consider adding as an author someone who asked a good question during a seminar on the work, even though those questions sometimes turn out to be like good bug reports, in that you improve the work based on them. The fact that significance is a judgement call does not imply that we should drop significance as a criterion.

I think authorship is also about responsibility. If you are an author, then you must take responsibility for some part of the work (naturally, not all of it, but some of it). If there are issues later, it is your responsibility to, at the very least, explain what you did, or even to fix it, &c. You should be involved in the paper writing and if, for example, some work is needed during revision on that particular aspect of the code, you need to do it.

From my side, I have submitted several patches to projects which were best-efforts at the time, but I don’t want to take any responsibility for the project beyond that. If the author of one of those projects now told me that I needed to redo that to work on Mac OS X because one of the reviewers complained about it, I’d tell them “sorry, I cannot help you”. I don’t think authors should get to do that.

I would get off-topic here, but I also wish there was more of an explicit expectation that if you publish a software paper, you shall provide minimal maintenance for 5-10 years. Too often software papers are the obituary of the project rather than the announcement.

My answer to “Another question: does authorship keep accruing over versions? Should all the authors on sourmash 2.0 be authors on sourmash 3.0?” is a strong NO. You don’t double dip. If anything, I think it’s generally the authors of version 2 that often lose out as people benefit from their work and keep citing version 1.

Finally, a question of my own: what if someone does something outside the project that clearly benefits the project, should they be authors? For example, what if someone does a bioconda package for your code? Or writes an excellent blogpost/tutorial that brings you a very large number of users (or runs a tutorial on it)? Should they be authors? My answer is that it first needs to be significant, and it should not be automatic. It may be appropriate to invite them for authorship, but they should then commit to (at the very least) reading the manuscript and keeping up with the development of the project over the medium-term.

Vaccinate even if it probably won't make a difference to you personally

From the ongoing series “how do statistics feel to me” (previous episode)


Vaccines create adults

This is a strong slogan, but it's trivially false and, frankly, stupid. There were plenty of adults before vaccines. Almost every kid that gets one of the diseases that we vaccinate against will survive just fine. Many other pro-vaccination slogans are equally alarmist, equally false, and (I believe) equally counter-productive.

Mind you, I agree with the idea that vaccines are good, and I think they should be mandatory, because children cannot make informed decisions and vaccination has public health consequences (it's not just your body, your choice; it's our herd immunity, our choice). But, even if all vaccines were to disappear, most of the time, the kids would be fine. In fact, the total mortality rate might be no more than 1-2% (deaths before the age of 5).

Without vaccines, most families would not have to explain to the older brother why their sister died, they would not have to mourn a dead child. Society would not materially change that much. There would be roughly as many adults around.

However, 1-2% of children dying would be horrible. There is no need to exaggerate to make it sound even worse: it would be bad enough. If you take your kids to kindergarten, and your kindergarten is of average size, that would be one child funeral per year (or every other year).
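
The back-of-the-envelope behind "one child funeral per year (or every other year)", with my own illustrative assumptions for kindergarten size and turnover (only the 1-2% mortality figure is from the text):

```python
# Toy arithmetic: a kindergarten of `school_size` children, each staying
# `years_attended` years, admits school_size / years_attended new children
# per year; a fraction `mortality` of each entering cohort would die.
# The specific numbers below are illustrative assumptions, not data.

def funerals_per_year(school_size, years_attended, mortality):
    cohort_per_year = school_size / years_attended
    return cohort_per_year * mortality

low  = funerals_per_year(school_size=100, years_attended=3, mortality=0.015)
high = funerals_per_year(school_size=150, years_attended=3, mortality=0.02)
# low works out to about 0.5 funerals/year (one every other year),
# high to about 1.0 (one per year)
```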

Without vaccines, your child would most likely be fine; the odds are in their favour (they'd have a 98% chance of making it). However, you would probably have to explain to them, at some point in their early childhood, that someone in their school died of a horrible disease. You'd probably be saying "Good Morning" on a regular basis to at least one parent who lost a child. If you are an early-education teacher, you'd have to expect to attend at least half a dozen funerals in your career, mourning little children.

This is a tremendous amount of pain and grief.

Most kids would be fine, most kids were fine before vaccines. But 1-2% is way too much. We’d see people developing defense mechanisms (Detachment parenting: don’t get too close before you know they’ll make it, a New York Times best seller for the post-vaccine era). We don’t need vaccines to create adults, but a world without them is a significantly worse world than the one we live in today.

It’s not “if we don’t vaccinate, all children will die”: that’s flashy, but false.

Rather, it’s “when you walk into kindergarten with your toddler tomorrow, think that, without vaccines, one child in there would die every other year. It’s probably not going to be yours, but, please, vaccinate them.”

I am a regional thinker: a review of “stubborn attachments”

Alternative title: Why I moved to China

Tyler Cowen likes to say that everyone is a regional thinker. It took me a while to understand how that applied to me. I have a split identity (and two passports to go with them). I am both Portuguese and British, with a German high-school education (in Lisbon). So, it’s not even clear which region I should be grouped with.

But Tyler's statement (as I understand it) is not that everyone can be classified as belonging to a regional school of thought, but rather that your thinking, even on the most abstract of subjects, will have been molded by where you spent your formative years.

I grew up in Portugal, coming of age in the 1990s. The narrative I learned at the time goes as follows: Portugal was held back by the dictatorship, which turned its back on development and the world; Salazar, dictator from 1932 to 1968, famously said that "Portugal will be proudly alone". Fortunately for the country (unfortunately for the regime), people still learned about the outside world and eventually got fed up: after the revolution in 1974, the country turned to Europe, joined the EU, and started to catch up with the rest of the West.

Now, having been born in a democratic Portugal, we were the first generation to grow up in a modern Portugal and we’d have lives that were European and turned to the future, not the past. Everyone complained about all the construction that was taking place as the city of Lisbon was being transformed into a modern city, but it was also a sign of progress. The high point of this period was the 1998 World Expo in Lisbon, which included a large expansion of the subway network, and a shiny new bridge across the Tagus, at the time the longest in Europe.

I could now point to where this narrative was a bit too simplistic (in particular, there was a lot of economic growth in the 1950s and 60s, and the true victims of the dictatorship were in the African colonies), but the point is that this is how we thought of the situation: Portugal had been held back by an accident of history and we could see Spain across the border being slightly richer as an example of where the country would be a few years later and the core of the EU as where we could expect to be within our lifetimes.

In 1998, the World Expo in Lisbon was a huge success (after a few initial hiccups): a whole new modern-looking neighbourhood was built on what had been industrial land. António Guterres, now UN Secretary-General, was prime minister, one of the darlings of the international "Third Way" socialist movement. Portugal had gotten rid of the nasty right wing in 1974; now it also had a modern Left.

Then came nothing. Economic growth stumbled. Guterres resigned a few years later (curiously, for someone who was bringing in a modern social democracy, Guterres was, and probably still is, a social conservative). Outside of Lisbon, things were still improving as the other cities caught up to the capital, but eventually that petered out. Portugal has now had two lost decades. Adjusting for inflation, GDP per capita grew 7% between 2000 and 2008: I mean it grew 7% over that whole period, not per year. Then it fell during the crisis, and only last year did it get back to 2008 levels, so that between 2000 and 2017, total growth was 7%. Nobody believes that today's 20-year-olds will have a European lifestyle (and I don't even mean a Nordic lifestyle, just a French/German lifestyle).
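
To make the "whole period, not yearly" point concrete, annualizing the 7% total figure from the text (the arithmetic is mine):

```python
# 7% total GDP-per-capita growth over the 8 years 2000-2008, annualized:
# (1 + total) ** (1 / years) - 1
total_growth = 0.07
years = 8
annual = (1 + total_growth) ** (1.0 / years) - 1  # roughly 0.85% per year
```

Well under one percent a year, for eight years, in a country that was supposed to be catching up.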

A few months ago, Noah Smith tweeted that "people compare themselves to others in their society, so saying that 'things are getting better' doesn't help. Nobody compares themselves to people in 2318". When I read that, I thought: why not? I might not look 300 years into the future, but I certainly compare our world to the world of 2038 and think we're failing. (Noah might have phrased it better; I couldn't find the original tweet.)

The idea of a Great Stagnation has always been deeply intuitive to me, and I frankly cannot understand people who say that technology is moving too fast. I grew up seeing real change around me, then saw it suddenly stop, and felt short-changed by Portugal. Eventually, I left. I didn't leave because growth stopped, but I stayed away because growth stopped.

I see a lot of superficial changes all the time, but that's like fashion: now we wear tight jeans, we used to wear bell-bottoms. It's change, and it may even be a good thing, in that it is fun and breaks the monotony, but it is not progress.

To be sure, stagnation may not be so bad. Germany is the stagnant country par excellence and it's a nice place to live: certainly one of the best countries in the world in terms of quality of life. But Germany is stagnant, and once you see it, the gap between what could easily be and what actually is gets too large to ignore. As time goes on, the gap will only get larger. I guess if you grow up without large changes for decades, you start to expect stagnation, maybe even enjoy it. You compare yourself to the Joneses next door and not to 2038, because there is no picture in your mind's eye of what 2038 should look like and how it should be better.

Those who lived in Portugal through the last 10 years now get excited over 2.2% year-on-year growth. After so many years of nothing, mediocre growth feels amazing. Still, if you cross the border into Spain, it no longer feels like "this is what Portugal will be in 2021". Compared to Portugal, Spain now feels like a much wealthier, qualitatively different, better economy. Portugal could have been that, but, at least in my lifetime, it probably won't be. This is a lost opportunity and it brings me sadness.

Maybe it’s not that I am a regional thinker, but a regional feeler. I have a visceral feel for what it means to “grow to the level of Greece and then stop there” that comes from lived experience.

In summary, this is why I recommend you read Stubborn Attachments.