What surprised me in 2016

2016 made me reassess an important component of my view of the world. No, not Brexit or Trump becoming President (although, it’s not unrelated).

At the end of 2016, I realized that almost all psychology is pseudo-science. Not hyperbole, not oversold, but pseudo-science.

People used to joke that parapsychology is the control group for science: i.e., a bunch of people ostentatiously following the scientific method in a situation where every result should come out negative. It’s a null field: the null hypothesis (that there is no effect) is true. Thus, the fact that you can still get positive effects should be worrisome. It turns out the real joke was that psychology is the true control group. Parapsychology was a bad control, as most scientists were already predisposed to disbelieve it. Psychology is a much better control.

I had heard of the “Replication Crisis” before, but had not delved into the details. I thought psychology was like microbiome studies: over-hyped but, fundamentally, correct. We may see reports that the microbiome makes you be rude to your Uber driver or whatever silly effect. We often read about the effects of the microbiome on obesity, as if it didn’t matter that our diets are not as healthy as they should be and it was all down to microbes. Jonathan Eisen collects these as overselling the microbiome.

Still, to say that people oversell the microbiome is not to say that there is no effect. The microbes do not single-handedly cause obesity, but they have an impact on the margin (a few BMI points up or down), which is enough to be significant for the population. They may not cause nor cure cancer, but they seem to influence the effect of immunotherapy enough that we may need to adjust dosages/drug combinations. And so on…

I thought that when it came to psychology, the same was true: sure, a lot of hype, but I thought there was a there there. There isn’t.

My basic mistake was that I had shared Daniel Kahneman’s view of the situation:

My position […] was that if a large body of evidence published in reputable journals supports an initially implausible conclusion, then scientific norms require us to believe that conclusion. Implausibility is not sufficient to justify disbelief, and belief in well-supported scientific conclusions is not optional. This position still seems reasonable to me – it is why I think people should believe in climate change.

This was exactly my position until I read this long Andrew Gelman post. Since then, I have started to read up on this and have found that psychology (as a field) has sold us a bill of goods.


No, computers are not setting us up for disaster

Yesterday, the Guardian published a long essay by Tim Harford on the dangers of automation. The argument is not new (I first heard it on the EconTalk episode with David Mindell), and the characteristic example is that of the Air France flight that crashed in the middle of the ocean after the autopilot handed control back to the human pilots, who immediately proceeded to crash the plane. As I read it, the argument runs as follows: (a) full automation is impossible, (b) partial automation erodes skills, therefore (c) we should be wary of over-automating.

On twitter, I responded with the snark that that medium encourages:

But I feel I should make a longer counter-argument.

1. Despite being a good visual (a plane crash is dramatic), the example of an airplane crash in 2009 is a terrible one. Commercial civil aviation is incredibly safe. Commercial aviation is so safe, I wouldn’t be surprised to read a contrarian Marginal Revolution post arguing it’s now too safe and we need more dangerous planes. I would be very careful about arguing that whatever the aviation community does is somehow not working, based on a single incident that happened 7 years ago. If this were happening every 7 weeks, it would be a very worrying problem, but it isn’t.

2. Everything I’ve heard and read about that Air France accident seems to agree that the pilots were deeply incompetent. I have also gotten the distinct impression that if the system had not handed back control to the humans, they would not have crashed the plane. It is simply asserted that we cannot have completely autonomous planes, but without evidence. Perhaps at the very least, it should be harder for the humans to override the automated control. Fully automated planes would also not be hijackable in a 9/11 way nor by their own pilots committing suicide (which given how safe planes are, may now be a significant fraction of airplane deaths!).

3. Even granting the premise of the article, that (a) full automation is impossible and (b) partial automation can lead to skill erosion, the conclusion that “the database and the algorithm, like the autopilot, should be there to support human decision-making” is a non sequitur. It assumes that the human is always a better decision maker, which is completely unproven. In fact, I rather feel that the conclusion is the opposite: the pilot should be there (if a pilot is needed, but let’s grant that) to support the autopilot. Now, we should ask: what’s the best way for pilots to support automated systems? If it is to intervene in times of rare crisis, then pilots should perhaps train like other professionals who are there for crises: a lot of simulations and war games for the cases that we hope never happen. Perhaps, we’ll get to a world where success is measured by having pilots spend their whole careers without ever flying a plane, much like a Secret Service agent trains for the worst, but hopes to never have to throw themselves in front of a bullet.

4. Throughout the essay, it is taken as a given that humans are better and computers are there to save on effort. There is another example, that of meteorologists who now trust the computer instead of being able to intuit when the computer has screwed up, which is what used to happen, but I don’t see an argument that their intuition is better than the computer. If you tell me that the veteran meteorologists can beat the modern systems, I’ll buy that, but I would also think that maybe it’s because the veteran meteorologists were working when the automated systems weren’t as good as the modern ones.

5. The essay as a whole needs to be more quantitative. Even if computers do cause different types of accident, we need to have at least an estimate of whether the number of deaths is larger or smaller than using other systems (humans). I understand that authors do not always choose their titles, but I wouldn’t have responded if the title of the essay had been “It won’t be perfect: how automated systems will still have accidents”.

6. The skill erosion effect is interesting per se and there is some value in discussing it and being aware of it. However, I see no evidence that it completely erases the gains from automation (rather than being a small “tax” or clawback on the benefits of automation) and that the solution involves less automation rather than either more automation or a different kind of human training.

7. My horse riding skills are awful.

At the Olympics, the US is underwhelming, Russia still overperforms, and what’s wrong with Southern Europe (except Italy)?

Russia is doing very well. The US and China, for all their dominance of the raw medal tables, are actually doing just as well as you’d expect.

Portugal, Spain, and Greece should all be upset at themselves, while the fourth little piggy, Italy, is doing quite alright.

What determines medal counts?

I decided to play a data game with Olympic Gold medals and ask not just “Which countries get the most medals?” but a couple of more interesting questions.

My first guess of what determines medal counts was total GDP. After all, large countries should get more medals, but economic development should also matter. Populous African countries do not get that many medals after all and small rich EU states still do.

Indeed, GDP (at market value) does correlate quite well with the weighted medal count (an artificial index where gold counts 5 points, silver 3, and bronze just 1).
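As a minimal sketch of how that weighted index is computed (the function name and the example tally are made up for illustration; only the 5/3/1 weights come from the text):

```python
# Point values from the post: gold counts 5, silver 3, bronze 1.
MEDAL_WEIGHTS = {"gold": 5, "silver": 3, "bronze": 1}


def weighted_medal_count(gold, silver, bronze):
    """Return the weighted medal index for a single country."""
    return (MEDAL_WEIGHTS["gold"] * gold
            + MEDAL_WEIGHTS["silver"] * silver
            + MEDAL_WEIGHTS["bronze"] * bronze)


# Hypothetical tally, not real Olympic data: 2 golds, 1 silver, 3 bronzes.
example_score = weighted_medal_count(gold=2, silver=1, bronze=3)
print(example_score)
```

The point of weighting is simply that a country with a handful of golds should rank above one with the same number of bronzes.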

Much of the fit is driven by the two left-most outliers: US and China, but the fit explains 64% of the variance, while population explains none.

Adding a few more predictors, we can try to improve, but we don’t actually do that much better. I expect that as the Games progress, we’ll see the model fits become tighter as the sample size (number of medals) increases. In fact, the model is already performing better today than it was yesterday.

Who is over/under performing?

The US and China are right on the fit above. While they have more medals than anybody else, it’s not surprising. Big and rich countries get more medals.

The more interesting question is: which are the countries that are getting more medals than their GDP would account for?

Top 10 over performers

These are the 10 countries which have the biggest ratio of actual total medals to their predicted number of medals:

                delta  got  predicted     ratio
Russia       6.952551   10   3.047449  3.281433
Italy        5.407997    9   3.592003  2.505566
Australia    3.849574    7   3.150426  2.221921
Thailand     1.762069    4   2.237931  1.787366
Japan        4.071770   10   5.928230  1.686844
South Korea  1.750025    5   3.249975  1.538473
Hungary      1.021350    3   1.978650  1.516185
Kazakhstan   0.953454    3   2.046546  1.465884
Canada       0.538501    4   3.461499  1.155569
Uzbekistan   0.043668    2   1.956332  1.022322

Now, neither the US nor China are anywhere to be seen. Russia’s performance validates their state-funded sports program: the model predicts they’d get around 3 medals, they’ve gotten 10.

Italy is similarly doing very well, which surprised me a bit. As you’ll see, all the other little piggies perform poorly.

Australia is less surprising: they’re a small country which is very much into sports.

After that, no country seems to get more than twice as many medals as their GDP would predict, although I’ll note how Japan/Thailand/South Korea form a little Eastern Asia cluster of overperformance.
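For readers who want to check the table columns, here is a small sketch. It assumes that delta is the medals obtained minus the prediction and that ratio is obtained divided by predicted, which matches the numbers shown; the function name is mine, not taken from the notebook.

```python
def performance(got, predicted):
    """Return (delta, ratio) for one country.

    delta: medals obtained minus medals predicted by the model.
    ratio: medals obtained divided by medals predicted.
    """
    delta = got - predicted
    ratio = got / predicted
    return delta, ratio


# Values copied from the Russia row of the over-performers table.
delta, ratio = performance(got=10, predicted=3.047449)
print(f"delta={delta:.6f} ratio={ratio:.6f}")
```

Sorting by ratio rather than by delta is what keeps large countries from dominating the list: a country missing one medal out of two predicted counts as much as one missing ten out of twenty.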

Top 10 under performers

This brings up the reverse question: who is underperforming? Southern Europe, it seems: Spain, Portugal, and Greece are all there with 1 medal against predictions of 9, 6, and 6.

France is the country which is missing the most medals (12 predicted vs 3 obtained)! Sometimes France does behave like a Southern European country after all.

                delta  got  predicted     ratio
Spain       -8.268615    1   9.268615  0.107891
Poland      -6.157081    1   7.157081  0.139722
Portugal    -5.353673    1   6.353673  0.157389
Greece      -5.342835    1   6.342835  0.157658
Georgia     -4.814463    1   5.814463  0.171985
France      -9.816560    3  12.816560  0.234072
Uzbekistan  -3.933072    2   5.933072  0.337093
Denmark     -3.566784    3   6.566784  0.456845
Philippines -3.557424    3   6.557424  0.457497
Azerbaijan  -2.857668    3   5.857668  0.512149

The Caucasus and Central Asia (Georgia, Azerbaijan, Uzbekistan) may show up as their wealth is mostly due to natural resources and not development per se (oil and natural gas do not win medals, while human capital development does).

I expect that these lists will change as the Games go on, as maybe Spain is just not as good at the events that come early in the schedule. Expect an updated post in a week.

Technical details

The whole analysis was done as a Jupyter notebook, available on GitHub. You can use mybinder to explore the data. There, you will even find several little widgets to play around with.

Data for medal counts comes from the medalbot.com API, while GDP/population data comes from the World Bank through the wbdata package.

I cannot find a decent offline email client

I spend a lot of time on trains and airplanes without an internet connection and keep struggling with the fact that it has become significantly harder to use email offline. I would like to be able to write a few emails, which would be sent out when I reached civilization and had enough wireless coverage to tether the computer to my phone. Ten years ago, this would have been easy; today, not so much.

(Of course, if there was decent wireless data coverage everywhere, this would not be an issue [1]; except perhaps for data charges. However, as the world stands today, I can get neither a good data connection nor decent software to handle this.)


The first email clients were on networked machines and were online only. When dialup came along, they became desktop applications: your mail would be stored on your machine and minimizing the amount of time spent online was often a goal: you would write your messages, save them in the Outbox, and then go online to get new messages and send your stored messages. As things moved to the cloud (especially with the appearance of Hotmail and the concept of webmail), desktop email started losing users (it probably never had many customers) and is slowly dying.

I would happily pay for a decent desktop offline-capable email client, but cannot find one. I tried Postbox, but it’s just a reheated version of Thunderbird. Thunderbird itself is no longer being developed (as Mozilla is focusing on other priorities [2]). I used to use KMail, which was not great but worked in KDE3; it completely stopped working with the advent of KDE4. Opera Mail is also no longer being developed.

I could try to use mutt or another command line client, but the command line has not adapted to rich displays [3] and HTML is a nuisance in those [4]. I also enjoy drag&dropping attachments.


It is interesting to see how even with software, functionality can be lost if not continuously maintained.

[1] Perhaps part of the issue is that the US does have better wireless data coverage than Germany does. Since most technology is developed for that market…
[2] I also suspect the technology hit a local maximum and could not easily be improved and the many bugs encountered are very hard to fix.
[3] There is no technical reason to not have colour and HTML on the command line, but it’s hard to move these very old technologies and the track record is that they’ll be with us for a few more decades.
[4] I still remember when it was considered rude by some to send HTML email. I too had a phase like that.

Self-driving cars kill, but they can improve; humans just kill

A Tesla self-driving car killed its “driver”:

According to Tesla’s account of the crash, the car’s sensor system, against a bright spring sky, failed to distinguish a large white 18-wheel truck and trailer crossing the highway.

Pretty awful failure of computer vision.


I have had more than one person tell me that while they agree with me that self-driving cars can be safer, they will never be allowed by the regulators because the politics are not working in their favor. While I understand some of the concerns, I don’t understand how you can believe such a thing and not turn into a full-blown libertarian.

If you believe that government regulations cause the unnecessary death of circa 92 people a day in the US (many more globally) how can you not become a libertarian raging against government intervention?

I actually have a better opinion of both regulators (who can delay life-saving products, sure, but will probably not stop such a technology altogether) and lobbyists (lobbyists can and do influence regulations).


There is a point to this post though: I’m pretty confident that pretty soon (in a matter of days, perhaps), Tesla will roll out an update that will make sure that white truck trailers against a bright sun are properly detected. So, this particular type of accident will be less likely to ever happen again. Self-driving car accidents expose flaws in the system which can be fixed.

Oh, but there are many other types of accidents, I hear you say. Of course, thousands of them, but if every time one of them happens, that particular type of accident gets fixed, the death rate will keep going down.

Compare with how “we” improve human drivers: educational campaigns and fines for every bad behaviour we uncover (and how many people still text while driving). Sure, it has an effect, but it’s not half as efficient as a patch.

Self-driving cars will have accidents, but unlike human drivers, they can improve rapidly, until eventually driving will be as safe as flying.

Fast and useful errors with ngless [3/5]

NOTE: As of Feb 2016, ngless is available only as a pre-release to allow for testing of early versions by fellow scientists and to discuss ideas. We do not consider it /released/ and do not recommend use in production as some functionality is still in testing. Please get in touch if you are interested in using ngless in your projects.

This is the third of a series of five posts introducing ngless.

  1. Introduction to ngless
  2. Perfect reproducibility using ngless
  3. Fast and high quality error detection [this post]
  4. Extending and interacting with other projects
  5. Miscellaneous

If you are the rare person who just writes code without bugs (if your favorite editor is cat), then you can skip this post as it only concerns those of us who make mistakes. Otherwise, I will assume that /your code will have bugs/. Your code will have silly typos and large mistakes.

Too many tools work well, but fail badly; that is, if all their dependencies are there, all the files are exactly perfect, and the user specifies all the right options, then the tool will work perfectly; but make any mistake and you will get a bizarre error, which will be hard to fix. Thus, the tool is bad at failing. Ngless promises to work well and fail well.

Make some errors impossible

Let us recall our running example:

ngless "0.0"
import OceanMicrobiomeReferenceGeneCatalog version "1.0"

input = paired('data/data.1.fq.gz', 'data/data.2.fq.gz')
preprocess(input) using |read|:
    read = substrim(read, min_quality=30)
    if len(read) < 45:
        discard

mapped = map(input, reference='omrgc')
summary = count(mapped, features=['ko', 'cog'])
write(summary, ofile='output/functional.summary.txt')

Note that we do not specify paths for the ‘omrgc’ reference or the functional map file. We also do not specify paths for intermediate files. This is all implicit and you cannot mess it up. The Fastq encoding is auto-detected, removing one more opportunity for you to mess up (although you can specify the encoding if you really want to).

Ngless always uses the three step output safe writing pattern:

  1. write the output to a temp file,
  2. sync the file and its directory to disk,
  3. rename the temp file to the final output name.

The final step is atomic. That is, the operating system guarantees that it either fully completes or never executes, even if there is an error, so that you never get a partial file. Thus, if there is an output file, you know that ngless finished without errors (up to that point, at least) and that the output is correct. No more asking “did the cluster crash affect this file? Maybe I need to recompute or maybe I count the number of lines to make sure it’s complete”. None of that: if the file is there, it is complete.
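To make the three steps concrete, here is a rough Python sketch of the same pattern. This is not ngless's own code: the safe_write name is hypothetical, and the directory sync assumes a POSIX system (opening a directory this way does not work on Windows).

```python
import os
import tempfile


def safe_write(path, data):
    """Write bytes to `path` so that a partial file is never visible there."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: write the output to a temp file in the same directory, so the
    # final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            # Step 2: flush Python's buffer, then sync the file to disk...
            tmp.flush()
            os.fsync(tmp.fileno())
        # ...and sync the containing directory as well (POSIX only).
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        # Step 3: rename the temp file to the final name (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, clean up the temp file; the final path is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A call such as safe_write('output/functional.summary.txt', b'...') either leaves the complete file in place or leaves whatever was there before. Note that mkstemp creates the file readable only by its owner, so a real tool would probably also want to set permissions before the rename.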

Side-note: programming languages (or their standard libraries) should have support for this safe-output writing pattern. I do not know of any language that does.

Make error detection fast

Have you ever run a multi-step pipeline where the very last step (often saving the results) has a silly typo and everything fails disastrously at that point wasting you hours of compute time? I know I have. Ngless tries as hard as possible to make sure that doesn’t happen.

Although ngless is interpreted rather than compiled, it performs an initial pass over your script to check for all sorts of possible errors.

Ngless is a typed language and all types are checked so that if you try to run the count function without first mapping, you will get an error message.

All arguments to functions are also checked. This even checks some rules that would be hard to impose using a more general purpose programming language: for example, when you call count, either (1) you are using a built-in reference which has its own annotation files or (2) you have to pass in the path to a GTF or gene map file so that the output of the mapping can be annotated and summarized. This constraint would be hard to express in, for example, Java or C++, but ngless can check this type of condition easily.
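As an illustration only, the either/or rule for count could be written as a plain check like the one below. This is a Python sketch, not how ngless implements it; the function name, the annotation_file parameter, and the wording of the message are all hypothetical, and the set of built-in references contains only the one named in this post.

```python
# Assumption: only the built-in reference mentioned in this post is listed.
BUILTIN_REFERENCES = {"omrgc"}


def check_count_call(reference, annotation_file=None):
    """Return an error message if count() could not annotate, else None.

    The rule from the text: either the reference is built-in (and brings its
    own annotation files) or the user must pass a GTF or gene map file.
    """
    if reference in BUILTIN_REFERENCES:
        return None
    if annotation_file is not None:
        return None
    return (f"count: reference '{reference}' is not a built-in reference, "
            "so a GTF or gene map file must be provided")


print(check_count_call("omrgc"))
print(check_count_call("my_genome"))
```

The interesting part is not the check itself, which is trivial, but that it runs before any data is processed instead of after hours of mapping.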

The initial check makes sure that all necessary input files exist and can be read and even that any directories used for output are present and can be written to (in the script above, if a directory named output is not present, you will get an immediate error). If you are using your own functional map, it will read the file header to check that any features you use are indeed present (in the example above, it checks that the ‘ko’ and ‘cog’ features exist in the built-in ocean microbiome catalog).

All typos and other similar errors are caught immediately. If you mistype the name of your output directory, ngless will let you know in 0.2 seconds rather than after hours of computation.
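The file-system part of such an up-front check is easy to sketch in Python. Again, this only illustrates the idea and is not ngless's implementation; the preflight name and the message texts are made up.

```python
import os


def preflight(input_files, output_files):
    """Collect every file-system problem before any computation starts."""
    errors = []
    for path in input_files:
        if not os.path.isfile(path):
            errors.append(f"input file does not exist: {path}")
        elif not os.access(path, os.R_OK):
            errors.append(f"input file is not readable: {path}")
    for path in output_files:
        # An output file can only be created if its directory already exists
        # and is writable.
        directory = os.path.dirname(path) or "."
        if not os.path.isdir(directory):
            errors.append(f"output directory does not exist: {directory}")
        elif not os.access(directory, os.W_OK):
            errors.append(f"output directory is not writable: {directory}")
    return errors


# Paths taken from the running example script above.
problems = preflight(
    ["data/data.1.fq.gz", "data/data.2.fq.gz"],
    ["output/functional.summary.txt"],
)
for problem in problems:
    print(problem)
```

Returning the full list rather than stopping at the first problem means that one dry run reports everything that needs fixing.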

You can also just run the tests with ngless -n script-name.ngl: it does nothing except run all the validation steps.

Again, this is an idea that could be interesting to explore in the context of general purpose languages.

Make error messages helpful

An unknown error occurred
An unhelpful error message

As much as possible, when an error is detected, the message should help you make sense of it and fix it. A tool cannot always read your mind, but as much as possible, ngless error messages are descriptive.

For example, if you used an illegal argument/parameter to a function, ngless will remind you of what the legal arguments are. If it cannot write to an output file, it will say it cannot write to an output file (and not just “IO Error”). If a file is missing, it will tell you which file (and it will tell you in about 0.2 seconds).
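A sketch of what "remind you of what the legal arguments are" can look like in practice follows. The wording is invented for illustration and is not the actual ngless message; the argument names are only those that appear in the example script, not a complete list.

```python
def illegal_argument_message(function, argument, legal_arguments):
    """Build an error message that names the mistake and lists the fix."""
    legal = ", ".join(sorted(legal_arguments))
    return (f"Function '{function}' does not accept argument '{argument}'. "
            f"Legal arguments are: {legal}.")


# Hypothetical typo: 'ofle' instead of 'ofile' in a call to write().
print(illegal_argument_message("write", "ofle", ["ofile"]))
```

Compare that with “An unknown error occurred”: the message costs a few lines of code to produce and saves the user a trip to the documentation.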


Ngless is designed to make some errors impossible, while trying hard to give you good error messages for the errors that will inevitably crop up.