I tried Haskell for 5 years and here’s how it was

One blogpost style which I find almost completely useless is “I tried Programming Language X for 5 days and here’s how it was.” Most of the time, the first impression is superficial, discussing only syntax and whether you could get Hello World to run.

This blogpost is “I tried Haskell for 5 years and here’s how it was.”

In the last few years, I have been (with others) developing ngless, a domain-specific language and interpreter for next-generation sequencing. For partly accidental reasons, the interpreter is written in Haskell. Even though I kept using other languages (mostly Python and C++), I have now used Haskell quite extensively for a serious, medium-sized project (11,270 lines of code). Here are some scattered notes on Haskell:

There is a learning curve

Haskell is a different type of language. It takes a while to fully get used to it if you’re coming from a more traditional background.

I have debugged code in Java, even though I never really learned (or wrote) any Java. Java is just a C++ pidgin language.

The same is not true of Haskell. If you have never looked at Haskell code, you may have difficulty following even simple functions.

Once you learn it, though, you get it.

Haskell has some very nice libraries

You really have very nice libraries, written by people doing really useful things.

Conduit and Parsec are the basis of a lot of ngless code.

Here is an excellent curated list of the Haskell library world (added May 4)

Haskell libraries are sometimes hard to figure out

I like to think that you need both hard documentation and soft documentation.

Hard documentation is where you describe every argument to a function and its effects. It is like a reference work (think of man pages). Soft documentation is tutorials, examples, and more descriptive text. Well-documented software and libraries will have both (there is no need for anything in between; I don’t want soft-serve documentation).

Haskell libraries often have extremely hard documentation: they will explain the details of functions, but little in the way of soft documentation. This makes it very hard to understand why a function could be useful in the first place and in which contexts to use this library.

This is exacerbated by the often extremely abstract nature of some of the libraries. A case in point is the very useful MonadBaseControl class. Trust me, this is useful. However, because it is so generic, it is hard to immediately grasp what it does.

I do not wish to over-generalize. Conduit, mentioned above, has tutorials and blogposts, as well as hard documentation.

Haskell sometimes feels like C++

Like C++, Haskell is (in part) a research project with a single initial Big Idea and a few smaller ones. In Haskell’s case, the Big Idea was purely functional lazy evaluation (or, if you want to be pedantic, call it “non-strict” instead of lazy). In C++’s case, the Big Idea was high level object orientation without loss of performance compared to C.

Both C++ and Haskell are happy to incorporate academic suggestions into real-world computer languages. This doesn’t need elaboration in the case of Haskell, but C++ has also been happy to be at the cutting edge. For example, 20 years ago, you could already use C++ templates to perform (limited) programming with dependent types. C++ really pioneered the mechanism of generics and templates.

Like C++, Haskell is a huge language, where there are many ways to do something. You have multiple ways to represent strings, you have accidents of history kept for backwards compatibility. If you read an article from 10 years ago about the best way to do something in the language, that article is probably outdated by two generations.

Like C++, Haskell’s error messages take a while to get used to.

Like C++, there is a tension in the community between the purists and the practitioners.

Performance is hard to figure out

Haskell and GHC generally let me get good performance, but it is not always trivial to figure out a priori which code will run faster and in less memory.

In some trivial sense, you always depend on the compiler to make your code faster (i.e., if the compiler was infinitely smart, any two programs that produce the same result would compile to the same highly efficient code).

In practice, of course, compilers are not infinitely smart and so there is faster and slower code. Still, in many languages you can look at two pieces of code and reasonably guess which one will be faster, at least within an order of magnitude.

Not so with Haskell. Even very smart people struggle with very simple examples. This is because the most generic implementation of the code tends to be very inefficient. However, GHC can be very smart and make your software very fast. This works 90% of the time, but sometimes you write code that does not trigger all the right optimizations and your function suddenly becomes 1,000x slower. I have once or twice written two almost identical versions of a function with large differences in performance (orders of magnitude).

This leads to the funny situation that Haskell is (partially correctly) seen as an academic language used by purists obsessed with elegance; while in practice, a lot of effort goes into making the code written as compiler-friendly as possible.

For the most part, though, this is not a big issue. Most of the code will run just fine and you optimize the inner loops at the end (just like in any other language), but it’s a pitfall to watch out for.

The easy is hard, the hard is easy

For minor tasks (converting between two file formats, for example), I will not use Haskell; I’ll do it in Python: it has a better REPL environment, there is no need to set up a cabal file, it is easier to express simple loops, &c. The easy things are often a bit harder to do in Haskell.

However, in Haskell, it is trivial to add some multithreading capability to a piece of code with complete assurance of correctness. The line that “if it compiles, it’s probably correct” is often true.

Stack changed the game

Before stack came on the scene, it was painful to make sure you had all the right libraries installed in a compatible way. Since stack was released, working in Haskell really has become much nicer. Tooling matters.

The really big missing piece is the equivalent of ccache for Haskell.


Haskell is a great programming language. It requires some effort at the beginning, but you get to learn a very different way of thinking about your problems. At the same time, the ecosystem matured significantly (hopefully signalling a trend) and the language can be great to work with.

When you say you are pro-science, what do you mean you are in favor of?

In the last few weeks, with the March for Science coming up, there have been a few discussions of what being pro-science implies. I just want to ask:

When you say you are pro-science, what do you mean you are in favor of?

Below, I present a few different answers.

Science as a set of empirically validated statements

Science can mean facts such as:

  • the world is steadily warming and will continue to do so as CO2 concentrations go up
  • nuclear power is safe
  • vaccines are safe
  • GMOs are safe

This is the idea behind the “there is no alternative to facts” rhetoric. The four statements above can be quibbled with (there are some risks to some vaccines, GMO refers to a technique not a product so even calling it safe or unsafe is not the right discussion, nuclear accidents have happened, and there is a lot of uncertainty on both the amount of warming and its downstream effects), but, when understood in general terms, they are facts, and those who deny them deny reality.

When people say that science is not political, they mean that these facts are independent of one’s values. I’d add the Theory of Evolution to the above four, but evolution (like Quantum Mechanics or Relativity) is even more undeniable.

Science and technology as a positive force

The above were “value-less facts”; let’s slowly get into values.

The facts above do not have any consequences for policy or behaviour on their own. They do constrain the set of possible outcomes, but for a decision, you need a set of values on top.

It’s still perfectly consistent with the facts to claim the following: Vaccines are safe, but a person’s bodily autonomy cannot be breached in the name of utilitarianism. In the case of children, the parents’ autonomy should be paramount. This is a perfectly intellectually consistent libertarian position. As long as you are willing to accept that children will die as a consequence, then I cannot really say you are denying the scientific evidence. This may seem a shocking trade-off when said out loud, but it also happens to be the de facto policy of the Western world for the past 10-20 years: vaccines are recommended, but most jurisdictions will not enforce them anymore.

Similar statements can be made about all of the above:

  • The world is getting warmer, but fossil fuels bring human beings wealth and so are worth the price to the natural environment. The rest should be dealt with through mitigation and geo-engineering. What is important is finding the lowest-cost solution for people.
  • Nuclear power is safe, but storing nuclear waste destroys pristine environments and that is a cost not worth paying.
  • GMOs are safe, but messing with Nature/God’s work is immoral.

Empirical facts can provide us with the set of alternatives that are possible, but do not help us weigh alternatives against each other (note how often cost/benefit shows up in the above, but the costs are not all material costs). Still, often being pro-science is understood as being pro technological progress and, thus, anti-GMO or anti-nuclear activism is anti-science.

Science as a community and set of practices

This meaning of “being pro-Science”, science as the community of scientists, is also what leads to views such as being pro-Science means being pro-inclusive Science. Or, on the other side, bringing up Dr. Mengele.

Although it is true that empirically validated facts are shared across humanity, there are areas of knowledge that impact certain people more than others. If there is no effort to uncover the mechanisms underlying a particular disease that affects people in poorer parts of the world, then the efforts of scientists will have a differential impact in the world.

Progress in war is fueled by science as much as progress in any other area and scientists have certainly played (and continue to play) important roles in figuring out ways of killing more people faster and cheaper.

The scientific enterprise is embedded in the societies around it and has, indeed, in the past resorted to using slaves or prisoners. Even in the modern enlightened world, the scientific community has had its share of unethical behaviours, in ways both big and small.

To drive home the point: does supporting science mean supporting animal experiments? Obviously, yes, if you mean supporting the scientific enterprise as it exists. And, obviously, no, if it means supporting empirically validated statements!

The cluster of values that scientists typically share

Scientists tend to share a particular set of values (at least openly). We are pro-progress (in both the technological and the social sense), socially liberal, cosmopolitan, and egalitarian. This is the view behind “science is international” and people sharing photos of their foreign colleagues on social media.

There is no empirically grounded reason why these values would be better than others, except that they seem to be statistically more abundant in the minds of professional scientists. Some of this may really be explained by the fact that open-minded people will both like science and share this set of values, but a lot of it is more arbitrary. Some of it is selection: given the fact that the career mandates travel and the English language, there is little appeal to individuals who prefer a more rooted life. Some of it is socialization (spend enough time in a community where these values are shared and you’ll start to share them). Some of it is preference falsification (in response to PC, people are afraid to come out and say what they really believe).

In any case, we must recognize that there is no objective sense in which these values are better than the alternative. Note that I do share them. If anything, their arbitrariness is often salient to me because I am even more cosmopolitan than the average scientist, so I see how the barrier between the “healthy nationalism” that is accepted and the toxic variety is a pretty arbitrary line in the sand.

What is funny too is that science is often funded for exactly the opposite reasons: it’s a prestige project for countries to show themselves superior to others, like funding the arts or the Olympic team. (This is not the only reason to fund science, but it is certainly one of the reasons.) You also hear it in “Science is what made America great.”

Science as an interest group

Science can be an interest group like any other: we want more subsidies & lower taxes (although there is little room for improvement there: most R&D is already tax-exempt). We want to get rid of pesky regulation, and the right to self-regulate (even though there is little evidence that self-regulation works). Science is an interest group.

Being pro-science

All these views of “What do I mean when I am pro-science?” interact and blend into each other: a lot of the resistance to things like GMOs does come from an empirically wrong view of the world, and correcting this view would thus assuage concerns about GMOs. Similarly, if you accept that science generally results in good things, you will be more in favor of funding it.

Sometimes, though, they diverge. The libertarian view that mixes a strong empiricism and defense of empirically validated facts with an opposition to public funding of science is a minority overall, but over-represented in certain intellectual circles.

On the other hand, I have met many people who support science as a force for progress and as an interest group, but who end up defending all sorts of pseudo-scientific nonsense and rejecting the consensus on the safety of nuclear power or GMOs. This is why I work at a major science institution whose health insurance covers homeopathy: the non-scientific staff will say they are pro-science, but will cherish their homeopathic “remedies”. I also suspect that many people declare themselves as pro-science because they see it as their side versus the religious views they disagree with, even though you can perfectly well be religious and pro-science in accepting the scientific facts. I would never claim that Amish people are pro-progress and I hazard no guess on their views on public science funding, but many are happy to grow GMOs as they accept the empirical fact of their safety. In that sense, they are more pro-science than your typical Brooklyn hipster.

Sometimes, these meanings of being pro-science blend into each other by motivated reasoning. So, instead of saying that vaccines are so overwhelmingly safe and that herd immunity is so important that I support mandating them (my view), I can be tempted to say “there is zero risk from vaccines” (which is not true for every vaccine, but I sure wish it were). I can be tempted to downplay the uncertainty about the harder-to-disentangle areas of economic policy and cite the empirical studies that agree with my ideology, and to call those who disagree “anti-scientific.” I might deny that values even come into play at all. We like to pretend there are no trade-offs. This is why anti-GMO groups often start by discussing intellectual property and land-use issues and end up trying to shut down high-school biology classes.

In an ideal world, we’d reserve the opprobrium of “being anti-science” for those who deny empirical facts and well-validated theories, while discussing all the other issues as part of the traditional political debates (is it worth investing public money in science or should we invest more in education and new public housing? or lowering taxes?). In the real world, we often borrow credibility from empiricism to support other values. The risk, however, is that what we borrow, we often have to pay back with interest.

What surprised me in 2016

2016 made me reassess an important component of my view of the world. No, not Brexit or Trump becoming President (although, it’s not unrelated).

At the end of 2016, I realized that almost all psychology is pseudo-science. Not hyperbole, not oversold, but pseudo-science.

People used to joke that Parapsychology is the control group for science: i.e., a bunch of people ostensibly following the scientific method in a situation where every result should come out negative. It’s a null field: the null hypothesis (that there is no effect) is true. Thus, the fact that you can still get positive effects should be worrisome. Turns out the true joke was that psychology is the true control group. Parapsychology was a bad control as most scientists were already predisposed to disbelieve it. Psychology is a much better control.

I had heard of the “Replication Crisis” before, but had not delved into the details. I thought psychology was like microbiome studies: over-hyped but, fundamentally, correct. We may see reports that the microbiome makes you rude to your Uber driver or whatever silly effect. We often read about the effects of the microbiome on obesity, as if it didn’t matter that our diets are not as healthy as they should be and it was all down to microbes. Jonathan Eisen collects these as overselling the microbiome. Still, to say that people oversell the microbiome is not to say that there is no effect. The microbes do not single-handedly cause obesity, but they have an impact on the margin (a few BMI points up or down), which is enough to be significant for the population. They may neither cause nor cure cancer, but they seem to influence the effect of immunotherapy enough that we may need to adjust dosages/drug combinations. And so on…

I thought that when it came to psychology, the same was true: sure, a lot of hype, but I thought there was a there there. There isn’t.

My basic mistake was that I had shared Daniel Kahneman’s view of the situation:

My position […] was that if a large body of evidence published in reputable journals supports an initially implausible conclusion, then scientific norms require us to believe that conclusion. Implausibility is not sufficient to justify disbelief, and belief in well-supported scientific conclusions is not optional. This position still seems reasonable to me – it is why I think people should believe in climate change.

This was exactly my position until I read this long Andrew Gelman post. Since then, I have started to read up on this and have found that psychology (as a field) has sold us a bill of goods.

(Computer-programming) language wars a bit silly, but not irrational

I don’t know where I heard (and it was probably not first hand) the
observation of how weird it is that in the 21st century computer professionals
segregate by the language they use to talk to the machine. It just seems silly, doesn’t it?

Programming language discussions (R vs Python for data science, C++ or Python
for computer vision, Java or C# or Ruby for webapps, …) are a staple of
geekdom and easy to categorize as silly. In this short post, I’ll argue that
while silly they are not completely irrational.

Programming languages are mostly about tooling

Some languages are better than others, but most of what matters is not
whether the language itself is any good, but how large the ecosystem around it
is. You can have a perfect language, but if there is no support for it in your
favorite editor/IDE, no good HTTPS libraries which can handle HTTP2.0, then
working in it will be less efficient or even less pleasant than working in Java. On
the other hand, PHP is a terrible, terrible language, but its ecosystem is (for
its limited domain) very nice. R is a slightly less terrible version of this: not a great language, but a lot of nice libraries and a good culture of documentation.

Haskell is a pretty nice programming language, but working in it got much nicer
once stack appeared on the scene. The
language is the same, even the set of libraries is the same, but having a
better way to install packages is enough to fundamentally change your
experience.

On the other hand, Haskell is (still?) enough of a niche language that nobody
has yet written a tool comparable to ccache for
the C/C++ world (instantaneous rebuilds are amazing for a compiled language).

The value of your code increases if you program in a popular language

This is not strictly true: if the work is self-contained, then it may be very
useful on its own even if you wrote it in COBOL, but often the more people can
build upon your work, the more valuable that work is. So if your work is
written in C or Python as opposed to Haskell or Ada, everything else being
equal, it will be more valuable (not everything else is equal, though).

This is somewhat field-dependent. Knowing R is great if you’re a
bioinformatician, but almost useless if you’re writing webserver code. Even
general-purpose languages get niches based on history and tools. Functional
programming languages somehow seem to be more popular in the financial sector
than in other fields (R has a lot of functional elements, but is not typically
thought of as a functional language; probably because functional languages are
“advanced” and R is “for beginners”).

Still, a language that is popular in its field will make your own code more
valuable. Packages upon which you depend will be more likely to be maintained,
tools will improve. If you release a package yourself, it will be more used
(and, if you are in science, maybe even cited).

Changing languages is easy, but costly

Any decent programmer can “pick up” a new language in a few days. I can
probably even debug code in any procedural language even without having ever
seen it before. However, to really become proficient, it often takes much
longer: you need to encounter and internalize the most natural way to do things
in the new language, the quirks of the interpreter/compiler, learn about
different libraries and tools, &c. None of this is “hard”, but it all takes a
long time.

Programming languages have network effects

This is all a different way of saying that programming languages have network
effects. Thus, if I use language X, it is generally better for me if others
also use it. This is not always made explicit, but I think it is the rationale for the programming language discussions.

Utilitarian Scientific Software Development

Yesterday, I added this new feature to ngless: if the user asks it to run a non-existent script, it will try to give an error message with a guess of what the user probably really meant.

For example, if you type Profiles.ngl, but the script is actually called profile.ngl:

$ ngless Profiles.ngl

Exiting after fatal error:
File `Profiles.ngl` does not exist. Did you mean 'profile.ngl' (closest match)?

Previously, it would just say Profiles.ngl not found, without a suggestion.
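
A suggestion like this can be computed by ranking the files that do exist by their similarity to the name that was typed. Below is a minimal sketch in Python (not the Haskell code that ngless actually uses) built on the standard library's difflib.get_close_matches. The function names suggest_script and missing_file_message, and the 0.6 similarity cutoff, are assumptions made for illustration.

```python
import difflib
from typing import Iterable, Optional


def suggest_script(missing: str, candidates: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate most similar to `missing`, or None if nothing is close enough."""
    # get_close_matches returns at most n names, best match first,
    # keeping only those whose similarity ratio is at least `cutoff`.
    matches = difflib.get_close_matches(missing, list(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def missing_file_message(missing: str, candidates: Iterable[str]) -> str:
    """Build the error message, adding a guess only when one is available."""
    message = f"File `{missing}` does not exist."
    guess = suggest_script(missing, candidates)
    if guess is not None:
        message += f" Did you mean '{guess}' (closest match)?"
    return message


if __name__ == "__main__":
    # Hypothetical directory listing; in practice this would come from the file system.
    print(missing_file_message("Profiles.ngl", ["profile.ngl", "count.ngl"]))
```

The cutoff keeps the tool from offering a guess when nothing in the directory resembles the requested name, in which case the plain "does not exist" message is shown.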

It took me about 10-15 minutes to implement this (actually, most of the functionality was already present, as ngless already implemented similar functionality in other contexts). Is it worth it?

When is it worth it to add bells & whistles to research software?

I think we should think about it, in an ideal world, using the utilitarian principle of research software development: software should reduce the overall human effort. If this feature saves more time overall than it took to write, then it’s worth it.

This Utilitarian Principle says that these 15 minutes were well invested if (and only if) this ngless feature saves more than 15 minutes for all its users over its lifetime. I expect that every time a user triggers this error, they’ll save a few seconds (say 2 seconds). 15 minutes is 900 seconds. Thus, this feature is worth it if it is triggered over 450 times. Given that we hope that ngless will be widely used, this feature is certainly worth it.
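
The break-even arithmetic above can be written down as a small sketch. The function name break_even_uses is hypothetical, and the 2 seconds saved per use is the rough guess from the text rather than a measurement.

```python
def break_even_uses(dev_minutes: float, seconds_saved_per_use: float) -> float:
    """Number of uses at which the time saved equals the time spent developing."""
    if seconds_saved_per_use <= 0:
        raise ValueError("seconds_saved_per_use must be positive")
    return dev_minutes * 60 / seconds_saved_per_use


if __name__ == "__main__":
    # 15 minutes of development, about 2 seconds saved each time the error triggers.
    print(break_even_uses(15, 2))
```

With the lower estimate of 10 minutes of development time, the same assumption of 2 seconds saved per use gives a threshold of 300 uses.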

This principle also makes the argument that it would not be worth adding such a feature to a script that is only used in an internal analysis. So, code that is only meant to be used by myself, or by myself and a small number of others, should have fewer bells & whistles.

In a non-ideal world, we need to take into account the incentives of the scientific (or commercial) world and the possibility of market failure: the market does not always reward the most selfless behaviour (this includes the “market” for scientific rewards where shiny new things are “paid” more than boring old maintenance).

Scott Sumner on what is a science

And don’t embarrass yourself by arguing macroeconomics is not a science.  Of course it’s a science.  It’s failed science, but then so are some of the other sciences, at least based on what I’ve read about the crisis in replication.  The term ‘science’ is not a compliment, it’s not some sort of award given to a field, like a Nobel Prize.  It’s simply a descriptive term for a field that builds models that try to explain how the world works.  Saying that science must be successful to be viewed as science is as silly as saying that a work of art must be good to be considered art.

Scott Sumner

Psychology is a failed science.

No, computers are not setting us up for disaster

Yesterday, the Guardian published a long essay by Tim Harford on the dangers of automation. The argument is not new (I first heard it on the EconTalk episode with David Mindell), and the characteristic example is that of the Air France flight that crashed in the middle of the ocean after the autopilot handed control back to the human pilots, who immediately proceeded to crash the plane. As I read it, the argument runs as follows: (a) full automation is impossible, (b) partial automation erodes skills, therefore (c) we should be wary of over-automating.

On twitter, I responded with the snark that that medium encourages:

But I feel I should make a longer counter-argument.

1. Despite being a good visual (a plane crash is dramatic), the example of an airplane crash in 2009 is a terrible one. Commercial civil aviation is incredibly safe. Commercial aviation is so safe, I wouldn’t be surprised to read a contrarian Marginal Revolution post arguing it’s now too safe and we need more dangerous planes. I would be very careful in arguing, based on a single incident that happened 7 years ago, that whatever the aviation community does is somehow not working. If this were happening every 7 weeks, then it would be a very worrying problem, but it isn’t.

2. Everything I’ve heard and read about that Air France accident seems to agree that the pilots were deeply incompetent. I have also gotten the distinct impression that if the system had not handed back control to the humans, they would not have crashed the plane. It is simply asserted that we cannot have completely autonomous planes, but without evidence. Perhaps at the very least, it should be harder for the humans to override the automated control. Fully automated planes would also not be hijackable in a 9/11 way nor by their own pilots committing suicide (which given how safe planes are, may now be a significant fraction of airplane deaths!).

3. Even granting the premise of the article, that (a) full automation is impossible and (b) partial automation can lead to skill erosion, the conclusion that “the database and the algorithm, like the autopilot, should be there to support human decision-making” is a non sequitur. It assumes that the human is always a better decision maker, which is completely unproven. In fact, I rather feel that the conclusion is the opposite: the pilot should be there (if a pilot is needed, but let’s grant that) to support the autopilot. Now, we should ask: what’s the best way for pilots to support automated systems? If it is to intervene in times of rare crisis, then pilots should perhaps train like other professionals who are there for crises: a lot of simulations and war games for the cases that we hope never happen. Perhaps, we’ll get to a world where success is measured by having pilots spend their whole careers without ever flying a plane, much like a Secret Service agent trains for the worst, but hopes to never have to throw themselves in front of a bullet.

4. Throughout the essay, it is taken as a given that humans are better and computers are there to save on effort. There is another example, that of meteorologists who now trust the computer instead of being able to intuit when the computer has screwed up, which is what used to happen, but I don’t see an argument that their intuition is better than the computer. If you tell me that the veteran meteorologists can beat the modern systems, I’ll buy that, but I would also think that maybe it’s because the veteran meteorologists were working when the automated systems weren’t as good as the modern ones.

5. The essay as a whole needs to be more quantitative. Even if computers do cause different types of accident, we need to have at least an estimate of whether the number of deaths is larger or smaller than using other systems (humans). I understand that authors do not always choose their titles, but I wouldn’t have responded if the title of the essay had been “It won’t be perfect: how automated systems will still have accidents”.

6. The skill erosion effect is interesting per se and there is some value in discussing it and being aware of it. However, I see no evidence that it completely erases the gains from automation (rather than being a small “tax” or clawback on the benefits of automation) and that the solution involves less automation rather than either more automation or a different kind of human training.

7. My horse riding skills are awful.