Thoughts on driverless cars

Since I’m getting a lot of hits from Marginal Revolution readers, let me write down a few thoughts I’ve had for a while on a favorite topic there: driverless cars.

1. It will be a wonderful thing. Besides the 1.2 million killed per year (if that is something you want to shrug off with a Besides), driving is a major source of stress in people’s lives. For many, their commute is the absolute worst part of their day. There are also a not-insignificant number of people who, for one reason or another, do not drive, which limits their choices in all sorts of minor ways (which jobs they can take, where they can live, who they can meet socially…).

2. Regulatory issues will be solved. As I wrote, over a million people die in car crashes every year, and so the idea that regulators will stop this technology is, even for me, who am not generally a big believer in the abilities of regulatory authorities, too horrific to contemplate. It is an interesting reflection on the Great Stagnation that some people think that there may be technology to save a million lives per year (about twice as much as malaria), but our public regulators are so dysfunctional that they will stop it.

Of course, it’s a fallacy to assume that just because something is horrific, it will not happen. However, I have a bit more trust in the power of large corporations to obtain regulatory approval for their products than the pundits who argue this will be a stumbling block. There are also intermediate steps that may be taken so that the world slowly moves from the current state to the new, better, driverless-car state. The introduction of driverless trucks only on some roads is exactly one such step. In the same way, the liability issues will eventually be solved. Regulations may delay the introduction of driverless cars¹, but not stop it.

You just need one or two jurisdictions to take that first step. It will probably be one of the most enlightened polities like Sweden, Singapore, or Nevada (the UK?). Then, slowly, the rest of the world will copy it.

3. (Human) driving will be illegal. Besides the fact that it kills so many, it will just not be possible to have humans drive through intersections like this one. There won’t be any more need for traffic lights, no-parking signs, &c. If a city wants to cut off a specific street for construction work, or reroute traffic, it can just update the online map and broadcast it: the driverless cars will then know to avoid that street. So, pretty quickly it will become impossible for humans to drive the streets.

4. Driverless cars will change cities.

They will enable much higher density. Have you ever driven through an empty large city in the middle of the night and felt amazed at how fast you got around without all the traffic? Even Manhattan could be crossed in less than half an hour from one end to the other. It will be like this all the time. Except that there may be second-order effects where even more people move into dense city centres. At the same time, streets can be narrower, parking can be taken off the street (just have the car drive you to your door and then go park itself somewhere far), so there is more buildable area. This line of reasoning leads us to expect much higher densities.

They will also enable much lower density. It is nice to live out in the country, but long commutes are horrible. But if the car is driving itself while you catch up on email or the blogs… This line of reasoning leads us to expect much lower densities and gigantic sprawl.

Maybe we will get both: very dense city centres for the young and hip, with huge suburban rings. Average density is over.

In reality, it’s hard to predict how all of these forces will play out as they push and pull in different directions. However, cities will probably not look the same at all.

5. Driverless taxis and mini-buses will be the future of public transport.

6. What is the time frame? Pretty soon, we’ll have to buy a new car for my wife. I don’t yet worry about the fact that eventually driverless cars will drive down [sorry] the resale value of driver-needed cars, but at some point, yes, this will happen.

However, for a city project, the time frame may be already close enough that any city starting a mass transit project should consider the possibility of driverless cars already. A new tram line whose planning is starting today may only open 10 or 15 years in the future. Is it really a good idea if the possibility of it being obsolete before it even opens is so real? This feels like trolling (imagine showing up at a public hearing and asking your city council what happens with driverless cars), but mass transit may just not make too much sense anymore.

7. It won’t be good for the environment. Yes, it’s possible to have more efficient cars, car sharing as the norm, &c. But lowering the cost of moving around so enormously will certainly make people move around more. Why not go visit someone who lives 1000 miles away on the week-end if the car can drive itself during the night? Fall asleep here, wake up there (without all the hassle of a public sleeping car in a train).

Perhaps if there is a major breakthrough in nuclear fusion, we can replace all the cars by fusion-charged electrics, but until that happens, the small gains in efficiency will most likely be outweighed by the increase in consumption.

¹ Thousands of people died in this sentence, maybe millions. Even a public hearing which delays a decision by a few weeks will cause thousands of deaths. Note that this still happens even if the driverless car roll-out process is spread out over many years, so that the additional deaths do not happen in a one-time event. It’s the way that regulatory agencies kill: they delay the introduction of life-saving technologies and the deaths caused are statistical and invisible.
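
The arithmetic behind this footnote can be sketched as follows. This is only a back-of-the-envelope illustration: the 1.2 million figure is the one used above, while the function name, the `fraction_preventable` parameter, and the assumption that deaths are spread evenly over the year are all mine.

```python
# Rough estimate of road deaths that occur during a regulatory delay.
# ASSUMPTIONS (not from any study): deaths are spread evenly over the year,
# and `fraction_preventable` is a hypothetical knob for how many of them
# driverless cars would actually avoid.

DEATHS_PER_YEAR = 1_200_000  # worldwide figure quoted in the post


def deaths_during_delay(weeks, fraction_preventable=1.0,
                        deaths_per_year=DEATHS_PER_YEAR):
    """Return the number of preventable deaths accumulated over `weeks`."""
    return deaths_per_year * fraction_preventable * weeks / 52.0


# A three-week delay, if only 5% of deaths were preventable by the technology
conservative = deaths_during_delay(3, fraction_preventable=0.05)
# The same delay if every death were preventable
optimistic = deaths_during_delay(3)
print(round(conservative), round(optimistic))
```

Even with the deliberately pessimistic 5% setting, a delay of a few weeks lands in the thousands, which is all the footnote claims.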

A Weird Python 3 Unicode Failure

The following code can fail on my system:

from os import listdir
for f in listdir('.'):
    print(f)

UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 13: surrogates not allowed

I have a file with the name b'Latin1 file: \xe9'. This is a filename with an “é” encoded using Latin-1 (which is byte value \xe9).
Python attempts to decode it using the current locale, which is utf-8. Unfortunately, a lone \xe9 byte is not valid UTF-8, so Python solves this by inserting a surrogate character (\udce9, via the surrogateescape error handler). So, I get a variable f which can be used to open the file.
However, I cannot print the value of this f because when it attempts to convert back to UTF-8 to print, an error is triggered.
I can understand what is happening, but it’s just a mess. [1]


Here is a complete example:

f = open('Latin1 file: é'.encode('latin1'), 'w')
f.write("My file")
f.close()

from os import listdir
for f in listdir('.'):
    print(f)

On a modern Unix system (i.e., one that uses UTF-8 as its system encoding), this will fail.
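
The mechanism can be reproduced without touching the filesystem. The sketch below assumes a UTF-8 filesystem encoding and uses explicit `decode`/`encode` calls with the `surrogateescape` error handler to mimic what `listdir` and `open` do; the helper names are hypothetical, chosen only for this illustration.

```python
RAW_NAME = b'Latin1 file: \xe9'  # the Latin-1 encoded filename from the post


def decode_like_listdir(raw: bytes) -> str:
    """Decode as UTF-8, smuggling invalid bytes through as lone surrogates."""
    return raw.decode('utf-8', 'surrogateescape')


def encode_for_open(name: str) -> bytes:
    """Turn the str back into the original bytes (why open() still works)."""
    return name.encode('utf-8', 'surrogateescape')


def printable(name: str) -> str:
    """Return a version that is safe to print: surrogates become escapes."""
    return name.encode('utf-8', 'backslashreplace').decode('utf-8')


name = decode_like_listdir(RAW_NAME)

# A strict encode is what print() needs on a UTF-8 terminal, and it fails.
try:
    name.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(strict_ok)
print(printable(name))
```

Printing `printable(f)` or `repr(f)` instead of `f` is one possible workaround for the loop above, at the cost of showing an escape sequence rather than the intended character.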


A good essay on the failure of the Python 3 transition is out there to be written.

[1] ls on the same directory generates a placeholder character, which is a decent fallback.

“Science’s Biggest Fail”

I completely agree with Scott Adams on this one (many posts tagged nutrition on this blog have echoed the same sentiment):

What’s science’s biggest fail of all time?

I nominate everything about diet and fitness.

Maybe science has the diet and fitness stuff mostly right by now. I hope so. But I thought the same thing twenty years ago and I was wrong.


Today I saw a link to an article in Mother Jones bemoaning the fact that the general public is out of step with the consensus of science on important issues. The implication is that science is right and the general public are idiots. But my take is different.

I think science has earned its lack of credibility with the public. If you kick me in the balls for 20-years, how do you expect me to close my eyes and trust you?


And I somewhat disagree with this response. It’s a common cop-out:

Who, exactly, does Adams think has been kicking him in the balls for 20 years?

Scientists themselves? Science teachers? Pop-science journalists? He downplays the roles of all these parties in his article[…]

The article says that the problem is pop-science journalists and the people who share their stories on Facebook & twitter.

Sorry, but no. Those parties are somewhat at fault, but so are real, bona fide tenured scientists and the scientific community.

Here is another weak argument:

How indeed? In the scientific journal papers I read, I rarely (if ever) encounter a scientist who claims anything like “this topic is now closed.”

Of course, scientists rarely say a topic is closed, but they say things like “now that we’ve determined X, this opens new avenues of research.”


The overhyping of nutritional claims by scientists is bad enough that Nature wrote an editorial naming and shaming a Harvard department chair for oversimplifying the research.

Outside of nutrition, look at this egregious paper from 2013, heavily quoted in the public press: Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy by Yuyu Chen, Avraham Ebenstein, Michael Greenstone, and Hongbin Li. The abstract says: the results indicate that life expectancies are about 5.5 y (95% CI: 0.8, 10.2) lower in the north owing to an increased incidence of cardiorespiratory mortality. The only hint of how weak the support for the claim is, is the large width of the confidence interval, but read Andrew Gelman’s takedown to fully understand how crappy it is.
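
To make the width of that interval concrete, here is a rough sketch that backs out the implied standard error. It assumes the interval is a symmetric normal-approximation interval (estimate ± 1.96 standard errors), which is my assumption and not something the abstract states; the function name is hypothetical.

```python
import math


def implied_z_and_p(estimate, ci_low, ci_high, z_crit=1.96):
    """Back out the standard error, z-score and two-sided p-value
    from a symmetric 95% confidence interval (normal approximation)."""
    standard_error = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / standard_error
    # Two-sided tail probability of a standard normal beyond |z|
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return standard_error, z, p_two_sided


# Numbers quoted from the abstract: 5.5 y (95% CI: 0.8, 10.2)
se, z, p = implied_z_and_p(5.5, 0.8, 10.2)
print(se, z, p)
```

Under these assumptions the headline estimate sits only about 2.3 standard errors from zero, so the data are compatible with anything from under one year to over ten years.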

If you want something medical, this is an older link of scientists misleading journalists.

2014 in Review

A bit past due, but I like to do these year-in-review posts, even if just for myself. A lot of the time, it seems that I spend too much time doing absurd tasks like formatting text or figuring out some silly web application or file format, so it’s nice to see that it does come to something tangible in the end.


2014 was slowish, compared to 2013, but I have a few big things in the pipeline (one accepted, another in minor revisions, and the second edition of our book incorporating all errata and some other improvements is almost done).

Still, several projects I was involved with finally got their papers out in 2014:

  1. Metagenomic insights into the human gut resistome and the forces that shape it by Kristoffer Forslund, Shinichi Sunagawa, Luis Pedro Coelho, and Peer Bork in Bioessays (2014) DOI:10.1002/bies.201300143

We looked at the presence of antibiotic resistance potential and how it correlates with animal antibiotic usage across countries.

  2. Trypanosoma brucei histone H1 inhibits RNA polymerase I transcription and is important for parasite fitness in vivo by Pena AC, Pimentel MR, Manso H, Vaz-Drago R, Neves D, Aresta-Branco F, Ferreira FR, Guegan F, Coelho LP, Carmo-Fonseca M, Barbosa-Morais NL, Figueiredo LM. in Mol Microbiol, 2014 Jun 19. DOI:10.1111/mmi.12677
  3. A community effort to assess and improve drug sensitivity prediction algorithms by James C Costello, Laura M Heiser, Elisabeth Georgii, Mehmet Gönen, Michael P Menden, Nicholas J Wang, Mukesh Bansal, Muhammad Ammad-ud-din, Petteri Hintsanen, Suleiman A Khan, John-Patrick Mpindi, Olli Kallioniemi, Antti Honkela, Tero Aittokallio, Krister Wennerberg, NCI DREAM Community, James J Collins, Dan Gallahan, Dinah Singer, Julio Saez-Rodriguez, Samuel Kaski, Joe W Gray & Gustavo Stolovitzky in Nature Biotechnology DOI:10.1038/nbt.2877
  4. A community computational challenge to predict the activity of pairs of compounds by Mukesh Bansal, Jichen Yang, Charles Karan, Michael P Menden, James C Costello, Hao Tang, Guanghua Xiao, Yajuan Li, Jeffrey Allen, Rui Zhong, Beibei Chen, Minsoo Kim, Tao Wang, Laura M Heiser, Ronald Realubit, Michela Mattioli, Mariano J Alvarez, Yao Shen, NCI-DREAM Community, Daniel Gallahan, Dinah Singer, Julio Saez-Rodriguez, Yang Xie, Gustavo Stolovitzky & Andrea Califano in Nature Biotechnology DOI:10.1038/nbt.3052


The most read blog post during 2014 was Why Python is Better than Matlab for Scientific Software.

For the posts written in 2014, the most read was this comment on modernity, which had a less-read follow-up.

The post that was hardest to write was talking about an academic dry spell.


I did a bit of traveling, giving talks in Leuven, Paris, and San Sebastian, and teaching Software Carpentry in Denmark, Cyprus, and Jordan. San Sebastian was a fantastic place (I had never really gotten the appeal of the independently wealthy life until I visited San Sebastian).

Another interesting thing I did was a webcast on linear regression, including the funky kind, in Python. It was an interesting experience; maybe I’ll do it again.

Most importantly

Normally, I try to leave my personal life out of this blog (and the public internets), but 2014 was also remarkable for the birth of our second daughter, Sarah.

Computers are better at assessing personality than people?

So claims a new study. At first, I thought this would be due to artificial measurements and scales. That is, if you ask a random person to rate their friends on a 1-10 “openness to experience” scale, they might not know what a 7 actually means once you compare across the whole of the population. However, computers still did slightly better at predicting things like “field of study”.

Given the amount of researcher degrees of freedom (note that some of the results are presented for “compound variables” instead of measured responses), I think the safe conclusion is that computers are as bad as people at reading other humans.

The Ecosystem of Unix and the Difficulty of Teaching It

PLOS ONE published an awful paper comparing Word vs LaTeX where the task was to copy down a piece of text. Because Word users did better than LaTeX users at this task, the authors conclude that Word is more efficient.

First of all, this fits perfectly with my experience: Word [1] is faster for single-page documents where I don’t care about precise formatting, such as a letter. It says nothing about how it performs on large documents which are edited over months (or years). The typical Word failure modes are “you paste some text here and your image placement is now screwed up seven pages down” or “how do I copy & paste between these two documents without messing up the formatting?” This does not happen so much with a single-page document.

Of course, the authors are not happy with the conclusion that Word is better for copying down a short piece of predefined text and instead generalize to “that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems.” This is a general failure mode of psychological research: here is a simple, well-defined experimental result in a very artificial setting. Now, let me completely overgeneralize to the real world. The authors of the paper actually say this in their defense: “We understand that people who are not familiar with the experimental methods of psychology (and usability testing) are surprised about our decision to use predefined texts.” That is to say, in our discipline, we are always sort of sloppy, but reviewers in the discipline do the same, so it’s fine.


Now, why waste time bashing a Plos One paper in usability research?

Because one interesting aspect of the discussion is that several people have pointed out that Word is better for collaboration because of the Track Changes feature. For many of us, this is laughable because one of the large advantages of LaTeX is that you can use version control on the files. You can easily compare the text written today with a version from two months ago, it is easier to have multiple people working at the same time, &c. [2] In Word, using Track Changes is still “pass the baton” collaboration, whereby you email stuff around and say “now it’s your turn to edit it” [3].
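
What makes this possible is simply that a LaTeX source is plain text, so ordinary line-based diff tools apply. Here is a minimal sketch using Python’s standard `difflib` module; the two snippets of LaTeX and the file labels are made up for the illustration, and a real workflow would get the old version from the version control system itself.

```python
import difflib


def diff_versions(old_text, new_text):
    """Return a unified diff (as a list of lines) between two text versions."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile='paper.tex (two months ago)',  # hypothetical labels
        tofile='paper.tex (today)'))


# Two hypothetical versions of the same LaTeX source
old = "\\section{Methods}\nWe measured ten samples.\n"
new = "\\section{Methods}\nWe measured twelve samples.\n"

diff = diff_versions(old, new)
print(''.join(diff))
```

Only the changed sentence shows up as a removed and an added line; the same comparison on a binary Word file would produce nothing readable.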

However, this is only valid if you know how to use version control. In that case, it’s clear that using a text-based format is a good idea and it makes collaboration easier. In the same way, I actually think that some of the trouble that test subjects in the paper had with LaTeX was simply that they did not use an editor with a spell-checker.

The underlying concept is that LaTeX works in an ecosystem of tools working together, which is a concept that we do not, in general, teach people. I have been involved with Software Carpentry, and even before that I was trying to teach people who are not trained in computers about these sorts of tools, but we do not do that great of a job at teaching this concept of the ecosystem. It is abstract and not directly clear to students why it is useful.

Spending a few hours going through the basic Unix commands seems like a brain-dead activity when people cannot connect this to their other knowledge or pressing needs.

On the other hand, it is very frustrating when somebody comes to me with a problem they have been struggling with for days and, in a minute, I can give them a solution because it’s often “oh, you can grep in extended mode and pipe it to gawk” (or worse, before they finish the description, I’ll say “run dos2unix and it will fix it” or “the problem you are describing is the exact use case of this excellent Python package, so you don’t need to code it from scratch”). Then they ask “how could I learn that? Is there a book/course?” and I just don’t have an answer better than “do this for 10 years and you’ll slowly get it”.
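
For readers who have not met these tools, here is a rough Python sketch of what the two one-liners amount to. The function names and the sample data are hypothetical, and the real `dos2unix`, `grep -E`, and `gawk` handle many more cases than this.

```python
import re


def dos2unix(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF) ones."""
    return data.replace(b'\r\n', b'\n')


def grep_then_field(lines, pattern, field):
    """Roughly `grep -E pattern | gawk '{print $field}'`:
    keep matching lines, then print one whitespace-separated column."""
    regex = re.compile(pattern)
    selected = []
    for line in lines:
        if regex.search(line):
            columns = line.split()
            # awk prints an empty string when the field is missing
            selected.append(columns[field - 1] if len(columns) >= field else '')
    return selected


# Hypothetical file contents with Windows line endings
text = dos2unix(b'gene1 10\r\ngene2 20\r\nother 30\r\n').decode('ascii')
values = grep_then_field(text.splitlines(), r'^gene[0-9]+', 2)
print(values)
```

Neither function is hard to write; the hard part, and the part that takes years, is knowing that the problem in front of you is one of these.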

It’s hard to teach the whole ecosystem at once, which means that it’s hard to teach the abstractions behind it. Or maybe, I just have not yet figured out how it would be possible.


Finally, let me just remark that LaTeX is a particularly crappy piece of software. It is so incredibly bad that it only survives because the alternatives manage to be even worse. It’s even sadder when you realise that LaTeX is now over 30 years old, while Word is an imitation of even older technology. We still have not been able to come up with something that is clearly better.


This flawed paper probably had better altmetrics than anything I’ll ever write in science, again showing what a bad idea altmetrics are.

[1] Feel free to read “Word or Word-like software” in this and subsequent sentences. I actually often use Google Docs nowadays.
[2] Latexdiff is also pretty helpful in generating diffed versions.
[3] Actually, for collaboration, the Google Docs model is vastly superior as you don’t have to email back-n-forth. It also includes a bit of version control.

New Year Links

1. Excellent Ken Regan article on chess:

László Mérő, in his 1990 book Ways of Thinking, called the number of class units from a typical beginning adult player to the human world champion the depth of a game.

Tic-tac-toe may have a depth of 1: if you assume a beginner knows to block an immediate threat of three-in-a-row but plays randomly otherwise, then you can score over 75% by taking a corner when you go first and grifting a few games when you go second. Another school-recess game, dots-and-boxes, is evidently deeper. […]

This gave chess a depth of 11 class units up to 2800, which was world champion Garry Kasparov’s rating in 1990. If I recall correctly, checkers (8 × 8) and backgammon had depth 10 while bridge tied chess at 11, but Shogi scored 14 and yet was dwarfed by Japan’s main head game, Go, at 25.
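
The “class unit” arithmetic can be sketched with the usual Elo expected-score formula. Two things here are my assumptions and are not stated in the excerpt: that a class unit is 200 rating points, and that the beginner is rated 600 (chosen so that the arithmetic reproduces the quoted depth of 11).

```python
def expected_score(rating_gap):
    """Elo expected score of the stronger player, given the rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


def depth_in_class_units(beginner_rating, champion_rating, class_width=200):
    """Number of class units between a beginner and the champion."""
    return (champion_rating - beginner_rating) / class_width


# ASSUMED values: 200-point classes and a beginner rated 600
one_class_edge = expected_score(200)
chess_depth = depth_in_class_units(600, 2800)
print(one_class_edge, chess_depth)
```

Under these assumptions a one-class gap corresponds to scoring roughly 76%, in line with the “over 75%” used in the tic-tac-toe example.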

2. The Indian government blocked GitHub. Yep, the government there is stupid.