Computers are better at assessing personality than people?

So claims a new study. At first, I thought this would be due to artificial measurements and scales. That is, if you ask a random person to rate a friend on a 1-10 “openness to experience” scale, they might not know what a 7 actually means when compared across the whole population. However, computers still did slightly better at predicting concrete outcomes like “field of study”.

Given the amount of researcher degrees of freedom (note that some of the results are presented for “compound variables” instead of measured responses), I think the safe conclusion is that computers are as bad as people at reading other humans.
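To make the degrees-of-freedom worry concrete, here is a small simulation (a sketch, not anything taken from the study). It assumes a researcher tests several independent outcome variables on data with no real effect, calling anything with p < 0.05 a finding. Under the null each p-value is uniform on [0, 1], so the simulation just draws uniform numbers. The function name `chance_of_spurious_hit` is hypothetical.

```python
import random


def chance_of_spurious_hit(n_outcomes, n_trials=20000, alpha=0.05, seed=0):
    """Estimate P(at least one 'significant' result) when every null is true.

    Assumes the tests are independent, so each p-value is an
    independent Uniform(0, 1) draw under the null hypothesis.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One simulated study: test n_outcomes variables, stop at the first "hit".
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_trials


for m in (1, 5, 10, 20):
    print(m, chance_of_spurious_hit(m))
```

With ten outcomes the estimate should land near the analytic value 1 - 0.95^10 ≈ 0.40: a “significant” result about 40% of the time from pure noise.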

Friday Links (with comments)

1. Finding who wrote my book. Comment/discuss at twotoreal

2. The Great Forgetting

A lot more mood affiliation than arguments (not even a philosophical alienation type of argument), but I want to highlight two common errors:

[Experts] pointed to the example of driving a car, which requires not only the instantaneous interpretation of a welter of visual signals but also the ability to adapt seamlessly to unanticipated situations. “Executing a left turn across oncoming traffic,” two prominent economists wrote in 2004, “involves so many factors that it is hard to imagine the set of rules that can replicate a driver’s behavior.” Just six years later, in October 2010, Google announced that it had built a fleet of seven “self-driving cars,” which had already logged more than 140,000 miles on roads in California and Nevada.

I do wonder who those experts were, and I worry that the writer is getting the pace of technology from economists, but that’s not the point. The point is that the two economists were correct! But this is an argument from lack of imagination.

I’ll even make a stronger claim: Making a left turn involves so many factors that it is impossible to imagine the set of rules that can replicate a driver’s behavior.

The mistake is to assume that this implies that we cannot write a computer program that does this task better than any human driver. We do not write computer programs for these tasks by enumerating rules. We write subsystems, we test them empirically (or have the machine test them empirically), and we end up with a system much more complex than any single human mind can grasp. This has nothing to do with computers, by the way: nobody can imagine the set of rules that run any large industry, yet humans can collectively build unimaginably complex systems. In the end, most of the “rules” are implicit in the code and are never even written out.

(Another mistake is that this implicitly assumes that the human driver is somehow doing more than following rules. We all follow rules, except that ours are encoded in synaptic connections and neurotransmitter levels rather than in magnetic orientations; but they are rules nonetheless. Or God exists, but I have no need for that hypothesis.)

The second fallacy is even more obvious:

The technology theorist Kevin Kelly, commenting on the link between automation and pilot error, argued that the obvious solution is to develop an entirely autonomous autopilot: “Human pilots should not be flying planes in the long run.” […] That idea is seductive, but no machine is infallible. Sooner or later, even the most advanced technology will break down, misfire, or, in the case of a computerized system, encounter circumstances that its designers never anticipated.

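This is a textbook nirvana fallacy: the relevant comparison is not between an autopilot and an infallible machine, but between an autopilot and a fallible human pilot.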


I often wonder whether my daughter will ever be allowed to drive a car in the Western world. I think that allowing humans to drive cars on public roads will be seen the way we now see the working conditions of 19th-century factories: dangerous choices that can only be explained by the lack of better options.

Friday Links

1. The Life Cycle of Medical Ideas

I was always taught that there were two kinds of medicine. Real medicine, which has been proven to work by studies. And alternative medicine, which has been proven not to work by studies but people still use it anyway because they are stupid.

This dichotomy leaves out the huge grey area of “things that seem like they will probably work, and a few smaller studies have shown very promising results, but no one has bothered to do larger studies, or if they have they have never really been incorporated into medical practice for reasons I can’t put my finger on.”

2. Don’t Use Hadoop, Your Data Isn’t That Big

The question of what counts as big data can quickly devolve into penis-size comparisons (my data is bigger than yours), but if your data fits in RAM, it’s not big; it’s hardly even data.
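For a sense of scale: the canonical Hadoop/MapReduce example is counting words. If the data fits in RAM, a sketch in plain Python does the whole job in one process. The function name `word_counts` and the crude lowercase-letters tokenizer are assumptions for illustration only.

```python
import re
from collections import Counter


def word_counts(lines):
    """Count word frequencies over an iterable of text lines, entirely in memory.

    Tokenization is deliberately crude (runs of lowercase letters and
    apostrophes after lowercasing); swap in a real tokenizer if needed.
    """
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts


sample = ["The cat sat", "the dog and the cat"]
print(word_counts(sample).most_common(3))
```

Since iterating over an open file yields its lines, passing a file object instead of a list works the same way, with no cluster involved.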

3. Related: Bayes and Big Data. I think it’s interesting, but (i) there is a lot of effort going into quasi-Bayesian methods [1] and (ii) small data is still very important.

4. Are some people chimeras?

5. This whole post just rubs me the wrong way.

It’s science as just another interest group, with the same whiny confusion of inputs and outputs. The classism of “I’m a scientist, pay me more than a lowly sanitation worker” upsets my stomach, as does the notion that 5-10 years of graduate school entitles you to a six-figure salary. Seriously, the author says that this is what one ought to get for all that schooling.

It almost makes me wish for a cut to the NIH budget as a disgusted response to all the entitlement.

(cf. NeuroDojo)

[1] Quasi-Bayesian is a term I coined just now to mean approximate Bayesian inference, used for computational tractability.
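One standard example of this kind of approximation is the Laplace approximation: replace the posterior with a Gaussian centred at its mode, with variance taken from the curvature of the log-posterior there. The sketch below uses hypothetical function names, and a Beta-Binomial model is chosen only because its exact posterior is known, so the approximation can be checked against it.

```python
import math


def exact_beta_posterior(k, n, a=1.0, b=1.0):
    """Exact posterior mean and sd for a Beta(a, b) prior and k successes in n trials."""
    alpha, beta = a + k, b + n - k
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, math.sqrt(var)


def laplace_beta_posterior(k, n, a=1.0, b=1.0):
    """Laplace (Gaussian) approximation to the same posterior.

    Centre: the posterior mode. Variance: -1 / (second derivative of the
    log-posterior at the mode). Assumes alpha > 1 and beta > 1 so the
    mode lies strictly inside (0, 1).
    """
    alpha, beta = a + k, b + n - k
    mode = (alpha - 1) / (alpha + beta - 2)
    # d^2/dtheta^2 of (alpha-1) log(theta) + (beta-1) log(1-theta), at the mode
    curvature = -(alpha - 1) / mode ** 2 - (beta - 1) / (1 - mode) ** 2
    return mode, math.sqrt(-1.0 / curvature)


print(exact_beta_posterior(30, 100))
print(laplace_beta_posterior(30, 100))
```

With 30 successes in 100 trials and a uniform prior, the two agree to within about 0.005 in both mean and standard deviation, and the gap shrinks as n grows. The payoff comes in models where, unlike here, the exact posterior is intractable.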

Mahotas software paper published

I got a new paper published [1]:

Mahotas: open source software for scriptable computer vision by Luis Pedro Coelho in Journal of Open Research Software [DOI]

This is about my computer vision software package, mahotas. It started as a tool for bioimage informatics, but the software is actually generic computer vision.

Figure 1

Earlier, I posted a tutorial on image segmentation which used mahotas.

The journal calls these metapapers because they are not the work itself but a reference to the work, i.e., the software. This is an interesting new initiative to reward scientific software development. As I argued before, releasing scientific software is a collective action problem: it is better for science to have software released, but not for the individual researcher. I also wrote that it would be a good idea to change the incentives to make doing the right thing more profitable. The Journal of Open Research Software is exactly one such step.

It has already been used in a few publications, both by myself and others:

As you can see, this is not all about bioimage informatics, which is nice, as it means that the software is actually useful outside the field in which it was initially developed.

[1] Yes, it has been a good month for publications.

The Hard Part is Motivation. Books. &c

Building Machine Learning Systems with Python

Because my book just came out, I am again excerpting from the Portuguese interview with me that I mentioned previously:

Who is the target audience for this book?

Luis Pedro: There are two distinct audiences. The first are programmers who do not know much about machine learning, but would like to use a classifier, for example. The second are people who do not need an introduction to machine learning (because they already know it), but may not know how to do it with the Python tools.

What were the main difficulties encountered in this process?

Luis Pedro: The challenge is always to find examples that are neither too easy nor too difficult for an introduction. In one case (image classification), I took some photographs to create a dataset. In the literature, there are classical examples, which are trivial to handle with modern techniques, and current research problems, which are very difficult. My dataset consists of poorly framed photographs taken with a mobile phone camera, so it is closer in style to the problems found in mobile or web applications. The aim is to distinguish photographs of buildings, natural landscapes, or text.

There was nothing too complex for anyone who knows the area. In fact, when it comes to writing about something you already know how to do (as was the case), the hard part is the motivation.


One thing we stress throughout is principled evaluation, and we warn against overselling your results. Even in the world of research, we still find papers that mix the training set and the test set! Obviously not from groups that work in machine learning, but in more applied areas we still find people who tune hyper-parameters on the test set.
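As a minimal sketch of what principled evaluation means here, the code below chooses a hyper-parameter by cross-validation on the training set only and touches the held-out test set exactly once, after the hyper-parameter has been fixed. It uses only the Python standard library; the synthetic data, the k-nearest-neighbours classifier, and all function names are assumptions for illustration, not code from the book.

```python
import random
from collections import Counter


def make_data(n, seed=0):
    """Hypothetical two-class data: 2-D Gaussian blobs centred at (0, 0) and (1.5, 1.5)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of `test` classified correctly by k-NN fitted on `train`."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cross_val_accuracy(train, k, n_folds=5):
    """Mean accuracy over folds built from the training data only."""
    scores = []
    for fold in range(n_folds):
        held_out = train[fold::n_folds]
        rest = [item for i, item in enumerate(train) if i % n_folds != fold]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / n_folds


def select_and_evaluate(data, candidates=(1, 3, 5, 9, 15), test_fraction=0.25):
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    # Hyper-parameter search looks at the training set only.
    best_k = max(candidates, key=lambda k: cross_val_accuracy(train, k))
    # The test set is used exactly once, after k has been chosen.
    return best_k, accuracy(train, test, best_k)
```

Usage: `best_k, test_acc = select_and_evaluate(make_data(200))`. The mistake described above would amount to replacing `cross_val_accuracy(train, k)` with `accuracy(train, test, k)` inside the search, which makes the reported test accuracy optimistically biased.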