Computers are better at assessing personality than people?

So claims a new study. At first, I thought this would be due to artificial measurements and scales. That is, if you ask a random person to rate their friends on a 1-10 “openness to experience” scale, they might not know what a 7 actually means once you compare across the whole of the population. However, computers still did slightly better at predicting things like “field of study”.

Given the number of researcher degrees of freedom (note that some of the results are presented for “compound variables” instead of measured responses), I think the safe conclusion is that computers are as bad as people at reading other humans.

The Ecosystem of Unix and the Difficulty of Teaching It

Plos One published an awful paper comparing Word vs LaTeX where the task was to copy down a piece of text. Because Word users did better than LaTeX users at this task, the authors conclude that Word is more efficient.

First of all, this fits perfectly with my experience: Word [1] is faster for single page documents, where I don’t care about precise formatting, such as a letter. It says nothing about how it performs on large documents which are edited over months (or years). The typical Word failure modes are “you paste some text here and your image placement is now screwed up seven pages down” or “how do I copy & paste between these two documents without messing up the formatting?” This does not happen so much with a single page document.

Of course, the authors are not happy with the conclusion that Word is better for copying down a short piece of predefined text and instead generalize to the claim “that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems.” This is a general failure mode of psychological research: here is a simple, well-defined experimental result in a very artificial setting. Now, let me completely overgeneralize to the real world. The authors of the paper actually say this in their defense: “We understand that people who are not familiar with the experimental methods of psychology (and usability testing) are surprised about our decision to use predefined texts.” That is to say, in our discipline, we are always sort of sloppy, but reviewers in the discipline do the same, so it’s fine.

§

Now, why waste time bashing a Plos One paper in usability research?

Because one interesting aspect of the discussion is that several people have pointed out that Word is better for collaboration because of the Track Changes features. For many of us, this is laughable because one of the large advantages of LaTeX is that you can use version control on the files. You can easily compare the text written today with a version from two months ago, it makes it easier to have multiple people working, &c.[2] In Word, using Track Changes is still “pass the baton” collaboration, whereby you email stuff around and say “now it’s your turn to edit it” [3].

However, this is only valid if you know how to use version control. In that case, it’s clear that using a text-based format is a good idea and it makes collaboration easier. In the same way, I actually think that some of the trouble the test subjects in the paper had with LaTeX was simply that they did not use an editor with a spell-checker.

The underlying concept is that LaTeX works in an ecosystem of tools working together, which is a concept that we do not, in general, teach people. I have been involved with Software Carpentry and even before that I was trying to teach people who are not trained in computers about these sorts of tools, but we do not do that great a job of teaching this concept of the ecosystem. It is abstract and not directly clear to students why it is useful.

Spending a few hours going through the basic Unix commands seems like a brain-dead activity when people cannot connect this to their other knowledge or pressing needs.

On the other hand, it is very frustrating when somebody comes to me with a problem they have been struggling with for days and, in a minute, I can give them a solution because it’s often “oh, you can grep in extended mode and pipe it to gawk” (or worse, before they finish the description, I’ll say “run dos2unix and it will fix it” or “the problem you are describing is the exact use case of this excellent Python package, so you don’t need to code it from scratch”). Then they ask “how could I learn that? Is there a book/course?” and I just don’t have an answer better than “do this for 10 years and you’ll slowly get it”.

It’s hard to teach the whole ecosystem at once, which means that it’s hard to teach the abstractions behind it. Or maybe, I just have not yet figured out how it would be possible.

§

Finally, let me just remark that LaTeX is a particularly crappy piece of software. It is so incredibly bad that it only survives because the alternatives manage to be even worse. It’s even sadder when you realise that LaTeX is now over 30 years old, while Word is an imitation of even older technology. We still have not been able to come up with something that is clearly better.

§

This flawed paper probably had better altmetrics than anything I’ll ever write in science, again showing what a bad idea altmetrics are.

[1] feel free to read “Word or Word-like software” in this and subsequent sentences. I actually often use Google Docs nowadays.
[2] Latexdiff is also pretty helpful in generating diffed versions.
[3] Actually, for collaboration, the Google Docs model is vastly superior as you don’t have to email back-n-forth. It also includes a bit of version control.

New Year Links

1. Excellent Ken Regan article on chess:

László Mérő, in his 1990 book Ways of Thinking, called the number of class units from a typical beginning adult player to the human world champion the depth of a game.

Tic-tac-toe may have a depth of 1: if you assume a beginner knows to block an immediate threat of three-in-a-row but plays randomly otherwise, then you can score over 75% by taking a corner when you go first and grifting a few games when you go second. Another school-recess game, dots-and-boxes, is evidently deeper. […]

This gave chess a depth of 11 class units up to 2800, which was world champion Garry Kasparov’s rating in 1990. If I recall correctly, checkers (8 × 8) and backgammon had depth 10 while bridge tied chess at 11, but Shogi scored 14 and yet was dwarfed by Japan’s main head game, Go, at 25.

2. The Indian government blocked GitHub. Yep, the government there is stupid.

I’m Back. Monday Links

The last few months have been incredibly busy (hopefully, results should start appearing in print over the next few months). I’ll start writing again now.

A few links on recent(ish) anti-science victories in Europe:

  1. Philae went all the way to a comet, then quickly died because Europeans are afraid of “radiation”. The Americans would still be getting data from that probe as it would be nuclear powered (using a very safe type of nuclear fuel).
  2. Another victory from a coalition of environmental groups, including Greenpeace: The European Commission scrapped the position of scientific advisor. The environmentalists called the position of science advisor corporate lobbying!

American bonus: Anti-GMO activists take vitamins out of breakfast cereal.

Why There Won’t Be a Windows 9

John Cook tells this wonderful story about Windows 10:

The version of Windows following 8.1 will be Windows 10, not Windows 9. Apparently this is because Microsoft knows that a lot of software naively looks at the first digit of the version number, concluding that it must be Windows 95 or Windows 98 if it starts with 9.

Many think this is stupid. They say that Microsoft should call the next version Windows 9, and if somebody’s dumb code breaks, it’s their own fault.

People who think that way aren’t billionaires.
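To make the failure concrete, here is a minimal sketch of the kind of naive check the story describes, next to a stricter one. The function names and the exact "Windows N" strings are assumptions made for illustration; the story does not say what the real affected software looked like.

```python
import re


def looks_like_windows_9x(os_name):
    """Naive check (hypothetical): only the start of the name is inspected."""
    return os_name.startswith("Windows 9")


def is_windows_9x(os_name):
    """Stricter sketch: read the whole version number before deciding."""
    match = re.fullmatch(r"Windows (\d+)", os_name)
    return match is not None and match.group(1) in {"95", "98"}


if __name__ == "__main__":
    for name in ["Windows 95", "Windows 98", "Windows 9", "Windows 10"]:
        print(name, looks_like_windows_9x(name), is_windows_9x(name))
```

With the naive check, a hypothetical "Windows 9" is indistinguishable from Windows 95 or 98, while "Windows 10" is not misread; the stricter check only accepts the two old versions.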

Open source has generally been horrible about this type of thing, with the major exception of the Linux kernel, because Linus’ attitude is very different:

> Are you saying that pulseaudio is entering on some weird loop if the
> returned value is not -EINVAL? That seems a bug at pulseaudio.

[redacted]

It's a bug alright - in the kernel. How long have you been a
maintainer? And you *still* haven't learnt the first rule of kernel
maintenance?

If a change results in user programs breaking, it's a bug in the
kernel. We never EVER blame the user programs. How hard can this be to
understand?

Note the very different attitude of the glibc developers, who broke Flash Player for their users and said it was all Adobe’s fault. Technically, yes, Adobe was abusing the system a bit, but it takes a special level of nerdiness to say “well, I will just break people’s youtube because we are technically correct.”

(This is why we need managers: to tell geeks to cut that sort of shit out.)

When will BLAST get its Nobel Prize?

It’s Nobel Prize Season and there is inevitable speculation on who will get one. I have a negative prediction: The Prize will not be awarded to the creators of BLAST.

However, I think the creators of BLAST should get a Nobel Prize.

In terms of impact in the field, it’s undeniable that BLAST has been huge. These people created a verb! What modern biologist does not know what “blasting a sequence” means? The BLAST paper was, at one point, the most highly cited paper in history. The impact on physiology is undeniable.

David Lipman and Gene Myers stand out for their contributions to the computational processing of biological sequences. (See how I phrased that in a Nobel Committee way.)

I know, BLAST was built on previous work like FASTA; but (1) so is everything and (2) FASTA is also Lipman’s work so he can claim credit for that.

The other counterargument I’ve heard is that BLAST is mostly a method, but so was GFP (admittedly, a chemistry prize). One may argue that it was very cool that one could have a protein that fluoresced by itself, but the prize was awarded for the impact in the lab (does anybody believe that just the 1962 discovery of a jellyfish protein would have sufficed for a Nobel?)

The statistical and algorithmic ideas behind BLAST and other methods developed by these people are also very cool (the suffix array is one of those “how did it take so long for someone to think of this?” ideas) while solving a hard problem with a large number of applications.
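For readers who have not met the suffix array, a toy sketch of the idea: sort the starting positions of all suffixes of a text, then binary-search that sorted list to find every occurrence of a pattern. This is not how BLAST itself works, and the function names are mine; the construction here is the obvious slow one (sorting full suffixes), chosen for clarity rather than speed.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for every position where pattern occurs."""
    m = len(pattern)

    # First suffix whose leading m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose leading m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in between starts with the pattern.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

Once the array is built, each lookup costs a number of string comparisons logarithmic in the length of the text, which is why the structure is attractive for searching a large, fixed sequence many times.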

§

Of course, this Prize would bring prestige to the whole field of computational and systems biology, which may seem very self-serving of me (my field gets a prize, so I get to bask in the glow).

On the other hand, I have often heard that systems biology is just physiology with computers. By this definition, eventually, a systems biology prize will have to be awarded or the prize renamed to “Noncomputational Physiology and Medicine”, which sounds weird.

BLAST was definitely one of the largest advances in the field of physiology in the last few decades. For this reason, David Lipman and Gene Myers should get a Physiology and Medicine Nobel Prize.