The “Label it twice” Principle

In many machine-learning-based applications, you need labeled data. This often means asking “experts” to label your data. I hereby introduce the “label it twice” principle:

Whenever you ask experts to label data, always get some of the data independently labeled by more than one expert.

I have seen projects where two people are deemed capable of labeling data and the data is simply split 50/50 between them. This is a huge missed opportunity. If you cannot afford to have the two labelers label all the data, split it 40-20-40, please: 40% for labeler 1 only, 40% for labeler 2 only, and a 20% overlap that both of them label.
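
As a minimal sketch of the 40-20-40 split (the function name, the `overlap` parameter, and the fixed seed are my own choices for illustration, not part of any particular tool), you could shuffle the items once and cut them into three parts:

```python
import random


def split_for_two_labelers(items, overlap=0.2, seed=0):
    """Shuffle items and split them into (only_1, both, only_2).

    `overlap` is the fraction that both labelers receive; the rest is
    divided evenly between the two. All names here are hypothetical.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n_both = round(len(items) * overlap)
    n_first = (len(items) - n_both) // 2
    both = items[:n_both]
    only_1 = items[n_both:n_both + n_first]
    only_2 = items[n_both + n_first:]
    return only_1, both, only_2


only_1, both, only_2 = split_for_two_labelers(range(1000))
# Labeler 1 receives only_1 + both; labeler 2 receives only_2 + both.
```

Shuffling before cutting matters: if the overlap were, say, the first 20% of files in acquisition order, it might not be representative of the rest of the data.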


There is so much evidence that human labelers can be unreliable, and that inter-operator differences can be huge, that it is always worth having some data to quantify this effect for your problem.
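
One way to quantify this on the doubly labeled subset, sketched here under my own assumptions (the post does not prescribe a particular statistic), is to report the raw agreement rate together with Cohen's kappa, which corrects the agreement for what two labelers would reach by chance. The function name and the example labels below are hypothetical:

```python
from collections import Counter


def agreement_and_kappa(labels_1, labels_2):
    """Return (observed agreement, Cohen's kappa) for two labelers.

    labels_1[i] and labels_2[i] must be the two labels given to item i.
    """
    if len(labels_1) != len(labels_2) or len(labels_1) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_1)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    counts_1 = Counter(labels_1)
    counts_2 = Counter(labels_2)
    # Agreement expected by chance if each labeler kept their own class
    # frequencies but assigned labels independently of the other.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both labelers used one identical class
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up labels for four doubly labeled items.
expert_a = ["normal", "normal", "abnormal", "abnormal"]
expert_b = ["normal", "abnormal", "abnormal", "abnormal"]
observed, kappa = agreement_and_kappa(expert_a, expert_b)
```

The observed agreement is the number that is directly comparable to the accuracy of an automated method measured against one labeler; kappa is useful when one class dominates and raw agreement would look deceptively high.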


It often even works to the advantage of the automated method. When your method gets 90% accuracy, it is nice to be able to compare that to what a human could do.

In fact, in bioimage informatics, it is often the case that the conversation goes like this (rarely so clean and nice for my side, but it’s my blog and I’ll abridge if I want to):

Me: Our automated method gets 90% accuracy.

Audience member: Doesn’t that just show that it’s not ready for prime time? I mean, if it fails 10% of the time…

Me: The alternative right now is human visual analysis.

Audience member: Experts will know better.

Me: We measured, they don’t. You think this is an easy problem because you are picturing extreme phenotypes in your mind. Many real cells are actually much more subtle, especially in high-throughput data.

Audience member: OK, that’s a fair point. How well do people do?

Me: 90%, give or take.

Audience member: Oh. And could I use this automated method on my problem?


References for unreliable labelers:

MacLeod, Norman, Mark Benfield, and Phil Culverhouse. “Time to Automate Identification.” Nature 467.7312 (2010): 154–155.

Nattkemper, T. W., T. Twellmann, H. Ritter, and W. Schubert. “Human vs. Machine: Evaluation of Fluorescence Micrographs.” Computers in Biology and Medicine 33.1 (2003): 31–43.


7 thoughts on “The ‘Label it twice’ Principle”

  1. Do you have any information about the application of this to histology? I know very little about this whole subject, and I know computer vision is still in its infancy, but it has always struck me as a prime example of where computers can/will start replacing doctors.

    In terms of setup, there are already technicians and high-quality imaging methods for preparing the images for doctors to analyze. Also, by definition, doctors are annotating these datasets (with the incentive of avoiding negligence and malpractice claims). It seems like a good starting point for machine learning, no?

    • I know of this in passing (i.e., people who know more have talked to me about it over beers and such). My understanding is that computers can perform better than humans in many histology tasks, but have not replaced humans mostly because of liability and cultural concerns (people just hate the idea that it was a computer grading their exam, even if you can prove that the computer makes fewer mistakes). What is in widespread use are systems where the computer flags something for the human and then the human says “Yes, the computer was right” (or “No, this is fine”; the systems are probably tuned to have a high false-positive rate and a low false-negative rate).

      One of the big reasons, as I understand it, that computers do outperform humans is that they look at the whole sample, whereas humans would need to spend too much time to cover a whole sample under the microscope, evaluating each possibly malformed cell. There are protocols whereby the pathologist must look at a specific number of fields under the microscope and see so many cells (to avoid a lazy pathologist). However, the computer can just look at all the cells in the sample.

      Then, it flags the ones where it thinks there are malformations and shows them to the human pathologist.
