Paper review:
Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis by L. Shamir in Journal of Microscopy, 2011 [DOI]
This is an excellent simple paper [1]. I will jump to the punchline (slightly edited by me for brevity):
This paper demonstrates that microscopy images that were previously used for developing and assessing the performance of bioimage classification algorithms can be classified even when the biological content is removed from the images [by replacing them with white squares], showing that previously reported results might be biased, and that the computer analysis could be driven by artefacts rather than by the actual biological content.
Here is an example of what the author means:
Basically, the author shows that even after the images are modified by drawing white boxes over the cells, classifiers still appear to perform well. Thus, they are probably picking up on artefacts instead of signal.
This is (and this analogy is from the paper, although not exactly in this form) like a face recognition system which seems to work very well because all of the images it has of me have me wearing the same shirt. It can perform very well on the training data, but will be fooled by anyone who wears the same shirt.
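To make the white-box control concrete, here is a minimal sketch of the masking step. It is not the author's code: the function name `blank_boxes`, the box format `(top, left, bottom, right)`, and the toy 6x6 image are all my own assumptions, and the boxes are given explicitly, standing in for squares drawn by hand.

```python
def blank_boxes(image, boxes, fill=255):
    """Return a copy of `image` with every box painted over with `fill`.

    `image` is a list of rows of pixel values; each box is a hypothetical
    (top, left, bottom, right) tuple with exclusive bottom/right bounds.
    """
    out = [row[:] for row in image]  # copy so the original stays untouched
    for top, left, bottom, right in boxes:
        for r in range(top, bottom):
            for c in range(left, right):
                out[r][c] = fill
    return out


# Toy 6x6 "image": dim background (10) with one bright 2x2 "cell" (200).
image = [[10] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(2, 4):
        image[r][c] = 200

# Paint a white square over the region containing the cell.
masked = blank_boxes(image, boxes=[(1, 1, 5, 5)])
```

If a classifier trained and tested on images like `masked` still scores well above chance, whatever it is using cannot be the cells, since they are no longer there.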
§
This is very important work, as it points to the fact that many previous results were probably inflated. Judging by the dates, this work was probably done at the same time that I was working on my own paper on evaluation of subcellular location determination (it just took a while for that one to appear in print).
I expect that my proposed stricter protocol for evaluation (train and test on separate images) would be more protected against this sort of effect [2]: we are now modeling the real problem instead of a proxy problem.
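As a rough illustration of what "train and test on separate images" means in practice, here is a small sketch of a split that keeps all samples from one image on the same side. It assumes each sample (for example, a cell crop) carries the identifier of the image it came from; the function name `split_by_image`, the `test_fraction` parameter, and the toy identifiers are hypothetical and not taken from either paper.

```python
import random
from collections import defaultdict


def split_by_image(image_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no source image is in both train and test."""
    by_image = defaultdict(list)
    for idx, image_id in enumerate(image_ids):
        by_image[image_id].append(idx)

    images = sorted(by_image)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(images)

    # Hold out whole images, never individual samples.
    n_test = max(1, round(len(images) * test_fraction))
    test = [i for img in images[:n_test] for i in by_image[img]]
    train = [i for img in images[n_test:] for i in by_image[img]]
    return train, test


# Eight samples (say, cell crops) cut from four images, two per image.
image_ids = ["img0", "img0", "img1", "img1", "img2", "img2", "img3", "img3"]
train_idx, test_idx = split_by_image(image_ids)
```

A plain random split over samples would let crops from the same image land on both sides, which is exactly where image-level artefacts can leak through. For real use, scikit-learn's `GroupKFold` provides the same kind of grouping with cross-validation.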
§
I believe two things about image analysis of biological samples:
1. Computers can be much better than humans at this task.
2. Some (most? much of?) published literature overestimates how well computers do with the method being presented.
Note that there is no contradiction between the two, except that point 2, if widely believed, can make it harder to convince people of point 1.
(There is also a third point, which is that most people overestimate how well humans do.)
[1] | Normally, I’d review recent papers only, but not only had this one escaped my attention when it came out (in my defense, it came out just when I was trying to finish my PhD thesis), it also deals with themes I have blogged about before. |
[2] | I did a bit of testing along these lines, but it is hard to automate the blocking of the cells. Automatic thresholding does not work because it depends on the shape of the signal! This is why the author of this paper drew squares by hand. |