Is Cell Segmentation Needed for Cell Analysis?

Having just spent some posts discussing a paper on nuclear segmentation (all tagged posts), let me ask the question:

Is cell segmentation needed? Is this a necessary step in an analysis pipeline dealing with fluorescent cell images?

This is a frequent question whenever I give a talk on work of mine that does not use segmentation, for example, using local features for classification (see the video). It comes up so often because, for many people, it seems obvious that the answer is yes, you need cell segmentation. So, when they see me skip that step, they ask: shouldn’t you have segmented the cell regions?

Here is my answer:

Remember Vapnik’s dictum [1]: do not solve, as an intermediate step, a harder problem than the problem you really need to solve.

Thus the question becomes: is your scientific problem dependent on cell segmentation? In the case, for example, of subcellular location determination, it is not: all the cells in the same field display the same phenotype, and your goal is to find out what it is. Therefore, you do not need to have an answer for each cell, only for the whole field.

In other problems, you may need to have a per-cell answer: for example, in some kinds of RNAi experiment, only a fraction of the cells in a field display the RNAi phenotype because the others did not take up the RNAi. Therefore, segmentation may be necessary. Similarly, if a measurement such as the distance of fluorescent bodies to the cell membrane is meaningful by itself (as opposed to being used as a feature for classification), then you need segmentation.

However, sometimes you can get away without segmentation.


An important point to note is the following: while it may be good to have access to perfect segmentation, imperfect segmentation (i.e., the type you actually get) may not help as much as the perfect kind.


Just to be sure, I was not the first person to notice that you do not need segmentation for subcellular location determination. I think this is the first reference:

Huang, Kai, and Robert F. Murphy. “Automated classification of subcellular patterns in multicell images without segmentation into single cells.” Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on. IEEE, 2004. [Google scholar link]

[1] I’m quoting from memory. It may be a bit off. It sounds obvious when you put it this way, but it is still often not respected in practice.

To reproduce the paper, you cannot use the code we used for the paper

Over the last few posts, I described my nuclear segmentation paper.

It has a reproducible research archive.


If you now download that code, that is not the code that was used for the paper!

In fact, the version that generates the tables in the paper does not run anymore, because it only runs with old versions of numpy!

For the current code to reproduce the computation in the paper, I had to update it. To run the code exactly as it was used for the paper, you need to get old versions of the software it depends on.


To some extent, this is due to numpy’s frustrating lack of backward compatibility [1]. The issue at hand was the changed semantics of the histogram function.

In the end, I think I completely avoided that function in my code for a few years as it was toxic (when you write libraries for others, you never know which version of numpy they are running).
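As an illustration only, here is a minimal sketch of one way to count pixel intensities without calling the histogram function at all. It assumes the image holds non-negative integers (for example, uint8 or uint16 data); the helper name `intensity_histogram` is made up for this example and is not taken from the paper's code. In current numpy, `np.histogram` returns the counts together with the bin edges (one more edge than there are bins), and the last lines check that both approaches agree on a tiny array.

```python
import numpy as np


def intensity_histogram(img, nbins=256):
    """Count how many pixels take each integer value in [0, nbins).

    Hypothetical helper: assumes `img` has a non-negative integer dtype.
    Values >= nbins are silently dropped by the final slice.
    """
    values = np.asarray(img).ravel()
    return np.bincount(values, minlength=nbins)[:nbins]


# Tiny example image with intensities in [0, 4)
img = np.array([[0, 1, 1],
                [3, 3, 3]], dtype=np.uint8)

counts = intensity_histogram(img, nbins=4)

# Same computation with np.histogram, using explicit integer bin edges
hist, edges = np.histogram(img, bins=np.arange(5))
print(counts, hist, len(edges) == len(hist) + 1)
```

Passing explicit bin edges, as in the last call, also avoids relying on how a default number of bins is interpreted.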


But as much as I can gripe about numpy breaking code between minor versions, they would eventually be justified in changing their API with the next major version change.

In the end, the half-life of code is such that each year, it becomes harder to reproduce older papers even if the code is available.

[1] I used to develop for the KDE Project, where you did not break users’ code, ever, and so I find it extremely frustrating to have to explain that you should not change an API on aesthetic grounds in between minor versions.

Why Pixel Counting is not Adequate for Evaluating Segmentation

Let me illustrate what I was trying to say in a comment to João Carriço:

Consider the following three shapes:


If the top (red) image is your reference and green and blue are two candidate solutions, then pixel counting (which forms the basis of the Rand and Jaccard indices) will say that green is worse than blue. In fact, green differs by 558 pixels, while blue only by 511 pixels.

However, the green image is simply a fatter version of red (with a circa 2 pixel boundary). Since boundaries cannot be really drawn at pixel level anyway (it is a fuzzy border between background and foreground), it is not an important difference. The blue image, however, has an extra blob and so is qualitatively different.

The Hausdorff distance or my own normalized sum of distances, on the other hand, would say that green is very much like red, while blue is more different. Thus they capture the important differences better than pixel counting. I think this is why we found that these are better measures than Rand or Jaccard (or Dice) for evaluation of segmentation.
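The same effect can be reproduced with synthetic shapes. The sketch below does not use the images from this post: the discs, their sizes, and the helper names are all assumptions made up for the illustration, so the pixel counts will not match the 558 and 511 quoted above. It builds a reference disc, a slightly fatter disc, and the reference plus a small distant blob, then compares a plain pixel count with a brute-force Hausdorff distance computed over foreground pixels.

```python
import numpy as np


def disc(shape, center, radius):
    """Boolean image with a filled disc (hypothetical helper)."""
    yy, xx = np.indices(shape)
    return (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2


def pixel_difference(a, b):
    """Number of pixels on which two binary images disagree."""
    return int(np.sum(a != b))


def hausdorff(a, b):
    """Brute-force Hausdorff distance between the foreground pixel sets.

    Builds the full pairwise distance matrix, so it is only suitable
    for small images like the ones in this example.
    """
    pa = np.argwhere(a).astype(float)
    pb = np.argwhere(b).astype(float)
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=2))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


shape = (64, 64)
reference = disc(shape, (32, 32), 12)
fatter = disc(shape, (32, 32), 14)                 # same shape, 2 pixels fatter
extra_blob = reference | disc(shape, (10, 10), 3)  # exact match plus a stray blob

for name, candidate in [("fatter", fatter), ("extra blob", extra_blob)]:
    print(name,
          pixel_difference(reference, candidate),
          round(hausdorff(reference, candidate), 2))
```

With these shapes, the fatter disc disagrees with the reference on more pixels than the extra-blob candidate does, yet its Hausdorff distance is much smaller, which is the ordering argued for above.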

(Thanks João for prompting this example. I used this when I gave a talk or two about this paper, but it was lost in the paper because of page limits.)


NUCLEAR SEGMENTATION IN MICROSCOPE CELL IMAGES: A HAND-SEGMENTED DATASET AND COMPARISON OF ALGORITHMS by Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy in Biomedical Imaging: From Nano to Macro, 2009. ISBI ’09. IEEE International Symposium on, 2009. DOI: 10.1109/ISBI.2009.5193098 [Pubmed Central open access version]

Nuclear Segmentation in Microscope Cell Images

I decided to blog my old papers (from when I did not have a science blog), mostly because of Melissa Terra’s blog (although I cannot hope to have as much success as she had). In any case, expect the next few weeks to go back to the past.

I will start with this one:

NUCLEAR SEGMENTATION IN MICROSCOPE CELL IMAGES: A HAND-SEGMENTED DATASET AND COMPARISON OF ALGORITHMS by Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy in Biomedical Imaging: From Nano to Macro, 2009. ISBI ’09. IEEE International Symposium on, 2009. DOI: 10.1109/ISBI.2009.5193098 [Pubmed Central open access version]

It’s more of a solid paper than one announcing a major breakthrough, so it is interesting that this is currently my most cited paper (according to Google Scholar).

The original question of this paper was very simple: is it worth it to code up and run a complex segmentation algorithm instead of a simple one on the images that we were working with?

I hand-segmented a bunch of images from our datasets. Frankly, if I had known how much work this would take, I would not have done it. And I would not have written this paper. I believe that the dataset is why the paper became widely cited: a lot of people understand its value (and use it for their work).

At the centre of the paper, we presented images such as this one, which had been manually segmented (by me, with a subset also segmented by Aabid Shariff, following the “label it twice” principle):


We then implemented some automatic segmentation algorithms and measured which were best able to reproduce the human-labeled data.

Major conclusions

1. The method which won was by Lin et al., which is a model-based method [1]. In the meantime, however, other groups have reported better results on our dataset (list of citations at Google Scholar).

This means that it is worth it to run a more complex method.

2. Neither the Rand nor the Jaccard indices do very well in method evaluation (the Dice index, also widely used, is equivalent to the Jaccard index).

These indices do not take the pixel location into account. We proposed a new metric that does: the normalised sum of distances (NSD), which we call a spatially-aware evaluation method.

3. The NSD metric does better than Rand or Jaccard [2].
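To make the remark about Dice and Jaccard concrete, here is a small sketch for two binary masks. The function names and the toy masks are assumptions for the illustration, not code from the paper. For a single pair of masks, the two indices are tied by Dice = 2J / (1 + J), so one is a monotonic function of the other and both rank candidate segmentations in the same order.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard index of two binary masks: |A and B| / |A or B|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()


def dice(a, b):
    """Dice index of two binary masks: 2 |A and B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


# Toy one-dimensional "masks" that overlap on two of four pixels
a = np.array([1, 1, 1, 0])
b = np.array([0, 1, 1, 1])

j = jaccard(a, b)
d = dice(a, b)
print(j, d, 2 * j / (1 + j))
```

Neither function looks at where the disagreeing pixels are, only at how many there are, which is the limitation noted in point 2.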

Another interesting result is that the mean pixel value is a very good threshold for fluorescent microscopy.
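Thresholding at the mean takes one line with numpy. The sketch below uses a made-up image (a dim background with one bright square) purely as an illustration; the function name is an assumption, and real fluorescence images would normally be loaded from disk and may need smoothing first.

```python
import numpy as np


def mean_threshold(img):
    """Foreground mask: pixels brighter than the image's mean value."""
    img = np.asarray(img, dtype=float)
    return img > img.mean()


# Synthetic 8x8 image: background of 10 with a bright 3x3 square of 200
img = np.full((8, 8), 10)
img[2:5, 2:5] = 200

mask = mean_threshold(img)
print(int(mask.sum()))  # number of pixels marked as foreground
```

Because the threshold is computed from the image itself, there is no parameter to tune per image.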

Here is the reproducible research archive for this paper.

[1] Yes, their model is in 3D, while our data was 2D. I just don’t want to get into that game of making a minor and obvious tweak to an existing algorithm and calling it new. We used their method with the obvious adaptations for our data.
[2] Nowadays, I might try to develop a metric based on random walks as well. The NSD has the advantage that it is very fast to compute.