Is Cell Segmentation Needed for Cell Analysis?

Having just spent several posts discussing a paper on nuclear segmentation, let me ask the question:

Is cell segmentation needed? Is this a necessary step in an analysis pipeline dealing with fluorescent cell images?

This is a frequent question whenever I give a talk on my work that does not use segmentation, for example, using local features for classification (see the video). It comes up because, for many people, it seems obvious that the answer is yes, you need cell segmentation. So, when they see me skip that step, they ask: shouldn't you have segmented the cell regions?

Here is my answer:

Remember Vapnik's dictum [1]: do not solve, as an intermediate step, a harder problem than the problem you really need to solve.

Thus the question becomes: does your scientific problem depend on cell segmentation? In the case of subcellular location determination, for example, it does not: all the cells in the same field display the same phenotype, and your goal is to find out what that phenotype is. Therefore, you do not need an answer for each cell, only for the whole field.

In other problems, you may need a per-cell answer: for example, in some kinds of RNAi experiment, only a fraction of the cells in a field display the RNAi phenotype, the others having failed to take up the RNAi. There, segmentation may be necessary. Similarly, if a measurement such as the distance of fluorescent bodies to the cell membrane is meaningful by itself (as opposed to being used as a feature for classification), then you need segmentation.

However, sometimes you can get away without segmentation.


An important point to note is the following: while it may be good to have access to a perfect segmentation, an imperfect segmentation (i.e., the kind you actually get) may not help as much as the perfect kind.


To be clear, I was not the first person to notice that you do not need segmentation for subcellular location determination. I believe this is the first reference:

Huang, Kai, and Robert F. Murphy. “Automated classification of subcellular patterns in multicell images without segmentation into single cells.” IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2004.

[1] I’m quoting from memory, so it may be a bit off. It sounds obvious when you put it this way, but it is still often not respected in practice.

To reproduce the paper, you cannot use the code we used for the paper

Over the last few posts, I described my nuclear segmentation paper.

It has a reproducible research archive.


If you now download that code, it is not the code that was used for the paper!

In fact, the version that generates the tables in the paper does not run anymore, because it only runs with old versions of numpy!

In order for it to perform the computation in the paper, I had to update the code. Conversely, to run the code exactly as it was for the paper, you need to get old versions of the software.


To some extent, this is due to numpy’s frustrating lack of forward compatibility [1]. The issue at hand was the changed semantics of the histogram function.

In the end, I avoided that function completely in my code for a few years, as it was toxic (when you write libraries for others, you never know which version of numpy they are running).
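As an illustration (my own sketch, not code from the paper), one way to sidestep `np.histogram` entirely is to bin the values by hand, so the result cannot depend on any version's bin-edge semantics:

```python
import numpy as np

def fixed_width_histogram(values, nbins, vmin, vmax):
    # Count values into `nbins` equal-width bins over [vmin, vmax].
    # Using bincount directly avoids relying on np.histogram, whose
    # bin-edge semantics changed between numpy versions.
    values = np.asarray(values, dtype=float)
    idx = ((values - vmin) / (vmax - vmin) * nbins).astype(int)
    idx = np.clip(idx, 0, nbins - 1)  # values equal to vmax go in the last bin
    return np.bincount(idx, minlength=nbins)

counts = fixed_width_histogram([0.1, 0.2, 0.2, 0.9], nbins=4, vmin=0.0, vmax=1.0)
```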


But as much as I can gripe about numpy breaking code between minor versions, they would eventually be justified in changing their API with the next major version change.

In the end, the half-life of code is such that each year, it becomes harder to reproduce older papers even if the code is available.

[1] I used to develop for the KDE Project, where you did not break users’ code, ever, so I find it extremely frustrating to have to explain that you should not change an API on aesthetic grounds between minor versions.

Why Pixel Counting is not Adequate for Evaluating Segmentation

Let me illustrate what I was trying to say in a comment to João Carriço:

Consider the following three shapes:


If the top (red) image is your reference and green and blue are two candidate solutions, then pixel counting (which forms the basis of the Rand and Jaccard indices) will say that green is worse than blue. In fact, green differs by 558 pixels, while blue differs by only 511 pixels.

However, the green image is simply a fatter version of red (with a roughly 2-pixel-wide extra boundary). Since boundaries cannot really be drawn at the pixel level anyway (the border between background and foreground is fuzzy), this is not an important difference. The blue image, however, has an extra blob and so is qualitatively different.

The Hausdorff distance or my own normalized sum of distances, on the other hand, would say that green is very much like red, while blue is more different. Thus they capture the important differences better than pixel counting. I think this is why we found that these are better measures than Rand or Jaccard (or Dice) for evaluation of segmentation.
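To make this concrete, here is a small synthetic version of the example (my own sketch: the shapes and sizes are made up, and scipy's distance transform stands in for the measures discussed above):

```python
import numpy as np
from scipy import ndimage

def jaccard(a, b):
    # Pixel-counting overlap: |A intersect B| / |A union B|
    a, b = a.astype(bool), b.astype(bool)
    return np.logical_and(a, b).sum() / float(np.logical_or(a, b).sum())

def hausdorff(a, b):
    # Symmetric Hausdorff distance between binary masks, computed
    # from distance transforms of the complements.
    a, b = a.astype(bool), b.astype(bool)
    da = ndimage.distance_transform_edt(~a)  # distance to nearest pixel of a
    db = ndimage.distance_transform_edt(~b)
    return max(da[b].max(), db[a].max())

ref = np.zeros((100, 100), bool)
ref[40:60, 40:60] = True                            # reference ("red")
fat = ndimage.binary_dilation(ref, iterations=2)    # fatter version ("green")
blob = ref.copy()
blob[5:15, 5:15] = True                             # extra spurious blob ("blue")

# Pixel counting prefers the version with the extra blob...
assert jaccard(ref, blob) > jaccard(ref, fat)
# ...while the Hausdorff distance correctly prefers the fatter version.
assert hausdorff(ref, fat) < hausdorff(ref, blob)
```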

(Thanks João for prompting this example. I used this when I gave a talk or two about this paper, but it was lost in the paper because of page limits.)


Nuclear segmentation in microscope cell images: a hand-segmented dataset and comparison of algorithms by Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy. In Biomedical Imaging: From Nano to Macro, 2009 (ISBI ’09), IEEE International Symposium on. DOI: 10.1109/ISBI.2009.5193098 [PubMed Central open access version]

Nuclear Segmentation in Microscope Cell Images

I decided to blog my old papers (from when I did not have a science blog), mostly because of Melissa Terras’ blog (although I cannot hope to have as much success as she had). In any case, expect the next few weeks to go back to the past.

I will start with this one:

Nuclear segmentation in microscope cell images: a hand-segmented dataset and comparison of algorithms by Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy. In Biomedical Imaging: From Nano to Macro, 2009 (ISBI ’09), IEEE International Symposium on. DOI: 10.1109/ISBI.2009.5193098 [PubMed Central open access version]

It’s more of a solid paper than one announcing a major breakthrough, so it is interesting that it is currently my most cited paper (according to Google Scholar).

The original question of this paper was very simple: is it worth it to code up and run a complex segmentation algorithm instead of a simple one on the data we were working with?

I hand-segmented a bunch of images from our datasets. Frankly, had I known how much work this would take, I would not have done it. And I would not have written this paper. I believe that this is why it became widely cited: a lot of people understand the value of the dataset (and use it for their work).

At the centre of the paper, we presented images such as this one, which had been manually segmented (by me, with a subset also segmented by Aabid Shariff, according to the label-it-twice principle):


We then implemented some automatic segmentation algorithms and measured which were best able to reproduce the human labeled data.

Major conclusions

1. The method which won was by Lin et al., which is a model-based method [1]. In the meantime, however, other groups have reported better results on our dataset (there is a list of citing papers on Google Scholar).

This means that it is worth it to run a more complex method.

2. Neither the Rand nor the Jaccard indices do very well in method evaluation (the Dice index, also widely used, is equivalent to the Jaccard index).

These indices do not take pixel location into account. We therefore propose a spatially-aware evaluation metric that does: the normalised sum of distances (NSD).

3. The NSD metric does better than Rand or Jaccard [2].

Another interesting result is that the mean pixel value is a very good threshold for fluorescent microscopy.
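As a minimal sketch of that last point (a toy example of mine, not the paper's evaluation): thresholding at the mean is a one-liner, and it works well because in fluorescent microscopy most pixels are dark background, so the mean sits just above the background level:

```python
import numpy as np

def mean_threshold(image):
    # Foreground = pixels brighter than the global mean.
    return image > image.mean()

# Synthetic example: dark background with one bright nucleus-like square.
img = np.zeros((64, 64))
img[20:30, 20:30] = 200.0
binim = mean_threshold(img)
```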

Here is the reproducible research archive for this paper.

[1] Yes, their model is in 3D, while our data was 2D. I just don’t want to get into that game of making a minor and obvious tweak to an existing algorithm and calling it new. We used their method with the obvious adaptations for our data.
[2] Nowadays, I might try to develop a metric based on random walks as well. The NSD has the advantage that it is very fast to compute.

Paper Review: Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins

Handfield, L., Chong, Y., Simmons, J., Andrews, B., & Moses, A. (2013). Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins PLoS Computational Biology, 9 (6) DOI: 10.1371/journal.pcbi.1003085

This is an excellent paper that came out in PLoS CompBio last week.

The authors present a high-throughput analysis of yeast fluorescent microscopy images of tagged proteins. Figure 8, panel B (doi:10.1371/journal.pcbi.1003085.g008) shows a few example images from their collection.


One interesting aspect is that they work on the dynamic aspects of protein distributions only from snapshots. I was previously involved in a similar project (ref. 18 in the paper [1]) and so I was happy to see others working in this fashion.

Budding yeast, as the name says, buds. A mother cell will create a new bud, that bud will grow and eventually it will split off and become a new daughter cell.

By leveraging the bud size as a marker of cell stage, the authors can build dynamic protein profiles and cluster these. This avoids the need for either (i) chemical synchronization [which has other side-effects in the cell] or (ii) movie acquisition [which, besides taking longer, itself damages the cells through phototoxicity].
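A toy sketch of the idea (my own construction with made-up numbers, not the authors' code): treat the bud-to-mother size ratio as a pseudo-time axis and average the per-cell measurements along it:

```python
import numpy as np

def pseudotime_profile(bud_ratio, measurement, nbins=5):
    # Sort cells by bud size (a proxy for cell-cycle stage) and average
    # the measurement within consecutive bins, yielding a dynamic
    # profile from a single static snapshot.
    order = np.argsort(bud_ratio)
    binned = np.array_split(np.asarray(measurement, float)[order], nbins)
    return np.array([b.mean() for b in binned])

# Ten cells; the measured signal happens to rise with bud size:
ratio = np.array([0.1, 0.5, 0.3, 0.9, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0])
signal = ratio ** 2
profile = pseudotime_profile(ratio, signal)
```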

In all of the examples above, you can see a change in protein distribution as the bud grows.


They perform an unsupervised analysis of their data, noting that

Unsupervised analysis also has the advantage that it is unbiased by prior ‘expert’ knowledge, such as the arbitrary discretization of protein expression patterns into easily recognizable classes.

Part of my research goals is to move beyond supervised/unsupervised into mixed models (take the supervision, but take it with a grain of salt). However, this is not yet something that we can do with current machine learning technologies.

The clusters obtained are found to group together functionally similar genes (details in the paper).


The authors are Bayesian about their estimates in a very interesting way. They evaluate their segmentations against training data, which gives them a confidence measure:

Our confidence measure allows us to distinguish correctly identified cells from artifacts and misidentified objects, without specifying what the nature of artifacts might be.

This is because their measure is a density estimate, derived from training, based on features of the shape. Now comes the nice Bayesian point:

This allows us to weight probabilistically data points according to the posterior probability. For classes of cells where our model does not fit as well, such as very early non-ellipsoidal buds, we expect to downweight all the data points, but we can still include information from these data points in our analysis. This is in contrast to the situation where we used a hard threshold to exclude artifacts.

(emphasis mine)
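A toy sketch of that contrast (my own, with made-up numbers, not the authors' code): rather than dropping objects below a hard confidence threshold, weight every object by its posterior confidence when aggregating:

```python
import numpy as np

def weighted_profile(features, confidence):
    # Aggregate per-cell feature vectors, weighting each cell by the
    # posterior probability that it is a correctly segmented cell.
    # Artifacts get low weight but are never hard-excluded.
    features = np.asarray(features, float)
    w = np.asarray(confidence, float)
    return (features * w[:, None]).sum(axis=0) / w.sum()

cells = [[1.0, 2.0], [1.2, 1.8], [9.0, 9.0]]  # last row: likely artifact
conf = [0.9, 0.8, 0.05]
profile = weighted_profile(cells, conf)
```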


Unlike the authors, I do not tend to care so much about interpretable features in my work. However, it is interesting that such a small number (seven) of features got such good results.

There is more in the paper which I did not mention here: for example, the image processing pipeline (which is fairly standard if you’re familiar with the field, but this unglamorous aspect of the business is where you always spend a lot of time).


One of my goals is to raise the profile of Bioimage Informatics, so I will try to have more papers in this field on the blog.

[1] We worked on mammalian cells, not budding yeast. Their cell cycles are very different and the methods that work in one do not necessarily work in the other.

Paper Review: Distributed 3D image segmentation using micro-labor workforce

DP2: Distributed 3D image segmentation using micro-labor workforce Richard J. Giuly, Keun-Young Kim and Mark H. Ellisman. Bioinformatics doi: 10.1093/bioinformatics/btt154

I just love this paper. It is just at that intersection of quirky and serious which makes you laugh while being dead serious (I admit that it only makes you laugh if you have a very particular sense of humour).

The quirky aspect is the following: the authors solve complex three-dimensional image segmentation problems by using an Amazon Mechanical Turk crowd of untrained workers to do it!

They do so by reducing the problem to a series of simple yes/no questions that can be understood by people without any background in neurology.

The serious aspect is that it seems that it actually works. It gives good segmentations without resorting to highly-paid experts or very fancy algorithms.


One of the main results that has come out of bioimage informatics that surprises computer vision people and biologists is the following:

Computers can be better than people at bioimage informatics

We (humans) are excellent at face recognition (a task we evolved to do and grew up doing), which is why computer vision researchers who work on this sort of problem tend to revere the human visual system. However, we just cannot recognise the endoplasmic reticulum. Even trained cell biologists are really not that good at recognising the ER in fluorescent microscopy images.

We can perhaps read this paper in the context of the general discussion of human/computer partnerships: what can humans do for the computer, and vice-versa?


I have now gone off on a tangent, but the paper does present a fairly typical image processing pipeline:

  1. Apply a Gaussian blur to the images
  2. Over-segment the images into superpixels
  3. Merge superpixels into segmentations by repeatedly asking queries of the form:

Q: Should region A and region B be merged together?

This is all very standard except that Q is performed by humans. In fact, what I think is the main contribution of this paper: Q is performed by non-experts. And it works. By dumbing it down for the human, the computer actually ends up doing well.
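A sketch of step 3 (my own toy code, not the DP2 system): the `should_merge` callable stands in for the human yes/no query, which in the paper is answered by crowd workers:

```python
import numpy as np

def merge_regions(labels, should_merge):
    # Greedily merge adjacent regions whenever the oracle answers yes
    # to "should region a and region b be merged together?"
    labels = labels.copy()
    changed = True
    while changed:
        changed = False
        # Collect pairs of 4-adjacent, distinct labels.
        pairs = set()
        for base, shifted in ((labels[:-1, :], labels[1:, :]),
                              (labels[:, :-1], labels[:, 1:])):
            neq = base != shifted
            for a, b in zip(base[neq], shifted[neq]):
                pairs.add((min(a, b), max(a, b)))
        for a, b in sorted(pairs):
            if should_merge(a, b):
                labels[labels == b] = a
                changed = True
                break  # recompute adjacency after every merge
    return labels

# Four superpixels; the oracle says regions 1 and 2 are the same object.
labels = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2],
                   [3, 3, 4, 4],
                   [3, 3, 4, 4]])
merged = merge_regions(labels, lambda a, b: {a, b} <= {1, 2})
```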

It’s brilliant!


The thing I do wonder is why this was an Application paper instead of a Research paper. It presents what I think is an interesting new perspective, which seems more valuable than the software (which, by the way, is not even open source, limiting its usefulness). It also meant that the authors only had two pages in which to expose their methods.

I would have loved to read more results and discussion. I half-suspect that this was not the authors’ choice and can only hope that the increasing digitalization of research publications removes these page limitations.

Segmenting Images In Parallel With Python & Jug

On Friday, I posted an introduction to Jug. That usage was very basic, however; this post shows a slightly more advanced example.

Let us imagine you are trying to compare two image segmentation algorithms based on human-segmented images. This is a completely real-world example as it was one of the projects where I first used jug [1].

We are going to build this up piece by piece.

First a few imports:

import mahotas as mh
from jug import TaskGenerator
from glob import glob

Here, we test two thresholding-based segmentation methods, called method1 and method2. They both (i) read the image, (ii) blur it with a Gaussian, and (iii) threshold it [2]:

@TaskGenerator
def method1(image):
    # Read the image (using only the first channel)
    image = mh.imread(image)[:,:,0]
    image = mh.gaussian_filter(image, 2)
    binimage = (image > image.mean())
    labeled, _ = mh.label(binimage)
    return labeled

@TaskGenerator
def method2(image):
    image = mh.imread(image)[:,:,0]
    image = mh.gaussian_filter(image, 4)
    image = mh.stretch(image)
    binimage = (image > mh.otsu(image))
    labeled, _ = mh.label(binimage)
    return labeled

Just to make sure you see what we are talking about, here is one possible input image:

What you see are cell nuclei. The very bright areas are noise or unusually bright cells. The result of method 1 looks like this:

Each colour represents a different region. You can see this is not very good, as many cells are merged. The reference (human-segmented) image looks like this:


Running over all the images looks exactly like Python:

results = []
for im in glob('images/*.jpg'):
    m1 = method1(im)
    m2 = method2(im)
    ref = im.replace('images','references').replace('jpg','png')
    v1 = compare(m1, ref)
    v2 = compare(m2, ref)
    results.append( (v1,v2) )
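The `compare` function is not shown above. In the full script, it is another `@TaskGenerator`-decorated function that loads the reference image and scores the agreement; the post used Adjusted Rand via a function in milk [3]. A self-contained stand-in for the scoring itself (my own implementation, not milk's):

```python
import numpy as np

def comb2(x):
    # Number of unordered pairs, C(x, 2), elementwise.
    return x * (x - 1) / 2.0

def adjusted_rand(labeled, reference):
    # Adjusted Rand index between two labelings of the same pixels.
    a = np.ravel(labeled)
    b = np.ravel(reference)
    # Build the contingency table with a joint bincount.
    m = b.max() + 1
    cont = np.bincount(a * m + b, minlength=(a.max() + 1) * m)
    cont = cont.reshape(a.max() + 1, m)
    sum_ij = comb2(cont).sum()
    sum_a = comb2(cont.sum(axis=1)).sum()
    sum_b = comb2(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(a.size)
    maxindex = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (maxindex - expected)

# Identical labelings score 1; permuting the label names does not matter.
x = np.array([0, 0, 1, 1, 2, 2])
y = np.array([1, 1, 0, 0, 2, 2])
score = adjusted_rand(x, y)
```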

But how do we get the results out?

A simple solution is to write a function which writes to an output file:

@TaskGenerator
def print_results(results):
    import numpy as np
    r1, r2 = np.mean(results, 0)
    with open('output.txt', 'w') as out:
        out.write('Result method1: {}\nResult method2: {}\n'.format(r1,r2))

print_results(results)


Except for the `TaskGenerator` decorators, this would be a pure Python file!

With TaskGenerator, we get jugginess!

We can call:

jug execute &
jug execute &
jug execute &
jug execute &

to get 4 processes going at once.


Note also the line:

print_results(results)

Here, results is a list of Task objects. This is how you define a dependency: Jug picks up that, to call print_results, it needs all the results values, and behaves accordingly.

Easy as Py.


You can get the full script above, including data, from github.



Tomorrow, I’m giving a short talk on Jug for the Heidelberg Python Meetup.

If you miss it, you can hear it in Berlin at the BOSC2013 (Bioinformatics Open Source Conference) in July (19 or 20).

[1] The code in that repository still uses a pretty old version of jug, this was 2009, after all. TaskGenerator had not been invented yet.
[2] This is for demonstration purposes; the paper had better methods, of course.
[3] Again, you can do better than Adjusted Rand, as we show in the paper; but this is a demo. This way, we can just call a function in milk.