Paper Review: Automated prior knowledge-based quantification of neuronal patterns in the spinal cord of zebrafish

Automated prior knowledge-based quantification of neuronal patterns in the spinal cord of zebrafish by Johannes Stegmaier, Maryam Shahid, Masanari Takamiya, Lixin Yang, Sepand Rastegar, Markus Reischl, Uwe Strähle, and Ralf Mikut, in Bioinformatics (2013) [DOI]

It’s been a while since I’ve had a paper review, even though one of my goals is to give more space to bioimage informatics. So, I will try to make up for it in the next few weeks. This is a paper which is not exactly hot off the press (it came out two months ago), but still very recent.

The authors are working with zebrafish. Unfortunately, I am unable to evaluate the biological results as I do not know much about zebrafish, but I can appreciate the methodological contributions. I will illustrate some of the methods based on a figure (Fig 2) from the paper:

Figure 2

The top panel shows the data (a fish spinal cord, cropped out of a larger field), and the next two show a binarization of the same data and a line fit (in red). Finally, the bottom panel shows the effect of straightening the image to a line. This allows for comparison between different images by morphing them all to a common template. The alignment is performed on only one of the channels, while the others can carry complementary information.
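The straightening idea can be sketched in a few lines. This is not the authors' pipeline (they binarize, fit the centreline, and resample); it is a minimal stand-in that estimates a per-column centre of mass, fits a polynomial through it (analogous to the red line in the figure), and shifts each column so the fitted centreline becomes horizontal:

```python
import numpy as np

def straighten_band(img, degree=2):
    """Roughly straighten a bright, mostly horizontal band.

    For each column, compute the intensity-weighted centre row,
    fit a low-order polynomial through those centres, and shift
    each column so the fitted centreline lands on the middle row.
    (A simplification of the paper's method, which fits the line
    on a binarized image and resamples.)
    """
    h, w = img.shape
    rows = np.arange(h)
    # intensity-weighted centre of mass per column
    weights = img.sum(axis=0)
    centres = (rows[:, None] * img).sum(axis=0) / np.maximum(weights, 1e-9)
    # smooth the centreline with a polynomial fit
    coeffs = np.polyfit(np.arange(w), centres, degree)
    fitted = np.polyval(coeffs, np.arange(w))
    # shift every column so the fitted centreline sits at h // 2
    out = np.zeros_like(img)
    for x in range(w):
        shift = int(round(h // 2 - fitted[x]))
        out[:, x] = np.roll(img[:, x], shift)
    return out
```

In the real method, the fit is computed on one channel and the same transformation is then applied to the other channels.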


This is very similar to work that has been done in straightening C. elegans images (e.g., Peng et al., 2008) in both intent and some of the general methods (although there you often morph the whole space instead of just a band of interest). It is a bit unfortunate that the bioimage informatics literature sometimes aggregates by model system when many methods can profitably be used across problems.


Finally, I really like this visualization, but I need to give you a bit of background to explain it (if I understood it correctly). Once a profile has been straightened (panel D in the figure above), you can summarize it by averaging along the horizontal dimension to get the average intensity at each location (where zero is the centre of the spinal cord) [1]. You can then stack these profiles (analogously to what you’d do to obtain a kymograph) as a function of your perturbation (in this case, a drug concentration):

Figure 6

This is Figure 6 in the paper.

The effect of the drug (and its saturation) becomes obvious.
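Mechanically, building such a stacked-profile figure is simple once the images are straightened; here is a sketch (the function name is mine, not the paper's):

```python
import numpy as np

def dose_profile_map(straightened_images):
    """Stack per-image intensity profiles into a 2D map.

    Each straightened image is collapsed to a 1D profile by
    averaging along the horizontal axis; row i of the result is
    the profile for the i-th condition (e.g. increasing drug
    concentration), analogously to a kymograph.
    """
    profiles = [img.mean(axis=1) for img in straightened_images]
    return np.vstack(profiles)
```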


As a final note, I’ll leave you with this quote from the paper, which validates some of what I said before, namely that the quality of human evaluation is consistently overestimated:

Initial tests unveiled intra-expert and inter-expert variations of the extracted values, leading to the conclusion that even a trained evaluator is not able to satisfactorily reproduce results.

[1] The authors average a different marker than the one used for straightening, but since I know little about zebrafish biology, I focus on the methods.

Paper review: Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis


Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis by L. Shamir in Journal of Microscopy, 2011 [DOI]

This is an excellent simple paper [1]. I will jump to the punchline (slightly edited by me for brevity):

This paper demonstrates that microscopy images that were previously used for developing and assessing the performance of bioimage classification algorithms can be classified even when the biological content is removed from the images [by replacing them with white squares], showing that previously reported results might be biased, and that the computer analysis could be driven by artefacts rather than by the actual biological content.

Here is an example of what the author means:


Basically, the author shows that even after modifying the images by drawing white boxes where the cells are, classifiers still manage to do apparently well. Thus, they are probably picking up on artefacts instead of signal.
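This control is easy to reproduce in code, except that the boxes were drawn by hand in the paper (see note [2] below on why automating them is hard). A sketch, with hypothetical bounding boxes as input:

```python
import numpy as np

def blank_regions(img, boxes, fill=None):
    """Replace the given bounding boxes with flat 'white' squares.

    `boxes` is a list of (row0, row1, col0, col1) tuples marking
    where the cells are (drawn by hand in the paper). If a
    classifier still performs well on the blanked images, it is
    reading artefacts outside the boxes, not biology.
    """
    out = img.copy()
    fill = img.max() if fill is None else fill
    for r0, r1, c0, c1 in boxes:
        out[r0:r1, c0:c1] = fill
    return out
```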

This is (and this analogy is from the paper, although not exactly in this form) like a face recognition system which seems to work very well because all of the images it has of me have me wearing the same shirt. It can perform very well on the training data, but will be fooled by anyone who wears the same shirt.


This is a very important work as it points to the fact that many previous results were probably overinflated. Looking at the dates when this work was done, this was probably at the same time that I was working on my own paper on evaluation of subcellular location determination (just that it took a while for that one to appear in print).

I expect that my proposed stricter protocol for evaluation (train and test on separate images) would be more protected against this sort of effect [2]: we are now modeling the real problem instead of a proxy problem.
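A minimal version of that protocol: split by source image, not by cell. The function below is my own sketch, not code from either paper:

```python
import numpy as np

def split_by_image(image_ids, test_fraction=0.25, seed=0):
    """Train/test split that keeps whole images together.

    Cells (rows) from one source image never end up on both sides
    of the split, so the classifier cannot exploit per-image
    artefacts (illumination, staining batch, ...) shared between
    train and test cells. Returns boolean masks (train, test).
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(image_ids)
    test_ids = rng.choice(ids, size=max(1, int(len(ids) * test_fraction)),
                          replace=False)
    test = np.isin(image_ids, test_ids)
    return ~test, test
```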


I believe two things about image analysis of biological samples:

  1. Computers can be much better than humans at this task.
  2. Some (most? much of?) published literature overestimates how well computers do with the method being presented.

Note that there is no contradiction between the two, except that point 2, if widely believed, can make it harder to convince people of point 1.

(There is also a third point which is most people overestimate how well humans do.)

[1] Normally, I’d review recent papers only, but not only had this one escaped my attention when it came out (in my defense, it came out just when I was trying to finish my PhD thesis), it also deals with themes I have blogged about before.
[2] I tried a bit of testing around here, but it is hard to automate the blocking of the cells. Automatic thresholding does not work because it depends on the shape of the signal! This is why the author of this paper drew squares by hand.

Paper Review: FuncISH: learning a functional representation of neural ISH images

Noa Liscovitch, Uri Shalit, & Gal Chechik (2013). FuncISH: learning a functional representation of neural ISH images Bioinformatics DOI: 10.1093/bioinformatics/btt207

This is part of the ISMB 2013 Proceedings series, which I am interested in as I’ll be going to Berlin, and it is a bioimage informatics paper, which I’m keen to cover; so it was only natural that I’d review it here.


The authors are analysing in-situ hybridization (ISH) images from the Allen Brain Atlas. Figure 1 in the paper shows an example:



The authors use the images as input for a functional classifier. The input to this classifier is an image and the output is a set of functional GO terms, or at least a confidence level for each GO term in the vocabulary.

You can read the details in Section 3.1, but the system does succeed in predicting functional GO terms, especially, as one would expect, neuronal categories. This is very interesting and I hope that the authors (or others) will pick up on the specific biology that is being predicted here and see if it can be used further. [1]

Alternatively, you can see this model as a dimensionality reduction approach, whereby images are projected into the space of GO terms. For this, one considers the continuous confidence levels rather than binary classifications.

In this space, it is possible to compute similarity scores between images, which operate at a functional rather than simply appearance level. The results are much better than simply comparing the image features directly (see Figure 4 for details). There is a lot of added value in considering the functional annotations rather than simple appearance.
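Concretely, once each image is reduced to a vector of GO-term confidences, the functional similarity can be as simple as a cosine score between those vectors (my sketch; the paper's exact similarity measure may differ):

```python
import numpy as np

def functional_similarity(conf_a, conf_b):
    """Cosine similarity between two images' GO-term confidence vectors.

    Each vector holds one continuous classifier confidence per GO
    term, so a high score means the images are predicted to relate
    to similar functions, even if their raw pixels look different.
    """
    a = np.asarray(conf_a, dtype=float)
    b = np.asarray(conf_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```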


I was very interested in the methods and the details, as the authors used SIFT and a bag-of-words approach. I have a paper coming out showing that SURF plus bag-of-words works very well for subcellular location determination. This paper provides additional evidence that this family of techniques works well in bioimage analysis, even if the problem areas are different.
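For readers unfamiliar with the technique, here is a minimal bag-of-visual-words sketch: local descriptors (SIFT, SURF, ...) are quantized against a learned vocabulary and each image becomes a histogram of word counts. The tiny k-means below is illustrative only; real pipelines use proper descriptor extractors and much larger vocabularies:

```python
import numpy as np

def build_vocabulary(descriptors, k=8, iters=20, seed=0):
    """Tiny k-means that turns local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre
        d = ((descriptors[:, None, :] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = descriptors[labels == j].mean(axis=0)
    return centres

def bag_of_words(descriptors, centres):
    """Normalized histogram of nearest visual words: one fixed-length
    vector per image, usable as classifier input."""
    d = ((descriptors[:, None, :] - centres[None]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    return np.bincount(labels, minlength=len(centres)) / len(labels)
```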

They do make a few interesting remarks which I’ll highlight here:

Although their name suggests differently, SIFT descriptors at several scales capture different types of patterns.

The original SIFT descriptors were developed for natural image matching, where the scale is unknown and may even vary within the same image (if one person is standing close by and another far away, they will appear at different scales). However, this is not the case in bioimage analysis.


Interestingly, the four visual words with the highest contribution to classification were the words counting the zero descriptors in each scale. This means that the highest information content lies in ‘least informative’ descriptors, and that overall expression levels (‘sparseness’ of expression) are important factors in functional prediction of genes based on their spatial expression.

This is interesting, although an alternative hypothesis is that the null descriptors capture a very different type of information, and since there are only four of them, they concentrate all of that content. The other 2000 words are often highly correlated with each other, so each correlated group carries high information content overall; but because of the L2-penalized regression, the weight gets spread across the correlated words.
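The weight-spreading effect of the L2 penalty is easy to demonstrate: duplicate an informative feature and ridge regression divides the weight among the copies. A toy illustration (mine, not from the paper):

```python
import numpy as np

def ridge(X, y, lam=1.0):
    """Closed-form L2-penalized least squares: w = (X'X + lam I)^-1 X'y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ y)

# one informative signal, presented once vs duplicated four times
rng = np.random.default_rng(0)
z = rng.normal(size=200)
y = z + 0.1 * rng.normal(size=200)

w_single = ridge(z[:, None], y)
w_dup = ridge(np.tile(z[:, None], (1, 4)), y)
# the same total predictive weight is now spread across the 4 copies
```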


Finally, I agree with this statement:

Combining local and global patterns of expression is, therefore, an important topic for further research.

[1] Unfortunately, my understanding of neuroscience does not go much beyond if I drink too much coffee, I get a headache. So, I cannot comment on whether these predictions make much sense.

Paper Review: Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins

Handfield, L., Chong, Y., Simmons, J., Andrews, B., & Moses, A. (2013). Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins PLoS Computational Biology, 9 (6) DOI: 10.1371/journal.pcbi.1003085

This is an excellent paper that came out in PLoS CompBio last week.

The authors present a high-throughput analysis of yeast fluorescent microscopy images of tagged proteins. Figure 8, panel B (doi:10.1371/journal.pcbi.1003085.g008) shows a few example images from their collection:

Figure 8

One interesting aspect is that they work on the dynamic aspects of protein distributions only from snapshots. I was previously involved in a similar project (ref. 18 in the paper [1]) and so I was happy to see others working in this fashion.

Budding yeast, as the name says, buds. A mother cell will create a new bud, that bud will grow and eventually it will split off and become a new daughter cell.

By leveraging the bud size as a marker of cell stage, the authors can build dynamic protein profiles and cluster these. This avoids the need for either (i) chemical synchronization [which has other side-effects in the cell] or (ii) movie acquisition [which, besides taking longer, itself damages the cells through phototoxicity].
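The core trick can be sketched very simply: treat bud size as pseudo-time, sort the snapshots by it, and average within stage bins. This is a toy version of the authors' much more careful profile construction:

```python
import numpy as np

def pseudotime_profile(bud_sizes, intensities, n_bins=5):
    """Order single-cell snapshots by bud size and average per stage.

    Bud size acts as a proxy for cell-cycle progression, so the
    returned vector is a 'dynamic' intensity profile assembled
    entirely from static images, with no synchronization or movies.
    """
    order = np.argsort(bud_sizes)
    chunks = np.array_split(np.asarray(intensities, float)[order], n_bins)
    return np.array([c.mean() for c in chunks])
```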

In all of the examples above, you can see a change in protein distribution as the bud grows.


They perform an unsupervised analysis of their data, noting that

Unsupervised analysis also has the advantage that it is unbiased by prior ‘expert’ knowledge, such as the arbitrary discretization of protein expression patterns into easily recognizable classes.

Part of my research goals is to move beyond supervised/unsupervised into mixed models (take the supervision, but take it with a grain of salt). However, this is not yet something that we can do with current machine learning technologies.

The clusters obtained are found to group together functionally similar genes (details in the paper).


The authors are Bayesian about their estimates in a very interesting way. They evaluate their segmentations against training data, which gives them a confidence measure:

Our confidence measure allows us to distinguish correctly identified cells from artifacts and misidentified objects, without specifying what the nature of artifacts might be.

This is because their measure is a density estimate derived from training based on features of the shape. Now comes the nice Bayesian point:

This allows us to weight probabilistically data points according to the posterior probability. For classes of cells where our model does not fit as well, such as very early non-ellipsoidal buds, we expect to downweight all the data points, but we can still include information from these data points in our analysis. This is in contrast to the situation where we used a hard threshold to exclude artifacts.

(emphasis mine)
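The contrast between the two strategies fits in a few lines. A toy sketch (function names mine): the soft version downweights dubious cells, the hard version throws them away:

```python
import numpy as np

def weighted_mean(values, confidences):
    """Posterior-weighted average: low-confidence cells contribute
    little but are never discarded outright."""
    w = np.asarray(confidences, float)
    return float(np.asarray(values, float) @ w / w.sum())

def thresholded_mean(values, confidences, cutoff=0.5):
    """Hard-threshold alternative: everything below `cutoff` is dropped."""
    keep = np.asarray(confidences) >= cutoff
    return float(np.asarray(values, float)[keep].mean())
```

With values [1, 2, 100] and confidences [0.9, 0.9, 0.01], the weighted mean stays near the trusted cells while still letting the dubious one contribute; the thresholded mean simply ignores it.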


Unlike the authors, I do not tend to care so much about interpretable features in my work. However, it is interesting that such a small number (seven) of features got such good results.

There is more in the paper which I did not mention here, such as the image processing pipeline (which is fairly standard if you’re familiar with the field, but this unglamorous aspect of the business is where you always spend a lot of time).


One of my goals is to raise the profile of Bioimage Informatics, so I will try to have more papers in this field on the blog.

[1] We worked on mammalian cells, not budding yeast. Their cell cycles are very different and the methods that work in one do not necessarily work in the other.

Paper Review: Dual Host-Virus Arms Races Shape an Essential Housekeeping Protein

Demogines, A., Abraham, J., Choe, H., Farzan, M., & Sawyer, S. (2013). Dual Host-Virus Arms Races Shape an Essential Housekeeping Protein PLoS Biology, 11 (5) DOI: 10.1371/journal.pbio.1001571

This paper is not really related to my research, but I always enjoy a good cell biology story. My review is thus mostly a retelling of what I think were the highlights of the story.

In wild rodent populations, the retrovirus MMTV and New World arenaviruses both exploit Transferrin Receptor 1 (TfR1) to enter the cells of their hosts. Here we show that the physical interactions between these viruses and TfR1 have triggered evolutionary arms race dynamics that have directly modified the sequence of TfR1 and at least one of the viruses involved.

What is most interesting is that TfR1 is a housekeeping gene involved in iron uptake, which is essential for survival. Thus, it is probably highly constrained in its defensive evolution as even a small loss of function can be deleterious for the host.

The authors looked at the specific residues which seem to mutate rapidly in rodent species, and these map to known virus/protein contact regions (known from X-ray crystallography).

Interestingly, the same evolutionary patterns are visible in rodent species for which no known virus uses this entry point. However (and this is cool), we can find viral fossils in the genomes of these rodents (i.e., we can find parts of the viral sequence in the genome, which indicates that somewhere in the evolutionary past of these animals, a retrovirus integrated into the genome).


This process also explains why some viruses infect some species and not others: divergent evolution of the virus itself to catch up with the defensive evolution of different hosts makes them unable to infect across species. Thus, whenever the host mutates, it forces the virus gene to make an awkward choice: does it want to chase the new host surface and specialize to this species or let this species go as a possible target?

Paper Review: Distributed 3D image segmentation using micro-labor workforce

DP2: Distributed 3D image segmentation using micro-labor workforce by Richard J. Giuly, Keun-Young Kim, and Mark H. Ellisman. Bioinformatics doi: 10.1093/bioinformatics/btt154

I just love this paper. It is just at that intersection of quirky and serious which makes you laugh while being dead serious (I admit that it only makes you laugh if you have a very particular sense of humour).

The quirky aspect is the following: the authors solve complex three-dimensional image segmentation problems by using an Amazon Mechanical Turk crowd of untrained workers to do it!

They do so by reducing the problem to a series of simple yes/no questions that can be understood by people without any background in neurology.

The serious aspect is that it seems that it actually works. It gives good segmentations without resorting to highly-paid experts or very fancy algorithms.


One of the main results that has come out of bioimage informatics that surprises computer vision people and biologists is the following:

Computers can be better than people at bioimage informatics

We (humans) are excellent at face recognition (a task we evolved to do and grew up doing), which is why computer vision researchers who work on this sort of problem tend to revere the human visual system. However, we just cannot recognize the endoplasmic reticulum. Even trained cell biologists are really not that good at recognising the ER in fluorescence microscopy images.

We can perhaps read this paper in the context of the general discussion of human/computer partnerships. What can humans do for the computer, and vice-versa?


I have now gone off on a tangent, but the paper does present a fairly typical image processing pipeline:

  1. Smooth the images with a Gaussian blur
  2. Over-segment them into superpixels
  3. Merge superpixels into segmentations by performing repeated queries of the form:

Q: Should region A and region B be merged together?

This is all very standard except that Q is performed by humans. In fact, what I think is the main contribution of this paper: Q is performed by non-experts. And it works. By dumbing it down for the human, the computer actually ends up doing well.

It’s brilliant!
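The merging step (3 above) is essentially agglomeration driven by an oracle. Here is a sketch with union-find, where `same_object` stands in for the human worker's yes/no answer (the function and its interface are my invention, not the paper's API):

```python
def merge_superpixels(adjacency, same_object):
    """Merge superpixels by asking yes/no questions about adjacent pairs.

    `adjacency` is a list of (a, b) superpixel-id pairs that touch;
    `same_object(a, b)` stands in for the human worker answering
    'should regions A and B be merged?' (Mechanical Turk workers in
    the paper). Returns a dict mapping each superpixel id to its
    final segment id, using union-find so answers compose transitively.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in adjacency:
        ra, rb = find(a), find(b)
        if same_object(a, b):
            parent[ra] = rb
    return {x: find(x) for x in parent}
```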


The thing I do wonder is why this was an Application paper instead of a Research paper. It presents what I think is an interesting new perspective, which seems more valuable than the software (which, by the way, is not even open source, which limits its worth as well). This also meant that the authors only had two pages in which to expose their methods.

I would have loved to read more results and discussion. I half-suspect that this was not the authors’ choice and can only hope that the increasing digitalization of research publications removes these page limitations.

Paper Review: Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences


Paper: Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences by Dazhi Jiao, Yuzhen Ye, and Haixu Tang

This is a recent paper in PLoS CompBio which looks at the following problem:

  1. Metagenomics (the sequencing of genomes in mixed-population samples) gives us many genes. Many of these genes can be mapped to enzymes.
  2. Those enzymes can be mapped (with a database such as KEGG) to reactions, but these assignments are ambiguous.
  3. Can we obtain a consensus interaction network?

The authors approach the problem probabilistically by computing, for each possible interaction network, a probability. Their model is based on the notion that real interaction networks probably have fewer metabolites than all possible combinations. From the paper (my emphasis):

However, if the product of a gene is annotated to catalyze multiple reactions, some of these reactions may be excluded from the sampled subnetwork, as long as at least one of these reactions is included. We note that, according to this condition, each sampled subnetwork represents a putative reconstruction of the collective metabolic network of the metagenome, among which we assume the subnetworks containing fewer metabolites are more likely to represent the actual metabolism of the microbial community than the ones containing more metabolites.

From this idea, using standard MCMC, they are able to assign to each reaction a probability of it being part of the community’s network.
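To make the idea concrete, here is a toy Metropolis sampler in the same spirit: pick one reaction per ambiguous enzyme, favour states whose union of metabolites is smaller, and read off per-reaction marginals from the samples. This is my simplification, not the paper's actual model or code:

```python
import math
import random
from collections import Counter

def sample_reaction_marginals(enzymes, metabolites_of, n_steps=5000,
                              beta=1.0, seed=0):
    """Toy Metropolis sampler over ambiguous enzyme-to-reaction assignments.

    `enzymes` maps each enzyme to its candidate reactions; `metabolites_of`
    maps each reaction to its set of metabolites. A state picks one
    reaction per enzyme; states whose union of metabolites is smaller are
    favoured (score proportional to exp(-beta * n_metabolites)), echoing
    the paper's parsimony assumption. Returns per-reaction marginals.
    """
    rng = random.Random(seed)
    state = {e: rng.choice(rs) for e, rs in enzymes.items()}

    def n_metab(s):
        return len({m for r in s.values() for m in metabolites_of[r]})

    counts = Counter()
    cur = n_metab(state)
    for _ in range(n_steps):
        e = rng.choice(list(enzymes))
        proposal = dict(state)
        proposal[e] = rng.choice(enzymes[e])
        new = n_metab(proposal)
        # standard Metropolis acceptance rule
        if rng.random() < min(1.0, math.exp(-beta * (new - cur))):
            state, cur = proposal, new
        counts.update(state.values())
    return {r: counts[r] / n_steps for e, rs in enzymes.items() for r in rs}
```

For an enzyme that could catalyze either a two-metabolite reaction or a four-metabolite one, the sampler assigns the former a higher marginal, which is exactly the parsimony behaviour described in the quote above.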

They validate their method using clustering. They show that using their probability assignments results in better separation of samples than relying on naïve assignments of all enzymes to all possible reactions. The result is nice and clean.

To reduce this to bare essentials, the point is that their method (on the right) gets the separation between the different types of samples (represented by different symbols) better than any alternatives.

Hierarchical clustering of 44 IMG/M metagenomics samples represented in dendrograms.

They also suggest that they are able to extract differentially present reactions better than the baseline methods. Unfortunately, due to the lack of a validated result, it is really impossible to know whether they just got more false positives. I do not really know how to do it better, though. This is just one of those fundamental problems in the field: the lack of validated information to build upon.

However, it is good to be able to even talk of differentially expressed reactions instead of just genes or orthologous groups.

Overall, the authors present an interesting formulation of a hard problem. I always like the idea of handling uncertainty probabilistically, and it is good to see that it really does work.

This is the sort of paper that opens up a bunch of questions immediately on extensions:

  • Can similar methods handle uncertainty in the basic gene assignments?
  • Or KEGG annotations?

Currently, they assume that all enzymes are actually present and perform one of the functions listed, but neither of these statements is always true.

Another area where their method could be taken is whether to move up from computing marginal probabilities of single reactions and into computing small subnetworks. I hope that the authors are exploring some of these questions and present us with some follow up work in the near future.