Bioimage Analysis Proceedings Session at #ISMBECCB (Live Blogging)

This afternoon, I sat in the Bioimage Informatics Proceedings session.

Automated cellular annotation for high-resolution images of adult Caenorhabditis elegans by Sarah J. Aerni et al. [DOI]

The work uses C. elegans for studying development, where we can uniquely identify all 959 cells. The question is how to do so automatically, as doing it visually takes hours or days.

Unlike previous work, they use morphological features of cells and not just expected location. They also allow for variable cell division. The result is higher accuracy in labeled data.

FuncISH: learning a functional representation of neural ISH images by Noa Liscovitch et al. [DOI]

(I blogged about this paper before)

This work looks at gene expression in the brain. Images are represented using local features. They do not use the scale invariance of the SIFT representation as the images are all at the same scale.

The genes are mapped to functional annotations, which is more effective than the previously published baselines, which only used the images. This can pick up similarity of genes that are expressed in different cell regions.

Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields by Iulian Pruteanu-Malinici et al. [DOI]

Work with in-situ hybridization images on Drosophila embryos across genes and time. Features were extracted using a sparse Bayesian factor model. Then, the temporal aspect of the data is modeled using a conditional random field, which improves results when compared to considering the inputs as independent.
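
To make the "temporal model versus independent inputs" point concrete, here is a minimal sketch of decoding on a linear chain. It is not the authors' model: a real conditional random field learns its potentials from data, whereas the per-stage scores and the `switch_cost` below are hand-set, hypothetical numbers, and only the decoding step (Viterbi) is shown.

```python
import numpy as np

def viterbi(unary, switch_cost):
    """Best-scoring label sequence for a linear chain.

    unary[t, k] is the score of label k at step t (higher is better);
    switch_cost is subtracted whenever consecutive labels differ.
    """
    n_steps, n_labels = unary.shape
    pairwise = -switch_cost * (1 - np.eye(n_labels))
    score = unary[0].copy()
    backptr = np.zeros((n_steps, n_labels), dtype=int)
    for t in range(1, n_steps):
        # candidates[i, j]: best score ending in label i at t-1, then moving to j
        candidates = score[:, None] + pairwise
        backptr[t] = candidates.argmax(axis=0)
        score = candidates.max(axis=0) + unary[t]
    # Walk the back-pointers from the best final label.
    labels = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]

# Hypothetical scores for one annotation term over five developmental stages.
# Columns are (absent, present). Stage 2 weakly prefers "absent".
unary = np.array([
    [0.0, 2.0],
    [0.0, 2.0],
    [0.5, 0.0],
    [0.0, 2.0],
    [0.0, 2.0],
])
independent = unary.argmax(axis=1).tolist()   # each stage decided on its own
smoothed = viterbi(unary, switch_cost=1.0)    # neighbouring stages are coupled
```

Treated independently, stage 2 is labelled "absent"; with the chain coupling, the weak evidence at that stage is outvoted by its neighbours.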

A high-throughput framework to detect synapses in electron microscopy images by Saket Navlakha et al. [DOI]

Presentation of methodological advances in detecting synapses, involving both new laboratory and new computational methods. The basic lab technique was a now-unused 50-year-old method. The most interesting aspect is that the experimental technique is justified because it makes (automatic) analysis easier.

They also tackled the typically ignored problem of generalizing a model learned on a particular set of samples to a new set of similar, but not quite identical, samples. They empirically showed that co-training works well for this problem if you are careful. Nice!
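
For readers who have not met co-training, the generic textbook loop looks roughly like the sketch below. This is an assumption-laden illustration, not the procedure from the paper: the two feature "views", the nearest-centroid classifier, and the synthetic data are all invented for the example.

```python
import numpy as np

def centroid_scores(X_train, y_train, X):
    """Signed score: positive means closer to the class-1 centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)

def co_train(view_a, view_b, labels, n_rounds=10, per_round=2):
    """Fill in labels marked -1 by alternating between two feature views."""
    labels = labels.copy()
    for _ in range(n_rounds):
        for view in (view_a, view_b):
            unlabeled = np.flatnonzero(labels == -1)
            if unlabeled.size == 0:
                return labels
            known = labels != -1
            scores = centroid_scores(view[known], labels[known], view[unlabeled])
            # Only the most confident predictions of this view join the pool.
            confident = np.argsort(-np.abs(scores))[:per_round]
            labels[unlabeled[confident]] = (scores[confident] > 0).astype(int)
    return labels

# Synthetic data (hypothetical): two well-separated classes seen through
# two noisy views, with only two labelled examples per class to start from.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
centers = np.where(truth[:, None] == 1, 4.0, -4.0)
view_a = centers + rng.normal(size=(40, 2))
view_b = centers + rng.normal(size=(40, 2))
labels = np.full(40, -1)
labels[[0, 1, 20, 21]] = truth[[0, 1, 20, 21]]
filled = co_train(view_a, view_b, labels)
```

The "if you are careful" part is exactly what this toy skips: with realistic data, a few confident but wrong labels early on can contaminate the pool.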

A Few Quick Notes

A very self-centred post:

1. I’m going up to Berlin tomorrow for BOSC 2013 (Bioinformatics Open Source Conference), where I will be talking & have a poster about jug [jug.rtfd.org] and for ISMB 2013 where I will have a poster about our recent Bioinformatics paper [video abstract]. Get in touch if you’ll be there. I’ll live tweet a bit if I can.

2. Then, I will be in Lisbon for the Lisbon Machine Learning Summer School 2013.

3. The Building Machine Learning Systems with Python book will be off to the printers later this week. That’s quite exciting. It should be available on amazon &c by the end of the month (Amazon still has the conservative September 30 date, but I think the editors are on track to beat this [it does not involve us authors very much at this point]).

4. I read the Recomputation Manifesto and feel a need to write a long post on why I think it is mostly pointless. Unfortunately, I don’t have the time today, but it seems that recomputing is not really a very worthwhile goal. It is correlated with other worthwhile goals, but the correlation is not even that high.

Paper Review: FuncISH: learning a functional representation of neural ISH images

Noa Liscovitch, Uri Shalit, & Gal Chechik (2013). FuncISH: learning a functional representation of neural ISH images. Bioinformatics. DOI: 10.1093/bioinformatics/btt207

This paper is part of the ISMB 2013 Proceedings series, which I am interested in because I’ll be going to Berlin. It is also a Bioimage Informatics paper, which I’m keen to cover, so it was only natural that I’d review it here.

§

The authors are analysing in-situ hybridization (ISH) images from the Allen Brain Atlas. Figure 1 in the paper shows an example:

[Figure 1 from the paper: FuncISH_Fig1]

Results

The authors use the images as input for a functional classifier. The input to this classifier is an image and the output is a set of functional GO terms, or rather a confidence level for each GO term in the vocabulary.

You can read the details in Section 3.1, but the system does work for predicting functional GO terms, especially, as one would expect, neuronal categories. This is very interesting and I hope that the authors (or others) will pick up on the specific biology that is being predicted here and see if it can be used further. [1]

Alternatively, you can see this model as a dimensionality reduction approach, whereby images are projected into the space of GO terms. For this, one considers the continuous confidence levels rather than binary classifications.

In this space, it is possible to compute similarity scores between images, which operate at a functional rather than simply appearance level. The results are much better than simply comparing the image features directly (see Figure 4 for details). There is a lot of added value in considering the functional annotations rather than simple appearance.
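
As a rough illustration of what comparing images "in GO-term space" means (this is not the authors' code), suppose each gene's image has already been mapped to a vector of GO-term confidences. The gene names and numbers below are made up, and cosine similarity is my assumption; the paper may use a different measure.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical confidences for three GO terms, one vector per gene.
go_confidences = {
    "gene_a": [0.9, 0.1, 0.8],
    "gene_b": [0.8, 0.2, 0.9],
    "gene_c": [0.1, 0.9, 0.1],
}

# Genes a and b have similar functional profiles; gene c does not.
sim_ab = cosine_similarity(go_confidences["gene_a"], go_confidences["gene_b"])
sim_ac = cosine_similarity(go_confidences["gene_a"], go_confidences["gene_c"])
```

Two genes can end up close in this space even if their raw image features differ, which is the point of the functional representation.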

Methodology

I was very interested in the methods and the details, as the authors used SIFT and a bag-of-words approach. I have a paper coming out showing that SURF+bag-of-words works very well for subcellular determination. This paper provides additional evidence that this family of techniques works well in bioimage analysis, even if the problem areas are different.
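
For anyone unfamiliar with bag-of-words over local descriptors, here is a minimal sketch of the quantization step. It assumes the descriptors have already been computed (random vectors stand in for SIFT or SURF output) and that a codebook already exists (in practice it is usually learned with k-means). All names and sizes are hypothetical.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Normalised histogram of nearest-codeword assignments."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(codebook))
    return counts / counts.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 8))        # 5 visual words, 8-D descriptors
descriptors = rng.normal(size=(100, 8))   # stand-in for one image's descriptors
histogram = bag_of_words(descriptors, codebook)
```

The resulting fixed-length histogram is what gets fed to the classifier, regardless of how many descriptors the image produced.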

They do make a few interesting remarks, which I’ll highlight here:

Although their name suggest differently, SIFT descriptors at several scales capture different types of patterns.

SIFT descriptors were originally developed for natural image matching, where the scale is unknown and may even vary within the same image (if one person is standing close by and another is far away, they will be at different scales). However, this is not the case in bioimage analysis.

§

Interestingly, the four visual words with the highest contribution to classification were the words counting the zero descriptors in each scale. This means that the highest information content lies in ‘least informative’ descriptors, and that overall expression levels (‘sparseness’ of expression) are important factors in functional prediction of genes based on their spatial expression.

This is interesting, although an alternative hypothesis is that the null descriptors capture a very different type of information and, since there are only four of them, those four carry all of it. The other 2000 words are often highly correlated with each other. As a group they may have high information content, but because the regression is L2-penalized, the weight is spread across the correlated words, so no single one of them ranks highly.
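
A toy example of this effect, with invented data: two equally predictive signals, one presented as a single feature and the other as ten identical copies. This is a plain closed-form ridge regression, not the classifier from the paper, and the sizes and penalty are arbitrary.

```python
import numpy as np

def ridge_weights(X, y, penalty):
    """Closed-form ridge regression: solve (X'X + penalty * I) w = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
lone = rng.normal(size=200)      # a signal carried by a single feature
shared = rng.normal(size=200)    # a signal carried by ten identical features
y = lone + shared                # both signals matter equally

X = np.column_stack([lone] + [shared] * 10)
w = ridge_weights(X, y, penalty=1.0)
lone_weight = w[0]
copy_weights = w[1:]
```

The lone feature gets a weight close to 1, while each copy gets roughly a tenth of that, even though the copies jointly explain just as much of `y`. Ranking features by individual weight would therefore put the lone feature far ahead.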

§

Finally, I agree with this statement:

Combining local and global patterns of expression is, therefore, an important topic for further research.

[1] Unfortunately, my understanding of neuroscience does not go much beyond ‘if I drink too much coffee, I get a headache’. So, I cannot comment on whether these predictions make much sense.