Paper Review: Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins

Handfield, L., Chong, Y., Simmons, J., Andrews, B., & Moses, A. (2013). Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins. PLoS Computational Biology, 9(6). DOI: 10.1371/journal.pcbi.1003085

This is an excellent paper that came out in PLoS CompBio last week.

The authors present a high-throughput analysis of yeast fluorescent microscopy images of tagged proteins. Figure 8, panel B (doi:10.1371/journal.pcbi.1003085.g008) shows a few example images from their collection.

Figure 8

One interesting aspect is that they work on the dynamic aspects of protein distributions only from snapshots. I was previously involved in a similar project (ref. 18 in the paper [1]) and so I was happy to see others working in this fashion.

Budding yeast, as the name says, buds. A mother cell will create a new bud, that bud will grow and eventually it will split off and become a new daughter cell.

By leveraging bud size as a marker of cell-cycle stage, the authors can build dynamic protein profiles and cluster these. This avoids the need for either (i) chemical synchronization [which has other side-effects in the cell] or (ii) movie acquisition [which, besides taking longer, itself damages the cells through phototoxicity].
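The trick can be sketched in a few lines (a toy illustration with made-up measurements, not the authors' actual pipeline):

```python
# Toy sketch: order single-cell snapshots by bud size to build a
# pseudo-temporal protein profile (invented data, not the paper's pipeline).

def pseudo_time_profile(cells):
    """cells: list of (bud_size, protein_measurement) tuples from snapshots.
    Sorting by bud size gives an approximate cell-cycle ordering."""
    return [measurement for bud_size, measurement in sorted(cells)]

# Three snapshots, acquired in no particular order:
snapshots = [(0.8, "late"), (0.1, "early"), (0.4, "mid")]
profile = pseudo_time_profile(snapshots)
# profile == ["early", "mid", "late"]
```

Each image is a static snapshot, but because the population samples all cell-cycle stages, sorting by bud size recovers a dynamic profile without ever filming a cell.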

In all of the examples above, you can see a change in protein distribution as the bud grows.


They perform an unsupervised analysis of their data, noting that

Unsupervised analysis also has the advantage that it is unbiased by prior ‘expert’ knowledge, such as the arbitrary discretization of protein expression patterns into easily recognizable classes.

Part of my research goals is to move beyond supervised/unsupervised into mixed models (take the supervision, but take it with a grain of salt). However, this is not yet something that we can do with current machine learning technologies.

The clusters obtained are found to group together functionally similar genes (details in the paper).


The authors are Bayesian about their estimates in a very interesting way. They evaluate their segmentations against training data, which gives them a confidence measure:

Our confidence measure allows us to distinguish correctly identified cells from artifacts and misidentified objects, without specifying what the nature of artifacts might be.

This is because their measure is a density estimate, learned from training data on shape features. Now comes the nice Bayesian point:

This allows us to weight probabilistically data points according to the posterior probability. For classes of cells where our model does not fit as well, such as very early non-ellipsoidal buds, we expect to downweight all the data points, but we can still include information from these data points in our analysis. This is in contrast to the situation where we used a hard threshold to exclude artifacts.
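The difference between the two approaches can be illustrated with a weighted mean (a minimal sketch with invented numbers, not the authors' model):

```python
# Sketch: hard thresholding vs. posterior weighting (invented numbers).
values     = [1.0, 2.0, 3.0, 10.0]   # per-cell measurements
posteriors = [0.9, 0.8, 0.7, 0.05]   # P(real cell | shape features)

# Hard threshold: anything below 0.5 is discarded entirely.
kept = [(v, p) for v, p in zip(values, posteriors) if p >= 0.5]
hard_mean = sum(v for v, _ in kept) / len(kept)

# Soft weighting: every point contributes, in proportion to its posterior.
soft_mean = sum(v * p for v, p in zip(values, posteriors)) / sum(posteriors)
```

The likely artifact (the 10.0 with posterior 0.05) barely moves the soft estimate, yet no arbitrary cutoff was needed; a dubious-but-real cell at posterior 0.45 would still contribute instead of being silently dropped.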

(emphasis mine)


Unlike the authors, I do not tend to care so much about interpretable features in my work. However, it is interesting that such a small number (seven) of features got such good results.

There is more in the paper which I did not mention here, such as the image processing pipeline (which is fairly standard if you’re familiar with the field, but this unglamorous aspect of the business is where you always spend a lot of time).


One of my goals is to raise the profile of Bioimage Informatics, so I will try to have more papers in this field on the blog.

[1] We worked on mammalian cells, not budding yeast. Their cell cycles are very different and the methods that work in one do not necessarily work in the other.

Paper Review: Dual Host-Virus Arms Races Shape an Essential Housekeeping Protein

Demogines, A., Abraham, J., Choe, H., Farzan, M., & Sawyer, S. (2013). Dual Host-Virus Arms Races Shape an Essential Housekeeping Protein. PLoS Biology, 11(5). DOI: 10.1371/journal.pbio.1001571

This paper is not really related to my research, but I always enjoy a good cell biology story. My review is thus mostly a retelling of what I think were the highlights of the story.

In wild rodent populations, the retrovirus MMTV and New World arenaviruses both exploit Transferrin Receptor 1 (TfR1) to enter the cells of their hosts. Here we show that the physical interactions between these viruses and TfR1 have triggered evolutionary arms race dynamics that have directly modified the sequence of TfR1 and at least one of the viruses involved.

What is most interesting is that TfR1 is a housekeeping gene involved in iron uptake, which is essential for survival. Thus, it is probably highly constrained in its defensive evolution as even a small loss of function can be deleterious for the host.

The authors looked at the specific residues which seem to mutate rapidly across rodent species and found that they map to known virus/protein contact regions (known from X-ray crystallography).

Interestingly, the same evolutionary patterns are visible in rodent species for which no known virus uses this entry point. However (and this is cool), we can find viral fossils in the genomes of these rodents (i.e., we can find parts of the viral sequence in the genome, which indicate that somewhere in the evolutionary past of these animals, a retrovirus integrated into the genome).


This process also explains why some viruses infect some species and not others: divergent evolution of the virus itself, catching up with the defensive evolution of different hosts, makes it unable to infect across species. Thus, whenever the host mutates, the virus faces an awkward choice: does it chase the new host surface and specialize to this species, or let this species go as a possible target?

How Long Does PLoS Take to Review a Paper? All PLoS Journals Now

Due to popular demand (at least two people asked, surely that’s demand), here is a generalization of Monday’s work to include a few more PLoS journals. (This was mostly because it was easier to generalize my scripts to process any PLoS journal.)

Here are the images for all PLoS journals. PLoS One at the end is the same figure I posted on Monday.

PLoS Pathogens:


PLoS Neglected Tropical Diseases:


PLoS Medicine:


PLoS Genetics:


PLoS CompBio:


PLoS Biology:


PLoS One:


For all PLoS journals except PLoS Medicine, the average acceptance time seems to be about 5 months. PLoS Medicine takes about 7 months.

PLoS Medicine is the journal that takes the longest to review; PLoS One is the fastest (although some of its papers may have been reviewed in another PLoS journal first, speeding up the process).


Scripts are on github.

How Long Does PLoS One Take to Accept a Paper?

How long do papers take to review?

Too long.

No, seriously, how long? I did a little measurement.

I downloaded the 360 most recent papers from PLoS One (as of Friday). They are all annotated with submission and acceptance dates, so it was easy to compute the differences.

The plot below is a histogram (one bin per day) in grey, with a kernel density estimate as a solid line.
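The core computation is just date subtraction (a minimal re-sketch with invented dates; the real scripts are on github):

```python
from datetime import date

# Sketch of the measurement: per-paper review time in days.
# (Invented example dates; the real data comes from the PLoS One metadata.)
papers = [
    (date(2013, 1, 10), date(2013, 4, 15)),  # (submitted, accepted)
    (date(2013, 2, 1),  date(2013, 5, 20)),
    (date(2012, 12, 5), date(2013, 3, 1)),
]

# Subtracting two dates yields a timedelta; .days gives the review time.
days = sorted((accepted - submitted).days for submitted, accepted in papers)
median_days = days[len(days) // 2]
# days == [86, 95, 108]; median_days == 95
```

The histogram and kernel density estimate are then drawn from the resulting list of day counts.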

Histogram of acceptance times


The result is that it takes about 3 to 4 months to get a paper accepted, but with substantial variance.


Looking at the figure, I had to ask who the poor people were who published the paper that was longest in revision.

Alternative Sigma Factor Over-Expression Enables Heterologous Expression of a Type II Polyketide Biosynthetic Pathway in Escherichia coli by David Cole Stevens, Kyle R. Conway, Nelson Pearce, Luis Roberto Villegas-Peñaranda, Anthony G. Garza, and Christopher N. Boddy. DOI: 10.1371/journal.pone.0064858

Submitted on 29 March 2011 and accepted on 22 April 2013, this paper was 755 days in revision.
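That figure is easy to verify with a quick date subtraction:

```python
from datetime import date

# Sanity check on the 755 days: acceptance date minus submission date.
in_revision = (date(2013, 4, 22) - date(2011, 3, 29)).days
# in_revision == 755
```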

The fastest acceptance was only 19 days. However, this being PLoS One, it is possible that the paper had first been reviewed at another PLoS journal, rejected on significance grounds with otherwise positive reviews, and had those reviews transferred to PLoS One, after which acceptance followed without a new round of peer review.


This is a gimmick. There is perhaps a paper to be written where this is extended to see which areas of research, keywords, &c. affect acceptance time. If I had more free time, I might write that paper.

The code for the above is available on github.

Update: Followup with all PLoS journals.