Nonspecific Citations

The point of this post is to introduce the term nonspecific citation.

A common problem with antibodies is non-specific binding: while the antibody may be targeted to grab on to protein P, it actually also grabs X, Y, and Z, which are somewhat similar to P.

A nonspecific citation is when a paper gets cited as an example of a broad class rather than for its specific ideas.


In the case of a review article, it might make some sense: if you are just mentioning the field, you might as well cite my review of bioimage informatics. Even here, though, you could cite other reviews.

In the case of research articles, though, it is often more of a throwaway citation: you need to mention work in the area of cell image analysis and I have a fairly recent paper on this, so you cite that. Or you cite somebody else’s work, as the specific contents of that citation do not really matter.


There are intermediate levels of specificity.

Maybe you are writing a paper that somehow touts the general usefulness of local features and you write a sentence such as:

Local features are useful (or SURF has been used) in many contexts such as A [paper], B [paper], and cell image analysis [my work].

Then there is the completely nonspecific:

Recent work in bioimage informatics […, my work, …] …


This will of course lead to a citation Matthew Effect: if you are highly cited, you are likely to get even more citations by getting more nonspecific citations.


Whenever I have a paper I really want to read that is not open-access and to which my institution does not have access, I generally get it by asking the author for a preprint. It has never been a problem and has even led to some good follow-up discussions.

On the other hand, if I am not looking for a specific paper but just something in a general area to get acquainted, then whether something is open access or not may make me choose to read (and then subsequently cite) one paper instead of the other. In fact, just seeing an IEEE link will be enough for me to not even click through to the abstract.

Maybe if open access does have an advantage in getting citations, it’ll mostly come in the form of nonspecific ones.


IMPACT or How I Learned to Start Worrying and Fear Altmetrics

Altmetrics is the idea that scientific publications should be judged (perhaps primarily) on the impact they have in the general media, including on social media. This is an alternative to looking at either citations or journal impact factors.


People who know me outside of science know that antibiotic overuse is a pet peeve of mine [1]. We just published a paper touching on this very subject. It also touched on antibiotic use in agriculture. Both of these can be sold as hot subjects and it’d certainly be possible to try to get some attention in social media with a few bolder statements: antibiotic use in factory farming causes antibiotic resistant infections!

Oh, the altmetrics would go through the roof, but we don’t have the data to support anything like that claim. Our data and analysis are congruent with the idea that antibiotic overuse by humans and farm animals leads to increased resistance, which may lead to increased antibiotic-resistant infections, but we must acknowledge that there are a large number of confounders and no proof of direct causality. Broadly speaking, people in countries that like to give their animals antibiotics also take a bunch themselves, thus we cannot disentangle farm-to-fork from human antibiotic (over)use. Furthermore, the presence of antibiotic resistance genes is not sufficient to infer the presence of clinically relevant antibiotic-resistant pathogens (this may be a limitation of current methods of analysis, naturally, but a limitation it is). The paper, naturally, has more details on these questions.

We wrote as good scientists, presenting our data and conclusions, acknowledging limitations. We hope to get scientific recognition for this. Most directly in the form of citations, naturally, but more generally in recognition (those people in the Bork lab did a really good job both on their own data and in reviewing other work).

If our incentives were to stir up controversy in social networks, then they would point away from this towards a more polemical stance (and whilst they may, in some sense, draw more engagement with scientific results, they would, in a more fundamental sense, move the discourse away from an evidence-based direction [2]).

When writing blogposts, I put in short pithy sentences for twitter; it’d be dangerous if I did the same when writing a journal paper.


Metrics don’t just measure, they also shape behaviour: you need to solve for the equilibrium.

You need to ask: would it be a good thing if people started, on the margin, to optimize for your metric? In the case of scientists and altmetrics, the answer may be NO.


An unrelated criticism of altmetrics is that they’d be outright gamed and that the scientific world has nowhere close to the capacity to fight spam like Google et al. do. The linked article is also notable for using the word meretricious in the title.

Also, do read the rejoinder.

[1] I’m the sort of guy who, when a person complains that their doctor didn’t give them antibiotics for the flu, is liable to praise the doctor instead of expressing empathy.
[2] In fact, public diffusion of speculative scientific results can lead to mistrust of science as these speculative results will then tend to contradict themselves leading to dismissal of science in general.

Friday Links

1. On Schekman’s pledge to not publish high-profile. I almost called this a balanced view, but then realized that I probably used that phrase to refer to Derek Lowe’s work at least twice in the past. The man is smart and balanced, what can I say?

2. An interesting meeting report (closed access, sorry). Two highlights:

While discussing mutations that predispose to cancer, Nazneen Rahman (Institute of Cancer Research, UK) rightly reminded us that people make big decisions and have parts of their anatomy removed based on their genotype.


Jeanne Lawrence (University of Massachusetts Medical School, USA) convincingly showed that her lab was able to silence one entire copy of chromosome 21 in stem cells in vitro. Trisomy 21 or Down’s syndrome is caused by an extra copy of chromosome 21. […] Lawrence and colleagues inserted XIST (human X-inactivation gene) into chromosome 21 in stem cells with trisomy 21. They then showed using eight different methods that a single copy of the chromosome had indeed been silenced.

3. A good explanation of Bitcoin, the protocol

4. Interesting article about wine & technology in The Economist (which is one of the few mainstream magazines whose science coverage is worth reading [1]).

[1] Actually, I think it’s the only one that can be consistently trusted, but I enjoy anything by Ed Yong wherever he publishes and have been reading some excellent articles by Carl Zimmer in The Atlantic.

Why Science is a Third World Economy

Because people are cheap and things are expensive.


To a large extent, it is easier to get money to pay for people (salaries [1]) than to pay for things. Other times, people show up who are willing to work without being paid (they are self-funded). But then you need to get them materials to work with. For that, you need to actually spend some money. And sometimes you actually have money, but it can only pay for things of type X, not of type Y, which is what you wanted.

So, it often feels very much like the third world: a lot of people standing around a few physical resources, and replacement of capital by labour.


A while back I read a review which was comparing several technologies for the same measurement task [2]. There were two high-quality methods in terms of the output. One was very automated but required you to buy a kit (~$400); the other was artisanal.

The authors wrote that the first one was good because it was very fast, but expensive. The other one took a long time, but was cheap. They didn’t even price in the cost of labour! They didn’t even ask how many hours of graduate student time you can get for $400.

Which, of course, makes some sense in the publicly funded bureaucratic world where money is not fungible. You often cannot reallocate money from stipends to materials.
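The pricing question the reviewers skipped is a one-line calculation. Here is a minimal sketch of it: the $400 kit price is the figure from the review, but the hourly cost of a graduate student is a number I made up for illustration, so plug in your own.

```python
def hours_of_labour(kit_price, hourly_cost):
    """Hours of labour that cost the same as the kit."""
    return kit_price / hourly_cost


KIT_PRICE = 400.0   # dollars, the approximate kit price mentioned above
HOURLY_COST = 20.0  # dollars per hour; ASSUMED, purely illustrative

# If the kit saves more hours than this, it is the cheaper method
# once labour is priced in.
break_even = hours_of_labour(KIT_PRICE, HOURLY_COST)
print(f"The kit breaks even at {break_even:.0f} hours of saved labour")
```

Of course, this only settles the matter if the two budgets are interchangeable, which is exactly what the funding rules often prevent.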


And then there is that expensive piece of equipment that is not really used: there was a specific half-a-million grant to buy it, but then enthusiasm petered out, and the person who was going to use it had gotten a different job by the time the thing was delivered, so nobody here really cared to pick it up.

Yep, that’s a third world thing too.

[1] or stipends which are exactly like a salary, except for tax purposes.
[2] I could probably find it now if I looked, but I don’t actually want to lose track of the main point.

People do read your thesis

Last week, @proflikesubstance wrote that you should Publish papers. Your thesis means nothing:

Get the papers out. Don’t focus on an arcane document that will gather dust for the next 50 years until the departmental office needs space and throws the old ones out.


I didn’t even get the bound copies of my thesis. It’s a PDF sitting on a department server (it’s open access!).

Thus, this image of a thesis as a big block of paper, gathering dust in a hard-to-reach library section is out of date. It is a document, widely available, sometimes even read. During my graduate studies, I did read a few theses by others.

Because things are now electronic, people will read your thesis. Probably not PIs (who will read the executive summary, i.e., the papers), but graduate students will (and, incredible though it may seem, graduate students are people too). The median paper is probably read by very few people too, by the way.


My thesis is a staple thesis: short intro, review, paperpaperpaper, paper [yet unpublished], short “conclusion”, software paper as appendix.

Have your cake and eat it too.

I did, however, enjoy being able to write without any page limits and put in more details than made it into the final version of the papers. I also put in some side comments that I thought were cool but on the periphery of what the chapter was about. In some ways, I had the opportunity to write down the type of thought I would now cut out of the paper and put on the blog.


This is not to say that you shouldn’t publish in peer-reviewed outlets. You should (as I said, my thesis is a staple thesis, almost all of the content is available in peer-reviewed literature too). But we shouldn’t use bad or out-of-date arguments, like “nobody reads your thesis”, to make that case.

You should publish because your résumé looks better with publications than with a thesis. Theses are like assholes, everybody has one (everyone who is applying to the sort of academic jobs we are discussing).

So, write your thesis and publish.

Universities Merging: Improved Rank

Logo of the University of Lisbon (Photo credit: Wikipedia)

Logo of the Technical University of Lisbon (Photo credit: Wikipedia)

The university from which I got both my BS and MS will soon cease to exist as a separate entity. The University of Lisbon and the Technical University of Lisbon are merging into one single institution.

The main reason (as I understand it from people at those institutions) is that there are currently still too many institutional barriers to the cross-disciplinary work that many people are doing across these institutions and that it makes no sense to have two half-universities. A hundred years ago, the faculties were well-defined; nowadays, these divisions make less sense (you can easily imagine a 1950s administrator asking why an engineer would need to have a meeting with a medical specialist. Today, those interactions are crucial).

One interesting side effect, however, is that these institutions will shoot up in several international rankings of research institutes. Many of them do not correct for institute size, so if they measure research output, now it will be much larger: all those papers, all those citations, all those research awards that used to get split into two pots will now go to the same bin.

Nothing will have changed but the accounting. Still, a combined institution will climb many steps in the ranking ladder.
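To make the accounting point concrete, here is a small sketch comparing a size-uncorrected score (total papers) with a size-corrected one (papers per researcher) before and after a merger. The two institutions and all their numbers are hypothetical, and real rankings use more elaborate formulas than either of these.

```python
from dataclasses import dataclass


@dataclass
class Institution:
    name: str
    papers: int        # total research output (hypothetical)
    researchers: int   # institution size (hypothetical)


def merge(a, b, name):
    """Merging only adds up the two pots; no new research appears."""
    return Institution(name, a.papers + b.papers, a.researchers + b.researchers)


def raw_score(inst):
    """Score that does not correct for size."""
    return inst.papers


def per_capita_score(inst):
    """Score that does correct for size."""
    return inst.papers / inst.researchers


# Made-up numbers, chosen only to illustrate the effect
u1 = Institution("U1", papers=3000, researchers=1500)
u2 = Institution("U2", papers=2000, researchers=1000)
merged = merge(u1, u2, "U1+U2")

for inst in (u1, u2, merged):
    print(inst.name, raw_score(inst), per_capita_score(inst))
```

With these numbers, the raw score of the merged institution is the sum of the two, while the per-researcher score is the same as before the merger.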

How Long Does PLoS Take to Review a Paper? All PLoS Journals Now

Due to popular demand (at least two people asked, surely that’s demand), here is a generalization of Monday’s work to include a few more PLoS journals. (This was mostly because it was easy to generalize my scripts to process any PLoS journal.)

Here are the images for all PLoS journals. PLoS One at the end is the same figure I posted on Monday.

PLoS Pathogens:


PLoS Neglected Tropical Diseases:


PLoS Medicine:


PLoS Genetics:


PLoS CompBio:


PLoS Biology:


PLoS One:


For PLoS journals, except PLoS Medicine, it seems that the average acceptance time is 5 months. PLoS Medicine takes 7 months.

PLoS Medicine is the journal that takes the longest to review, PLoS One is the fastest (although some of the papers may have been reviewed in another PLoS journal before, speeding up the process).
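For readers who want to try this on their own data, here is a minimal sketch of the core computation: the time between a paper's received date and its accepted date, averaged over papers. This is not my actual script. The dates below are made up, and I am assuming you have already extracted a (received, accepted) pair for each paper.

```python
from datetime import date
from statistics import mean

DAYS_PER_MONTH = 365.25 / 12  # average month length


def months_in_review(received, accepted):
    """Time from submission to acceptance, in (average-length) months."""
    return (accepted - received).days / DAYS_PER_MONTH


# Hypothetical (received, accepted) pairs, for illustration only
papers = [
    (date(2013, 1, 1), date(2013, 6, 1)),
    (date(2013, 2, 15), date(2013, 8, 20)),
    (date(2013, 3, 10), date(2013, 7, 5)),
]

times = [months_in_review(received, accepted) for received, accepted in papers]
print(f"Mean time to acceptance: {mean(times):.1f} months")
```

Note that this measures the whole interval from submission to acceptance, so it includes the time authors spend on revisions, not just the time the journal spends reviewing.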


The scripts are on GitHub.