Utilitarian Scientific Software Development

Yesterday, I added this new feature to ngless: if the user asks it to run a non-existent script, it will try to give an error message with a guess of what they probably really meant.

For example, if you type Profiles.ngl, but the script is actually called profile.ngl:

$ ngless Profiles.ngl

Exiting after fatal error:
File `Profiles.ngl` does not exist. Did you mean 'profile.ngl' (closest match)?

Previously, it would just say Profiles.ngl not found, without a suggestion.
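As an illustration of the idea (not ngless's actual implementation, which is not shown here), a closest-match suggestion can be sketched in a few lines of Python using the standard library's difflib. The function names and the 0.6 similarity cutoff are assumptions made for this example.

```python
import difflib


def suggest_script(requested, available, cutoff=0.6):
    """Return the name in `available` most similar to `requested`, or None.

    `cutoff` is an assumed similarity threshold (0 to 1); names scoring
    below it are not suggested at all.
    """
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def missing_file_message(requested, available):
    """Build an error message, adding a guess when a close match exists."""
    message = f"File `{requested}` does not exist."
    guess = suggest_script(requested, available)
    if guess is not None:
        message += f" Did you mean '{guess}' (closest match)?"
    return message


if __name__ == "__main__":
    print(missing_file_message("Profiles.ngl", ["profile.ngl", "README.md"]))
```

When nothing in the directory is similar enough, the message falls back to the plain "does not exist" text, which matches the old behaviour described above.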

It took me about 10-15 minutes to implement this (actually, most of the functionality was already present, as ngless already implemented similar functionality in other contexts). Is it worth it?

When is it worth it to add bells & whistles to research software?

I think we should think about it, in an ideal world, using the utilitarian principle of research software development: software should reduce the overall human effort. If this feature saves more time overall than it took to write, then it’s worth it.

This Utilitarian Principle says that these 15 minutes were well invested if (and only if) this ngless feature saves more than 15 minutes for all its users over its lifetime. I expect that every time a user triggers this error, they’ll save a few seconds (say 2 seconds). 15 minutes is 900 seconds. Thus, this feature is worth it if it is triggered over 450 times. Given that we hope that ngless will be widely used, this feature is certainly worth it.
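The break-even arithmetic above can be written as a small Python sketch. The function names are made up for this example, and the 2 seconds saved per use is the rough guess from the text, not a measurement.

```python
def break_even_uses(dev_minutes, seconds_saved_per_use):
    """Number of uses at which time saved equals time spent developing."""
    return dev_minutes * 60 / seconds_saved_per_use


def is_worth_it(dev_minutes, seconds_saved_per_use, expected_uses):
    """Utilitarian test: total user time saved must exceed developer time."""
    return expected_uses > break_even_uses(dev_minutes, seconds_saved_per_use)


if __name__ == "__main__":
    # 15 minutes of development, 2 seconds saved each time the error fires.
    print(break_even_uses(15, 2))
    # A widely used tool versus a script run a handful of times.
    print(is_worth_it(15, 2, expected_uses=10_000))
    print(is_worth_it(15, 2, expected_uses=20))
```

The same function makes the asymmetry explicit: with only a few dozen expected uses, as for an internal analysis script, the feature does not pay for itself.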

This principle also makes the argument that it would not be worth adding such a feature to a script that is only used in an internal analysis. So, code that was only meant to be used by myself, or by myself and a small number of others, should have fewer bells & whistles.

In a non-ideal world, we need to take into account the incentives of the scientific (or commercial) world and the possibility of market failure: the market does not always reward the most selfless behaviour (this includes the “market” for scientific rewards where shiny new things are “paid” more than boring old maintenance).

Fakes all the way down

In a confirmation of what I previously wrote about predatory authors:

[R]esearch grants and promotions are awarded on the basis of the number of articles published, not on the quality of the original research.

[…]

This has fostered an industry of plagiarism, invented research and fake journals that Wuhan University estimated in 2009 was worth $150m, a fivefold increase on just two years earlier

From The Economist

/ht In the Pipeline

Friday Links

1. Very nice analysis of mean returns over many years (you expect to not hit your expected value).

(Feel free to ignore the political points)

2. More on the commercial reuse of open access papers

I have always thought that the biggest problem with non-commercial clauses is that it is so hard to define what they mean. When I released the materials for Programming for Scientists, I added a clause saying that use in a class at an accredited degree granting institution is also permissible. It was not clear to me that teaching a course at a university is a non-commercial activity. If you think that a university obviously is not a commercial enterprise, I invite you to argue your case, not in the comments of the blog, but after enrolling and receiving a tuition bill. So, if your paper is CC-BY-NC, can I distribute it to students in my class?

Obviously, I don’t mean “can I go ahead and ignore your license without realistic fear of getting sued by you?” I can probably do the same with any closed-access paper. [1]

3. Why do we let doctors prescribe off-label but not test it?

Actually, the doctor is fine doing the randomized study, just not publishing it. Often we have this rule R in context X and it seems horrible to imagine a world where not R in X, but then in context Y (which is very similar to X), not R is just fine. The practice of medicine is heavily regulated and people are very resistant to the idea that it is too much and it should be easier to practice medicine in the US. However, it is fine to be a complete quack and tell people that they can cure cancer with holy water.

In a completely different context, I remember one econtalk where the guest and Russ both agreed that it would be horrible if there were two levels of airplane safety: a safe and expensive plane and a less-safe one for those not paying as much. What horrible society would allow for poorer folks to travel in unsafe planes while the rich have their strict safety inspections? And in my head I’m going: a society that allows buses, no? If your X is not a plane but a trip between two cities, then we already have a cheap and unsafe alternative to a nice and safe one. But as long as we keep these as separate things, then we reason about them individually and status quo bias kicks in.

4. The problem of article level metrics: they will be gamed.

[1] I recall a dialogue between two people which went something like this: “This is much better than the commercial things.” “It’s not commercial?” “No, it’s from a small producer.” “Who doesn’t sell it for money?” “No, they do sell it, but it’s not a big company, you know?” “But I can go there and buy it, right?” “Yes.” “If they sell it, it’s commercial; they just have a particular marketing concept that appeals to their target demographic.” This was on the topic of cheese, but applies here as well.