Mahotas: A continuous improvement example

Last week, I tweeted:

Let me highlight a good example of continuous improvement in mahotas and the advantages of eating your own dog food.

I was using mahotas to compute some wavelets, but I couldn’t remember the possible parameter values. So I got an error:

[1]: mh.daubechies(im, 'db4')

ValueError                                Traceback (most recent call last)
<ipython-input-6-2e939df57e6a> in <module>()
----> 1 mh.daubechies(im, 'db4')

/home/luispedro/work/mahotas/mahotas/convolve.pyc in daubechies(f, code, inline)
    492     '''
    493     f = _wavelet_array(f, inline, 'daubechies')
--> 494     code = _daubechies_codes.index(code)
    495     _convolve.daubechies(f, code)
    496     _convolve.daubechies(f.T, code)

ValueError: 'db4' is not in list

I could have just looked up the right code and moved on. Instead, I treated the unhelpful error message as a bug and fixed it (here is the commit).

Now we get a better error message:

ValueError: mahotas.convolve: Known daubechies codes are ['D2', 'D4', 'D6',
'D8', 'D10', 'D12', 'D14', 'D16', 'D18', 'D20']. You passed in db4.

You still get an error, but at least it tells you what you should be doing.
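The general pattern behind this kind of fix can be sketched as follows. This is a minimal illustration, not the actual mahotas code: the helper name `lookup_daubechies_code` is hypothetical, and the list of codes is copied from the error message above. Instead of letting `list.index` fail with a generic message, the helper checks membership first and raises a `ValueError` that lists the valid choices and echoes back what the caller passed.

```python
# Valid codes, copied from the improved error message shown above.
_daubechies_codes = ['D2', 'D4', 'D6', 'D8', 'D10',
                     'D12', 'D14', 'D16', 'D18', 'D20']


def lookup_daubechies_code(code):
    '''Return the integer index of a Daubechies wavelet code.

    Hypothetical helper for illustration only. On an unknown code, it
    raises a ValueError that names the valid codes and repeats the input,
    rather than the bare "'db4' is not in list" from list.index.
    '''
    if code not in _daubechies_codes:
        raise ValueError(
            'mahotas.convolve: Known daubechies codes are {0}. '
            'You passed in {1}.'.format(_daubechies_codes, code))
    return _daubechies_codes.index(code)
```

With this sketch, `lookup_daubechies_code('D4')` returns `1`, while `lookup_daubechies_code('db4')` raises a `ValueError` whose message contains the full list of valid codes. Checking membership up front keeps the error path explicit and avoids relying on the wording of the built-in exception.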

Good software works well. Excellent software fails well too.

People Agreeing With Me on the Internet

Here are a couple of things I noticed lately that relate to some of my posts and ideas.

§ One

Many “scientists should do x” conversations need to be reframed as “the culture of science needs to support x.”

— Jacquelyn Gill (@JacquelynGill) May 16, 2013

This is a very nice way to phrase what I wrote about collective action problems in sharing code.

§ Two

First of all, a really cool thing: Thomas J. Webb commented on his own paper with a blog update.

In the blog post, he writes (emphasis is mine):

With some trepidation, I opened up the file of R code I’d used for the original analysis. And got a pleasant surprise: it was readable! Largely this was because I submitted it as an appendix to our paper, and so had taken more care than usual to annotate it carefully. I think this demonstrates an under-appreciated virtue of sharing data and code: in preparing it such that it is comprehensible to others, it becomes much more useful to your future self. This point is nicely made in a new paper by Ethan White and colleagues on making using and reusing data easier.

Exactly: sharing code makes you write better code.

(Hat tip to @mattjhodgkinson on Twitter.)

Why do I develop open-source scientific software?

In my previous post about scientific software, I argued that it is not in your personal interest to release scientific code. It is in society’s interest but, selfishly, you should not do it.

Now, you could accuse me of being a hypocrite (this would be a very modern form of hypocrisy, whereby we behave altruistically while professing to be selfish). But I don’t think I’m a complete hypocrite.

I started writing open-source code well before I started doing science (when I was in high school), and I initially saw research, to some extent, as a continuation of the same ethos: you do a mix of what is helpful to others and what is interesting to you personally, you share, and you discuss it in frank and open ways (the open-source world can be pretty harsh, but the discussion is about the issues).

I once tweeted:

I believe that the single biggest reason why scientists do not make their code generally available is that they are ashamed of it.

— Luis Pedro Coelho (@luispedrocoelho) August 23, 2012

This was a tweet, written for that medium, and my Twitter persona is often “ha ha, only serious.”

This led to some more discussion, which @iddux (Iddo Friedberg) captured for posterity on his blog (see also his follow-up).

What I did not make clear at the time was that I was also talking about myself.

Releasing code publicly is a commitment mechanism to make myself write better code.

Look at my public code: it is well tested, well documented, reviewed, and discussed.

I wish my private code were always like that, but it is not; it is of lower quality. This is why I try to make as much of my research code public as possible.

  1. When I make a new release of the code and, in less than 24 hours, get a bug report on something silly, it is more than slightly embarrassing.
  2. The fear of embarrassment is a great motivator.
  3. By releasing code, I write better code.
  4. Therefore, I release code.