Cross-posting from our group website.
Preamble. We produce two types of code artefacts: (i) code that supports the results in a results-driven paper and (ii) software tools intended for widespread use.
For an example of the first type, see the Code Ocean capsule that is linked to (Coelho et al., 2018). The main goal of this type of code release is to serve as an Extended Methods section to the paper. Hopefully, it will be useful for the small minority of readers of the paper who really want to dig into the methods or build upon the results, but the work aims at biological results.
This document focuses on the second type of code release: tools that are intended for widespread use. We’ve released a few of these in the last few years: Jug, NGLess, and Macrel. Here, the whole point is that others use the code. We also use these tools internally, but if nobody else ever adopts the tools, we will have fallen short.
The Six Commitments
- Five-year support (from date of publication): If we publish a tool as a paper, then we commit to supporting it for at least five years from the date of publication. We may stop developing new features, but if there are bugs in the released version, we will assume responsibility and fix them. We will also do any minor updates to keep the tool running (for example, if a new Python version breaks something in one of our Python-based tools, we will fix it). Typically, support is provided if you open an issue on the respective GitHub page and/or post to the respective mailing list.
- Standard, easy-to-install packages: Right now, this means that we provide conda packages. In the future, if the community moves to another system, we may move too.
- High-quality code with continuous integration: All our published packages have continuous integration and try to follow best practices in coding.
- Complete documentation: We provide documentation for the tools, including tutorials, example data, and reference manuals.
- Work well, fail well: We strive to make our tools not only work well, but also “fail well”: that is, when the user provides erroneous input, we attempt to provide good-quality error messages and to never produce bad output (including never producing partial outputs if the process terminates part-way through processing).
- Open source, open communication: Not only do we provide the released versions of our tools as open source, but all the development is done in the open as well.
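As a rough illustration of the “fail well” commitment above (never leaving a partial output behind), here is a minimal Python sketch. It is not code from Jug, NGLess, or Macrel; the helper name `atomic_output` and the `.partial` suffix are hypothetical. The idea is to write to a temporary file in the same directory and rename it to the final name only once everything has succeeded.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_output(final_path):
    """Yield a text file whose contents appear at final_path only on success.

    Hypothetical helper for illustration, not taken from any released tool.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the destination so that the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "w") as handle:
            yield handle
        # Reached only if the caller's block finished without an exception.
        os.replace(tmp_path, final_path)
    except BaseException:
        # On any failure, remove the temporary file; final_path is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Used as `with atomic_output("counts.tsv") as out: out.write(...)`, a crash inside the block leaves no `counts.tsv` at all, rather than a truncated one that a later step could mistake for a complete result.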
Note for group members: This is a commitment from the group and, at the end of the day, the responsibility is Luis’. If you leave the group, you don’t have to be responsible for 5 years. If you leave, your responsibility is just the basic responsibility of any author: to be responsive to queries about what was described in the manuscript, but not anything beyond that. What it does mean is that we will not be submitting papers on tools that risk being difficult to maintain. In fact, while the goals above are phrased as outside-focused, they are also internally important so that we can keep working effectively even as group members move on.
Titus Brown posted that good research software takes 2-3 years to produce. As we are close to submitting a manuscript for our own NGLess, which took a bit longer than that, I will add some examples of why it took so long to get to this stage.
Part of why it took so long is due to people issues and to the fact that NGLess was mostly developed as we needed to process real data (and while I was working on other projects rather than on NGLess itself). But even if this had been someone’s full-time project, it would have taken a long time to get to where it is today.
It did not take so long because there are so many Big Ideas in there (I wish). NGLess contains just one Big Idea: a domain-specific language that results in a tool that is not just a proof of concept but is a better tool because it uses a DSL; everything else follows from that.
Rather, what takes a long time is to find all the weird corner cases. Most of these are issues the majority of users will never encounter, but collectively they make the tool so much more robust. Here are some examples:
- Around Feb 2017, a user reported that some samples would crash ngless. The user did not seem to be doing anything wrong, but half-way through the processing, memory usage would start growing until the interpreter crashed. It took me the better part of two days to realize that their input files were malformed: they consisted of a few million well-formed reads, then a multi-gigabyte series of zero bytes. Their input FastQs were, in effect, a gzip bomb.
There is a kind of open source developer who would reply to this situation by saying “well, knuckle-head, don’t feed my perfect software your crappy data”, but this is not the NGLess way (the goal is to minimize the effort of real-life people), so we considered this a bug in NGLess and fixed it so that it now (correctly) complains of malformed input and exits.
- Recently, we realized that if you use the motus module in a system with a badly working locale, ngless could crash. The reason is that, when using that module, we print out a reference for the paper, which includes some authors with non-ASCII characters in their names. Because of some weird combination of the Haskell runtime system and libiconv (which seems to generally be a mess), it crashes if the locale is not installed correctly.
Again, there is a kind of developer who would respond to this with “well, fix your locale installation, knuckle-head”, but we added a workaround.
- When I taught the first ngless workshop in late 2017, I realized that one of the inconsistencies in the language was causing a lot of confusion for the learners. So, the next release fixed that issue.
- There are two variants of FastQ files, depending on whether the qualities are encoded by adding 33 or 64. It is generally trivial to infer which one is being used, though, so NGLess heuristically does so. In Feb 2017, a user reported that the heuristics were failing on one particular (well-formed) example, so we improved the heuristics.
- There are 25 commits which say they produce “better error messages”. Most of these resulted from a confused debugging session.
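To make the FastQ encoding example above concrete, here is a minimal Python sketch of about the simplest rule one could write. It is not the heuristic NGLess uses, and `guess_quality_offset` is a hypothetical name. It relies only on the fact that, under the +64 encoding, no quality character can have a code below 64 (ignoring the older Solexa variant, whose scores can be negative).

```python
def guess_quality_offset(quality_lines):
    """Guess whether FastQ qualities were encoded by adding 33 or 64.

    Illustrative only: looks at the lowest character code seen across the
    given quality strings and applies a single threshold.
    """
    lowest = None
    for line in quality_lines:
        codes = line.strip().encode("ascii")
        if not codes:
            continue
        line_min = min(codes)  # iterating over bytes yields integer codes
        if lowest is None or line_min < lowest:
            lowest = line_min
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    if lowest < 33:
        raise ValueError(f"invalid quality character with code {lowest}")
    # Codes below 64 are impossible under +64, so they imply +33.
    # Anything else is assumed (not proven) to be +64.
    return 33 if lowest < 64 else 64
```

A rule this naive shows why such heuristics can fail on well-formed input: a +33 file in which every base has a very high quality may contain no character below 64, and would then be guessed as +64.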
None of these issues took that long to fix, but they only emerge through a prolonged beta use period.
You need users to try all types of bad input files, you need to try to teach the tool to understand where the pain points for new users are, you need someone to try it out in a system with a mis-installed locale, etc.
One possible conclusion is that for certain kinds of scientific software, it is actually better if it is done as a side-project: you can keep publishing other stuff, you can apply it on several problems, and the long gestation period catches all these minor issues, even while you are being productive elsewhere. (This was also true of Jug: it was never really a project per se, but after a long time it became usable and its own paper).
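As an illustration of guarding against one such bad input, the run of zero bytes described earlier, here is a minimal Python sketch. It is not how NGLess (which is written in Haskell) handles the case; `check_no_nul_bytes` and `MalformedInputError` are hypothetical names. The point is to read the decompressed stream in fixed-size chunks, so memory stays bounded, and to stop with a clear message at the first zero byte.

```python
import gzip


class MalformedInputError(ValueError):
    """Raised when the input cannot be a valid text FastQ file."""


def check_no_nul_bytes(path, chunk_size=1 << 20):
    """Scan a gzipped file for zero bytes; return the decompressed size.

    Illustrative only. Reads at most chunk_size bytes at a time, so a long
    run of zeros cannot make memory usage grow.
    """
    offset = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return offset
            pos = chunk.find(b"\x00")
            if pos != -1:
                raise MalformedInputError(
                    f"{path}: zero byte at decompressed offset {offset + pos}; "
                    "FastQ input must be text"
                )
            offset += len(chunk)
```

A real parser would fold a check like this into its normal reading loop rather than make a separate pass over the file.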
I recently released a new version of mahotas, my computer vision package. It was version 1.0.1, which was quickly followed by 1.0.2.
1.0 had introduced a mistake, which was caught and fixed by Tony S Yu on GitHub (issue 33). Along with a few other improvements, this warranted a new release.
Then, 1.0.1 had a little mistake of its own, caught by Jean-Patrick Pommier on the pythonvision mailing list.
Thus, 1.0.2 was necessary. I released 1.0.2 about an hour after I was alerted to the errors.
This is why I develop my scientific code as open source. (1) Other people catch your mistakes and fix them for you (and the whole community wins)! (2) Because the code was public, I felt a great urge to fix the mistakes very fast. I even added tests to make sure they would not happen again. I am harnessing public pressure for good.
Public code becomes better as its openness pushes me to have it as pristine as possible.