A quick follow-up on two earlier posts:
As you can see, NGLess performs much better than either MOCAT or htseq-count. We tried to use featureCounts too, but that completely failed to produce results for some of the samples (we gave it a whopping 1TB of RAM, but it used it all up before crashing).
It also reveals that although ngless was developed in the context of our metagenomics work, it would also likely do well on the type of problems for which htseq-count is currently being used, in the domain of RNA-seq.
Is it even a good idea to have the presentation of the results mixed with their computation?
I do see the value in companion Jupyter notebooks for many cases, but as a replacement for the main paper, I am not even sure it is a good idea.
There is a lot of accidental complexity in code. A script that generates a publication plot may easily have 50 lines that do nothing more than set up the plot just right: (1) set up the subplots, (2) set x- and y-labels, (3) fix colours, (4) scale the points, (5) reset the font sizes, &c. What value is there in keeping all of this in the main presentation of the results?
The little script that generates the plot above is an excellent example of this. It is available online (on github: plot-comparison.py). It’s over 100 lines and, even then, the final result required some minor aesthetic manipulations in inkscape (so that, if you run it, the result is slightly uglier: in particular, the legend is absent).
Would it really add anything to the presentation of the manuscript to have those 100 lines of code be intermingled with the presentation of ngless as a metagenomics profiler?
In this case, I am confident that a Jupyter notebook would be worse than the current solution of a PDF as a main presentation with the data table and plotting scripts as supplemental material.