(I wanted to write about this earlier, but June was a crazy month with manuscript submissions, grant submissions, and a lot of travel.)
The first NGLess manuscript was finally published. See this twitter thread for a summary, which I will not rehash here as I have already written extensively about NGLess and the ideas behind it here.
Compared to the preprint, the major change is that (in response to reviewer comments) we enhanced the benchmarking section. Profiling metagenomics tools is difficult. If you use an in silico simulation, you need a realistic distribution as a basis. In our case, we used real data to define the species distribution (using mOTUs2, see https://motu-tool.org/). To obtain the simulated metagenomes, we simulated reads from sequenced genomes according to the real data distribution. There are limitations to this approach, in that we still miss a lot of the complexity of real samples, but we cannot simulate the true unknown. In the end, the functional profiles produced by NG-meta-profiler had high correlations with the ground truth (0.88 for the human gut, 0.82 for the marine environment, Spearman correlation).
Additionally, we included a more explicit discussion of the advantages of NGLess for developing tools like NG-meta-profiler, in the Pipeline design with NGLess section.