NGLess timing benchmarks

As part of finalizing a manuscript on NGLess, we have run some basic timing benchmarks comparing NGLess to MOCAT2 (our previous tool) and another alternative for profiling a community based on a gene catalog, namely htseq-count.

The task being profiled is that performed in the NGLess tutorials for the human gut and the ocean: 3 metagenomes are functionally profiled by using a gene catalog as a reference. The time reported is for completing all 3 samples (repeated 3 times to get some variability measure).

The results are that NGLess is overall much faster than the alternatives (note that the Y-axis measures the number of seconds in log-scale).  For the gut dataset, MOCAT takes 2.5x, while for the ocean (tara) one, it takes 4x longer.

ngless-mocat-htseq-count-compare.2.svg.png

The Full  column contains the result of running the whole pipeline, where it is clear that NGLess is much faster than MOCAT2. The other elements are in MOCAT nomenclature:

  • ReadTrimFilter: preprocessing the FastQ files
  • Screen: mapping to the catalog
  • Filter: postprocessing the BAM files
  • Profile: generating feature counts from the BAM files

Htseq-count works well even for this settings which is outside of its original domain (it was designed for RNA-seq, where you have thousands of genes, as opposed to metagenomics, where millions are common). NGLess is still much faster, though.

Note too that for MOCAT, the time it takes for the Full step is simply the addition of the other steps, but in the case of NGLess, when running a complete pipeline, the interpreter can save time.

The htseq-count benchmark is still running, so final results will only be available next week.

I also tried to profile using featureCounts (website), but that tool crashed after using up 800GB of RAM. I might still try it on the larger machines (2TiB of RAM), but it seems pointless.

The scripts and preprocessed data for this benchmark are at https://github.com/BigDataBiology/ngless2018benchmark

Advertisements

One thought on “NGLess timing benchmarks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.