Eager Error Detection in Ngless: A big advantage of a DSL

One of the advantages of ngless is that it detects many errors before it starts processing any data. For example, consider the following ngless script:

ngless "0.0"
input = fastq("input.fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/mapped.bam")

If the directory output does not exist (maybe you meant to write outputs; I know I make this sort of mistake all the time), then ngless will immediately give you an error message:

Line 4: File name ‘output/mapped.bam’ used as output, but directory output does not exist.
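A check of this kind can be sketched in a few lines of Python. This is a minimal sketch, not ngless's implementation: the helper names check_ofile and check_script are hypothetical, and instead of really parsing the script, it uses a regular expression that only recognizes an ofile argument written as a single string literal. For each such path, it verifies that the parent directory exists and is writable and produces a message in the same style as the one above.

```python
import os
import re
from typing import List, Optional

# Hypothetical pattern: matches ofile="..." only when the argument is a single
# string literal, i.e. the closing quote is followed by "," or ")".
LITERAL_OFILE = re.compile(r'ofile\s*=\s*"([^"]*)"\s*[,)]')


def check_ofile(path: str, line: int) -> Optional[str]:
    """Return an error message if `path` cannot be written, else None."""
    directory = os.path.dirname(path) or "."  # bare file name -> current directory
    if not os.path.isdir(directory):
        return (f"Line {line}: File name '{path}' used as output, "
                f"but directory {directory} does not exist.")
    if not os.access(directory, os.W_OK):
        return (f"Line {line}: File name '{path}' used as output, "
                f"but directory {directory} is not writable.")
    return None


def check_script(source: str) -> List[str]:
    """Check every literal ofile path in the script before anything runs."""
    errors = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for match in LITERAL_OFILE.finditer(text):
            error = check_ofile(match.group(1), lineno)
            if error is not None:
                errors.append(error)
    return errors
```

An ofile argument assembled at runtime (for example, by concatenating a variable into the path) does not match the pattern, so a purely static pass like this one cannot check it: the value is only known once the script runs.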

Reporting this before any processing starts is a big advantage compared with traditional tools, which would run the pipeline all the way to the last step and only then fail. Until last week, though, ngless would not check the following code:

ngless "0.0"
import "parallel" version "0.0"
sample = lock1(readlines("samples.txt"))
input = fastq(sample + ".fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/" + sample + ".mapped.bam")

The parallel module adds the lock1 function, which takes the list of samples (in this case read from a file with the readlines function) and selects one using a locking mechanism, so that several ngless processes can run at the same time, each working on a different sample. Now, however, the output filename depends on which sample was selected, a value that is only known at runtime, so ngless could not check it before it started interpreting the script.
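The locking idea can be illustrated with lock files. The sketch below is a hypothetical Python stand-in for lock1, not the parallel module's actual code: it assumes one lock file per sample inside a lock_dir directory (a parameter invented here) and relies on the fact that opening a file with O_CREAT | O_EXCL fails if the file already exists. On a local filesystem, this means that when several processes race for the same sample, exactly one of them wins.

```python
import os
from typing import Iterable, Optional


def lock1(samples: Iterable[str], lock_dir: str) -> Optional[str]:
    """Claim the first sample that no other process has claimed yet.

    Hypothetical sketch: one lock file per sample in `lock_dir`, created
    atomically so that only one process can ever create a given file.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for sample in samples:
        lock_path = os.path.join(lock_dir, sample + ".lock")
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another process already owns this sample
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record the owner for debugging
        return sample
    return None  # every sample has already been claimed
```

Each new process that calls lock1 with the same list gets the next unclaimed sample. A real implementation would also have to decide what to do with locks left behind by processes that crashed; this sketch ignores that.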

Since a commit last week, ngless checks such scripts by performing the following transformation:

ngless "0.0"
import "parallel" version "0.0"
sample = lock1(readlines("samples.txt"))
__check_ofile("output/" + sample + ".mapped.bam")
input = fastq(sample + ".fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/" + sample + ".mapped.bam")

Now, immediately after the variable sample is set, ngless builds the output path and checks that it can be written (that its directory exists and has the right permissions). In this case, readlines and lock1 are very fast functions, so any errors are reported within a few milliseconds of starting ngless, before any expensive computation is performed.
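The transformation can be sketched as a pass over a parsed script. In this hypothetical Python model, each statement is a Stmt record holding its source text, the variable it assigns, and, for write calls, the ofile expression. Real ngless works on its own syntax tree, so Stmt, free_vars, and insert_ofile_checks are illustrative names only. For every write whose output path depends on variables, the pass finds the last earlier statement that assigns one of them and inserts an __check_ofile call right after it. Paths that are plain string literals are skipped, since they can be checked before the script runs.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
STRING = re.compile(r'"[^"]*"')


@dataclass
class Stmt:
    text: str                      # statement as written in the script
    assigns: Optional[str] = None  # variable this statement sets, if any
    ofile: Optional[str] = None    # ofile= expression, if this is a write()


def free_vars(expr: str) -> Set[str]:
    """Names an expression refers to (crude: identifiers outside string literals)."""
    return set(IDENT.findall(STRING.sub("", expr)))


def insert_ofile_checks(stmts: List[Stmt]) -> List[Stmt]:
    """Return a new statement list with a __check_ofile call placed right after
    the last assignment that each computed output path depends on."""
    checks_after: Dict[int, List[Stmt]] = {}
    for i, stmt in enumerate(stmts):
        if stmt.ofile is None:
            continue
        deps = free_vars(stmt.ofile)
        # Index of the last earlier statement that sets one of the dependencies.
        anchor = max((j for j in range(i) if stmts[j].assigns in deps), default=None)
        if anchor is None:
            continue  # no computed dependency: the path can be checked statically
        check = Stmt(f"__check_ofile({stmt.ofile})")
        checks_after.setdefault(anchor, []).append(check)

    result: List[Stmt] = []
    for i, stmt in enumerate(stmts):
        result.append(stmt)
        result.extend(checks_after.get(i, []))  # checks scheduled after stmt i
    return result
```

Applied to the parallel example, with sample, input, and mapped marked as assignments and the write carrying the ofile expression, the pass reproduces the transformed script, with the check placed right after the sample line. Because the check follows the last dependency rather than sitting at the top of the script, a path built from the result of an expensive step would only be checked once that step had finished; here, sample comes from the cheap lock1 call, so the check runs almost immediately.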

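This is only possible because we are working with a domain-specific language: ngless knows that the ofile argument of write names an output file, and it can see which variables that path is built from, so it can move the check up to the point where those variables are set.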
