Eager Error Detection in Ngless: A big advantage of a DSL

One of the advantages of ngless is its error detection. For example, consider the following ngless script:

ngless "0.0"
input = fastq("input.fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/mapped.bam")

If the directory output does not exist (maybe you meant to write outputs; I know I make this sort of mistake all the time), then ngless will immediately give you an error message:

Line 4: File name ‘outputs/output.sam’ used as output, but directory outputs does not exist.

This is a big advantage compared with traditional tools which would run the pipeline until the last step and then fail. Until last week, though, it would not check the following code:

ngless "0.0"
import "parallel" version "0.0"
sample = lock1(readlines("samples.txt"))
input = fastq(sample + ".fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/" + sample + ".mapped.bam")

The parallel module adds the lock1 function which will take the list of samples (in this case read from a file using the readlines function) and select one using a locking mechanism so that several ngless processes can run at the same time and each one will work on a different sample. Now, the output name is being formed depending on inputs. So, ngless could not check it before it starts interpreting the script.

With a commit last week, ngless will now check the script by performing the following transformation:

ngless "0.0"
import "parallel" version "0.0"
sample = lock1(readlines("samples.txt"))
__check_ofile("output/" + sample + ".mapped.bam")
input = fastq(sample + ".fq.gz")
mapped = map(input, ref='hg19')
write(mapped, ofile="output/" + sample + ".mapped.bam")

Now, immediately after the variable sample is set, ngless will build the output path and check that it is available with the right permissions. In this case, readlines and lock1 are very fast functions, so any errors will be reported within a few miliseconds of starting ngless before any expensive computation is performed.

This is only possible because we are working with a domain specific language.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s