Introduction to Jug: Parallel Tasks in Python

Next Tuesday, I’m giving a short talk on Jug for the Heidelberg Python Meetup.

If you miss it, you can hear it in Berlin at the BOSC2013 (Bioinformatics Open Source Conference) in July. I will take this opportunity to write a couple of posts about jug.

Jug is a cross between the venerable make and Python. In Make tradition, you write a jugfile.py. Perhaps, this is best illustrated by an example.

We are going to implement the dumb algorithm for finding all primes under 100. We write a function to check whether a number is prime:

def is_prime(n):
    from time import sleep
    # Sleep a little bit so that this does not run ridiculously fast
    sleep(1.)
    for j in xrange(2,n-1):
        if (n % j) == 0:
            return False
    return True

Then we build tasks out of this function:

from jug import Task
primes100 = [Task(is_prime, n) for n in range(2,101))

Each of these tasks is of the form call “is_prime“ with argument “n“. So far, we have only built the tasks, nothing has been executed. One important point to note is that the tasks are all independent.

You can run jug execute on the command line and jug will start executing tasks:

jug execute &

The nice thing is that it is fine to run multiple of these at the same time:

jug execute &
jug execute &
jug execute &
jug execute &

They will all execute in parallel. We can use jug status to check what is happening:

jug status

Which prints out:

Task name                                    Waiting       Ready    Finished     Running
----------------------------------------------------------------------------------------
primes.is_prime                                    0          74          20           5
........................................................................................
Total:                                             0          74          20           5

74 is_prime tasks are still in the Ready state, 5 are currently running (which is what we expected, right?) and 20 are done.

Wait a little bit and check again:

Task name                                    Waiting       Ready    Finished     Running
----------------------------------------------------------------------------------------
primes.is_prime                                    0           0          99           0
........................................................................................
Total:                                             0           0          99           0

Now every task is finished. If we now run jug execute, it will do nothing, because there is nothing for it to do!

§

The introduction above has a severe flaw: this is not how you should compute all primes smaller than 100. Also, I have not shown how to get the prime values. On Monday, I will post a more realistic example.

It will also include a processing pipeline where later tasks depend on the results of earlier tasks.

§

(Really weird thing: as I am typing this, WordPress suggests I link to posts on feminism and Australia. Probably some Australian reference that I am missing here.)