Working Around Bugs in Third Party Libraries

This is another story of continuous improvement, this time in mahotas’ little brother imread.

Last week, Volker Hilsenstein here at EMBL had a few problems with imread on Windows. This is one of those very hard issues: how to help someone on a different platform, especially one which you know nothing about?

In the end, the problem was not Windows per se, but an old version of libtiff. In that version, there is a logic error (literally, there is a condition which is miswritten and always false) and the code will attempt to read a TIFF header from a file even when writing. Mahotas-imread was not ready for this.

Many (especially in the research open-source world, unfortunately) would just say: well, I won’t support broken versions of libtiff. If your code does not adhere to the spec exactly, then mine will not work with it. See this excellent old essay by Joel Spolsky on this sort of thing.

In my case, I prefer to work around the bug: when libtiff tries to read in write mode, I return no data, which it handles correctly. I wrote the following data reading function to pass to libtiff:

tsize_t tiff_no_read(thandle_t, void*, tsize_t) {
        return 0;
}

The purpose of this code is simply to make imread work even on a broken, five-year-old version of a third-party library.

§

In the meantime, we also fixed compilation on Cygwin as well as a code path which led to a hard crash.

The possibility of a hard crash, in particular, made me decide that this was important enough to merit a new release.

Using imread to save disk space

Imread recently gained the ability to read and write metadata.

We deal with images around here and they can get very large in terms of disk space. To make things worse, the microscope does not save them in compressed form.

Imread, however, saves in compressed TIFF. So, we needed to (1) open the file and (2) resave it. We also did not want to lose the metadata that comes with the file in the process.

This is what I ended up with:

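import os
import tempfile
from os import path

import imread


def resave_file(f):
    '''
    resave_file(f)

    Resave a file using imread preserving metadata

    Parameters
    ----------
    f : str
        Filename
    '''
    # Read the pixel data together with the metadata
    imdata, meta = imread.imread(f, return_metadata=True)
    # Create the temporary file next to f, so that the final rename
    # stays within a single filesystem
    tf = tempfile.NamedTemporaryFile('w',
                prefix='imread_resave_',
                suffix='.tiff',
                delete=False,
                dir=path.dirname(f))
    tf.close()
    # Save to the temporary name first, then rename over the original
    imread.imsave(tf.name, imdata, metadata=meta)
    os.rename(tf.name, f)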
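To run this over a whole directory and see how much space it saves, something like the following could work. This is only a sketch: `directory_size` and `resave_tree` are hypothetical helper names (they are not part of imread), and the list of extensions is an assumption about how the files are named.

```python
import os


def directory_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so that linked files are not counted twice
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def resave_tree(root, resave, extensions=('.tif', '.tiff')):
    """Call resave(filename) on every matching file under root.

    Returns (bytes_before, bytes_after).
    """
    extensions = tuple(extensions)
    before = directory_size(root)
    # Collect the file list first: the resave step may create temporary
    # files in the same directories, and those should not be picked up.
    targets = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                targets.append(os.path.join(dirpath, name))
    for filename in sorted(targets):
        resave(filename)
    return before, directory_size(root)
```

Calling `resave_tree(some_directory, resave_file)` would then resave every TIFF under `some_directory` and return the disk usage before and after.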

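On a test directory, disk usage went from 55GB down to 12GB. We use a two-step process: the new file is first written to a temporary file in the same directory, which is then renamed over the original.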
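The temporary-file-then-rename pattern used in `resave_file` can be sketched on its own, independently of imread. The helper name `replace_atomically` and its `write` callback are assumptions made for illustration. Note that this sketch swaps in `os.replace` (Python 3.3 and later) for `os.rename`, because `os.rename` raises an error on Windows when the destination already exists.

```python
import os
import tempfile


def replace_atomically(filename, write):
    """Create a new version of filename via a temporary file, then rename.

    write is a callable taking the temporary file name and writing
    the new contents there.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the target: a rename only works
    # (and is atomic on POSIX) within a single filesystem.
    fd, tmp_name = tempfile.mkstemp(prefix='resave_', suffix='.tmp',
                                    dir=directory)
    os.close(fd)
    try:
        write(tmp_name)
        # os.replace overwrites an existing destination on POSIX and Windows
        os.replace(tmp_name, filename)
    except BaseException:
        # On any failure, clean up; the original file is left untouched
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

The point of the design is that the original file is only ever replaced by a complete new file: if writing fails halfway, the uncompressed original is still there.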

Note: This only works with the GitHub version of imread.