Story of Two Bugs

I while back, I got a bug report for mahotas-imread [1]:

PNG reads in inverted: This image loads in imread 0.3.1 incorrectly. It looks right if I xor it with 255, but I don’t think that’s all that’s wrong.

im16_as8

The first thing I noticed was that this was a 16 bit image. If you’ve been coding for as long as I have, you’ll immediately think: It’s a byte-endiness issue [2]. So, I tested:

imshow(im.byteswap())

and saw the following:

byteswap0

Not good. The I looked at the hint that the original poster provided and it did seem to be true: imshow(~f) worked reasonably well. My working hypothesis was thus that there is a flag whereby the PNG data needs to be interpreted after a bit reversion. I also noticed another thing, though:

max_16bit_value = 2**16-1
imshow(max_16bit_value - f)

Also looks decent.

The TIFF format does allow you to specify whether zero is supposed to be white or black. Maybe PNG has a similar “feature.”

I read through the libpng documentation (which is not great), a bit through its source, and through online descriptions of PNG format. Along the way, I noticed that converting the image to TIFF (with ImageMagick) and loading it with imread also gave the wrong result. Perhaps the TIFF reader had the same bug or ImageMagick [3].

Eventually, I realised that PNG files are in network order (i.e., in big-endian format) and the code did not convert them to little-endian. Thus, my initial intuition had been right: it was a byte-swapping error!

But in this case, why did imshow(f.byteswap()) result in a mangled image?

I stated to suspect that matplotlib had a bug. I tried to do:

imshow(f.byteswap() / 2.**16)

and it resulted in the correct image being shown.

As it turned out, matplotlib does not do the right thing when given 16 bit files. Thus, I had this figured out. I fixed imread and released version 0.3.2 and closed the bug report.

§

A single bug is often easy to debug, but when you have multiple bugs interacting; it is much more than twice as hard.

§

Hairy details: You may want to stop reading now.

Consider the following identities:

255 == 0xff
-f == (f ^ 0xff + 1)
2**16 - f = -f + 2**16 == -f (because of overflow)

Thus, it should not be surprising that flipping the bits or subtracting the image resulted int , in two-bit complement, ~f is roughly -f. Not exactly, but similarly enough that, by eye, it is hard to tell apart.

Finally, it all makes sense when you realise that matplotlib assumes that non-8 bit images are floating point and does:

final_image = (input_image * 255)
final_image = final_image.astype(np.uint8)

Because what is multiplying by 255? It’s the same as multiplying by -1! Thus, matplotlib would multiply by -1 and then take the low order bits. Thus, it showed a correct image if you pre-multiplied it by -1 (or flipped the bits) and gave it a byteswapped image!

[1] People don’t always appreciate how valuable good bug reports are. Seriously, they are a huge help: you are testing the software for me. Unfortunately, either shyness or past bad experiences will often cause people who see something worng to not report it.
[2] I now have over 15 years of experience coding (having had a relative late start [I didn’t care much about computers until I was close to college age], I’ve caught up.) If there is an area where I really feel that my experience shines through is debugging: I’ve seen enough mistakes and errors that my guesses as to what the bug is are more and more accurate (this is true even in code I have not seen).
[3] One of the reasons I started mahotas-imread was that I had not found a single library that could read the formats I wanted without a lot of bugs. So, I trust no one on this. In this case, the paranoia was unwarranted, as we’ll see.

Mahotas and the Python Scientific Ecosystem for Bioimage Analysis

This week, I’m in Barcelona to talk about mahotas at the EuBIAS 2013 Meeting.

You can get a preview of my talk here. The title is “Mahotas and the Python Scientific Ecosystem for Bioimage Analysis” and you’ll see it does not exclusively talks about mahotas, but the whole ecosystem. Comments are welcome (especially if they come in the next 24 hours).

In preparation, I released new versions of mahotas (and the sister mahotas-imread) yesterday:

There are a few bugfixes and small improvements throughout.

*

If you like and use mahotas, please cite the software paper.

Mahotas-imread Now Accepts Options When Writing

This week, I committed to mahotas-imread, some code to allow for setting options when saving:

from imread import imsave
image = ...
imsave('file.jpeg', image, opts={ 'jpeg:quality': 95 })

This saves the image array to file file.jpeg with quality 95 (out of 100).

§

This is only available in the version from github (at the moment), but I will probably put up a new release soon.

(If you like and use mahotas, please cite the software paper.)

Working Around Bugs in Third Party Libraries

This is another story of continuous improvement, this time in mahotas’ little brother imread.

Last week, Volker Hilsenstein here at EMBL had a few problems with imread on Windows. This is one of those very hard issues: how to help someone on a different platform, especially one which you know nothing about?

In the end, the problem was not Windows per se, but an old version of libtiff. In that version, there is a logic error (literally, there is a condition which is miswritten and always false) and the code will attempt to read a TIFF header from a file even when writing. Mahotas-imread was not ready for this.

Many (especially in the research open-source world, unfortunately) would just say: well, I won’t support broken versions of libtiff: if your code does not adhere to the spec, I am just going to not work for you, if you don’t do exactly what you should, then I won’t work either. See this excellent old essay by Joel Spolsky on this sort of thing.

In my case, I prefer to work around the bug and when libtiff tries to read in write mode, return no data; which it correctly handles. I wrote the following data reading function to pass to libtiff:

tsize_t tiff_no_read(thandle_t, void*, tsize_t) {
        return 0;
}

The purpose of this code is simply to make imread work even on a broken, 5 year old version of a third party library.

§

In the meanwhile, we also fixed compilation in Cygwin as well as a code path which led to a hard crash.

Especially the possibility of a hard crash made me decide that this was important enough to merit a new release.