mahotas 1.2.4 released

Just released mahotas 1.2.4. The single improvement over 1.2.3 is that PIL (or Pillow) based IO is now automatically used as a fallback. This will hopefully make the package easier to use for new users, as they have one fewer dependency to install.

(If you like and use mahotas, please cite the software paper.)

Mahotas 1.1.0 Released!

I released mahotas 1.1.0 yesterday.

Use pip install mahotas --upgrade to upgrade.

Mahotas is my computer vision library for Python.

Summary of Changes

This release adds the functions resize_to and resize_rgb_to, which can be used like this:

import mahotas as mh
lena = mh.demos.load('lena')
big = mh.resize.resize_rgb_to(lena, [1024, 1024])

It also adds remove_regions_where, which is useful for handling labeled images:

import mahotas as mh
nuclear = mh.demos.load('nuclear')
nuclear = mh.gaussian_filter(nuclear, 2)
labeled,_ = mh.label(nuclear > nuclear.mean())

# Ok, now remove small regions:

sizes = mh.labeled.labeled_size(labeled)

labeled = mh.labeled.remove_regions_where(
        labeled, sizes < 100)

Moments computation can now be done in a normalized mode, which is robust against scale changes:

import mahotas as mh
lena = mh.demos.load('lena', as_grey=1)
print(mh.features.moments.moments(lena, 1, 2, normalize=1))
print(mh.features.moments.moments(lena[::2], 1, 2, normalize=1))
print(mh.features.moments.moments(lena[::2,::3], 1, 2, normalize=1))

This prints 126.609789161, 126.618233592, and 126.640228523: nearly the same value despite the downsampling.

You can even spell the keyword argument “normalise”!

print(mh.features.moments.moments(lena[::2,::3], 1, 2, normalise=1))

This release also contains some bugfixes to SLIC superpixels and to convolutions of very small images.

(If you like and use mahotas, please cite the software paper.)

A Story of Two Bugs

A while back, I got a bug report for mahotas-imread [1]:

PNG reads in inverted: This image loads in imread 0.3.1 incorrectly. It looks right if I xor it with 255, but I don’t think that’s all that’s wrong.

[image: im16_as8]

The first thing I noticed was that this was a 16-bit image. If you’ve been coding for as long as I have, you’ll immediately think: it’s a byte-endianness issue [2]. So, I tested:

imshow(f.byteswap())

and saw the following:

[image: byteswap0]

Not good. Then I looked at the hint that the original poster provided, and it did seem to be true: imshow(~f) worked reasonably well. My working hypothesis was thus that there is a flag whereby the PNG data needs to be interpreted after a bit inversion. I also noticed another thing, though:

max_16bit_value = 2**16-1
imshow(max_16bit_value - f)

Also looks decent.

The TIFF format does allow you to specify whether zero is supposed to be white or black. Maybe PNG has a similar “feature.”

I read through the libpng documentation (which is not great), a bit through its source, and through online descriptions of the PNG format. Along the way, I noticed that converting the image to TIFF (with ImageMagick) and loading it with imread also gave the wrong result. Perhaps the TIFF reader had the same bug, or perhaps ImageMagick did [3].

Eventually, I realised that PNG files are in network order (i.e., in big-endian format) and the code did not convert them to little-endian. Thus, my initial intuition had been right: it was a byte-swapping error!
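
As a tiny illustration of what was going on (this is plain numpy, not imread’s actual reading code), the same bytes give very different values depending on which byte order you assume:

import numpy as np

# the bytes of a few 16-bit values, written in big-endian ("network") order
raw = np.array([1000, 40000, 65535], dtype='>u2').tobytes()

print(np.frombuffer(raw, dtype='<u2'))  # read as little-endian: garbled values
print(np.frombuffer(raw, dtype='>u2'))  # read as big-endian: 1000, 40000, 65535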

But in this case, why did imshow(f.byteswap()) result in a mangled image?

I started to suspect that matplotlib had a bug. I tried:

imshow(f.byteswap() / 2.**16)

and it resulted in the correct image being shown.

As it turned out, matplotlib does not do the right thing when given 16-bit data. With that figured out, I fixed imread, released version 0.3.2, and closed the bug report.

§

A single bug is often easy to debug, but when you have multiple bugs interacting, it is much more than twice as hard.

§

Hairy details: You may want to stop reading now.

Consider the following identities:

255 == 0xff
-f == ~f + 1 (two's complement: flip the bits, then add one)
2**16 - f == -f + 2**16 == -f (because of overflow)

Thus, it should not be surprising that flipping the bits and subtracting the image from the maximum value gave similar results: in two's complement, ~f is roughly -f. Not exactly equal, but similar enough that, by eye, the two are hard to tell apart.

Finally, it all makes sense when you realise that matplotlib assumes that non-8-bit images are floating point (scaled to [0, 1]) and does:

final_image = (input_image * 255)            # assumes floats in [0, 1]
final_image = final_image.astype(np.uint8)   # for integer input, keeps only the low byte

And what is multiplying by 255? Modulo 256, it is the same as multiplying by -1! Thus, matplotlib would effectively multiply by -1 and then keep the low-order bits, so it showed a correct-looking image if you pre-multiplied by -1 (or flipped the bits) before giving it the byteswapped image!
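
Here is a quick numpy check of the whole argument (just a sketch, not matplotlib’s actual code):

import numpy as np

f = np.arange(2**16, dtype=np.uint16)  # every possible 16-bit value

# flipping the bits is only off by one from negating (two's complement):
assert np.all(~f == -f - np.uint16(1))

# subtracting from 2**16 is exactly negation, because the result wraps around:
assert np.all((2**16 - f.astype(np.uint32)).astype(np.uint16) == -f)

# multiplying by 255 and keeping the low byte (what matplotlib effectively did)
# is the same as multiplying by -1 and keeping the low byte:
assert np.all((f * np.uint16(255)).astype(np.uint8) == (-f).astype(np.uint8))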

[1] People don’t always appreciate how valuable good bug reports are. Seriously, they are a huge help: you are testing the software for me. Unfortunately, either shyness or past bad experiences will often cause people who see something wrong to not report it.
[2] I now have over 15 years of coding experience (having had a relatively late start [I didn’t care much about computers until I was close to college age], I’ve caught up). If there is one area where I really feel my experience shines through, it is debugging: I’ve seen enough mistakes and errors that my guesses as to what the bug is are more and more accurate (this is true even in code I have not seen).
[3] One of the reasons I started mahotas-imread was that I had not found a single library that could read the formats I wanted without a lot of bugs. So, I trust no one on this. In this case, the paranoia was unwarranted, as we’ll see.

Mahotas and the Python Scientific Ecosystem for Bioimage Analysis

This week, I’m in Barcelona to talk about mahotas at the EuBIAS 2013 Meeting.

You can get a preview of my talk here. The title is “Mahotas and the Python Scientific Ecosystem for Bioimage Analysis” and you’ll see it does not talk exclusively about mahotas, but about the whole ecosystem. Comments are welcome (especially if they come in the next 24 hours).

In preparation, I released new versions of mahotas (and its sister project, mahotas-imread) yesterday.

There are a few bugfixes and small improvements throughout.

§

If you like and use mahotas, please cite the software paper.

Mahotas-imread Now Accepts Options When Writing

This week, I committed some code to mahotas-imread to allow setting options when saving:

from imread import imsave
image = ...
imsave('file.jpeg', image, opts={ 'jpeg:quality': 95 })

This saves the image array to file file.jpeg with quality 95 (out of 100).

§

This is only available in the version from github (at the moment), but I will probably put up a new release soon.

(If you like and use mahotas, please cite the software paper.)

Hard and Soft Documentation

A good poker player polarizes their hands. This means that, for example, they might play a check-raise (first refraining from betting and then coming over the top of their opponent, which is normally done to give the impression of a strong hand) either when they have a very strong hand or when they completely missed the flop (they have very bad cards and are just bluffing). Intermediate hands are played differently [1].

I think good software documentation is often also polarized: you should have hard documentation and soft documentation, but nothing in the middle.

Hard Documentation

This is low-level documentation, generally of the kind that a Unix manpage gives you. It tells you exactly what each function and each argument does. If it is good, it will often be very succinct.

Mahotas has always excelled at this level. Here is the docstring of the sobel edge detection function:

def sobel(img, just_filter=False):
    '''
    edges = sobel(img, just_filter=False)

    Compute edges using Sobel's algorithm

    `edges` is a binary image of edges computed according to Sobel's algorithm.

    This implementation is tuned to match MATLAB's implementation.

    Parameters
    ----------
    img : Any 2D-ndarray
    just_filter : boolean, optional
        If true, then return the result of filtering the image with the sobel
        filters, but do not threshold (default is False).

    Returns
    -------
    edges : ndarray
        Binary image of edges, unless `just_filter`, in which case it will be
        an array of floating point values.
    '''

This is the level I rely on most: I can remember the general idea behind each function, but I might need to look up the exact arguments. So, every little detail is documented.

Soft Documentation

Soft documentation consists of tutorials and other higher-level guides. These do not pertain to a single function or a single object, but to the overall structure and thinking behind the software.

Mahotas has not had as much of this, but I have been trying to add some over the past few months (Finding Wally, for example). More mahotas blogging might also help.

The Intermediate Level

I don’t care so much for intermediate-level documentation. I rarely find that level helpful. Unfortunately, this is the level at which too much bad documentation is written. Stuff like:

This function is part of the image segmentation pipeline. It can be used after pre-filtering or directly on the raw image data.

Ok, sort of helpful, but not really.

[1] The reason for the randomness is that if you always do a single thing, people will catch on and exploit it (if you bluff a lot, people will call it; if you always have a strong hand, then you won’t get the added benefit of having someone try to call your bluff and give you even more chips). Intermediate hands should not be played like this because if the opponent pushes back, they probably have something that beats your intermediate hand. As always in poker, YMMV.

Working Around Bugs in Third Party Libraries

This is another story of continuous improvement, this time in mahotas’ little brother imread.

Last week, Volker Hilsenstein here at EMBL had a few problems with imread on Windows. This is one of those very hard issues: how do you help someone on a different platform, especially one you know nothing about?

In the end, the problem was not Windows per se, but an old version of libtiff. In that version, there is a logic error (literally, there is a condition which is miswritten and always false) and the code will attempt to read a TIFF header from a file even when writing. Mahotas-imread was not ready for this.

Many (especially in the research open-source world, unfortunately) would just say: well, I won’t support broken versions of libtiff; if your code does not adhere to the spec, then my code simply won’t work for you. See this excellent old essay by Joel Spolsky on this sort of thing.

In my case, I prefer to work around the bug: when libtiff tries to read in write mode, we return no data, which it handles correctly. I wrote the following data reading function to pass to libtiff:

// Dummy read callback: always report zero bytes read, so that libtiff's
// spurious header read in write mode simply gets no data, which it handles.
tsize_t tiff_no_read(thandle_t, void*, tsize_t) {
        return 0;
}

The purpose of this code is simply to make imread work even on a broken, 5-year-old version of a third-party library.

§

In the meantime, we also fixed compilation on Cygwin as well as a code path which led to a hard crash.

The possibility of a hard crash, in particular, made me decide that this was important enough to merit a new release.

Extended Depth of Field In Python using Mahotas

I wanted to implement an extended depth of field algorithm last week for visualisation.

The idea of extended depth of focus is that you have a stack of images, taken at different focal points, and you build a single artificial image so that you get everything in focus. Some simple methods work OK, some very advanced methods work better. We were happy with OK for our setting. Here’s how to do it with mahotas:

Start with standard imports:

import numpy as np
import mahotas as mh

We are going to assume that you have an image object, which has three dimensions: the stack, height, and width:

stack,h,w = image.shape

We use mh.sobel as the measure of how in-focus each pixel is [1]:

focus = np.array([mh.sobel(t, just_filter=True) for t in image])

Now, we select the best slice at each pixel location:

best = np.argmax(focus, 0)

So far, very easy. The next part is the hard part. We want to do the following:

r = np.zeros((h,w))-1
for y in range(h):
    for x in range(w):
        r[y,x] = image[best[y,x], y, x]

But this is very slow (never run nested loops in Python if you can avoid it). We get the same result with a slightly less legible but faster manipulation [2]:

image = image.reshape((stack,-1)) # image is now (stack, nr_pixels)
image = image.transpose() # image is now (nr_pixels, stack)
r = image[np.arange(len(image)), best.ravel()] # Select the right pixel at each location
r = r.reshape((h,w)) # reshape to get final result
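
If you want to convince yourself that the two versions agree, here is a quick check on random data (tiny hypothetical sizes, nothing to do with the real stack):

import numpy as np

stack, h, w = 4, 8, 9
image = np.random.rand(stack, h, w)
best = np.argmax(np.random.rand(stack, h, w), 0)

r_loop = np.zeros((h, w))
for y in range(h):
    for x in range(w):
        r_loop[y, x] = image[best[y, x], y, x]

flat = image.reshape((stack, -1)).transpose()
r_fast = flat[np.arange(len(flat)), best.ravel()].reshape((h, w))

assert np.allclose(r_loop, r_fast)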

Et voilà!

§

Here is an example, from a stack of plankton images. This is the maximum intensity projection:

Maximum Intensity Projection

This is the most in-focus slice (using the sobel operator):

Most in-focus slice

And this is the extended depth of field result:

Extended Depth of Field

It is clearly sharper (look at some of the organelles on the top-left: they are a blur in the other two images), at the expense of some possible noise. I actually played around with blurring the image a little bit and it did improve things ever so slightly.

§

Cite the mahotas paper if using this in a scientific publication.

[1] Other methods simply use a different measure here.
[2] I am not 100% convinced that this is the best way. After all, we create an array of size len(image) just to index into it. I would be happy to find an alternative.
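
Regarding [2]: newer versions of numpy (1.15 and later) have np.take_along_axis, which expresses the same selection without building the arange index array yourself (whether it is faster under the hood, I have not checked). A sketch on toy data:

import numpy as np

stack, h, w = 3, 4, 5                      # tiny hypothetical sizes
image = np.random.rand(stack, h, w)
best = np.argmax(image, 0)                 # best slice for each pixel

flat = image.reshape((stack, -1))          # (stack, nr_pixels)
r = np.take_along_axis(flat, best.ravel()[None, :], 0).reshape((h, w))

# here best is the argmax of image itself, so selecting gives the maximum:
assert np.allclose(r, image.max(0))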

Mahotas: A continuous improvement example

Last week, I tweeted about a good example of continuous improvement in mahotas and the advantages of eating your own dog food. Let me highlight it here.

I was using mahotas to compute some wavelets, but I couldn’t remember the possible parameter values. So I got an error:

In [1]: mh.daubechies(im, 'db4')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-2e939df57e6a> in <module>()
----> 1 mh.daubechies(im, 'db4')

/home/luispedro/work/mahotas/mahotas/convolve.pyc in daubechies(f, code, inline)
    492     '''
    493     f = _wavelet_array(f, inline, 'daubechies')
--> 494     code = _daubechies_codes.index(code)
    495     _convolve.daubechies(f, code)
    496     _convolve.daubechies(f.T, code)

ValueError: 'db4' is not in list

I could have just looked up the right code and moved on. Instead, I considered the unhelpfulness of the error message a bug and fixed it (here is the commit).

Now we get a better error message:

ValueError: mahotas.convolve: Known daubechies codes are ['D2', 'D4', 'D6',
'D8', 'D10', 'D12', 'D14', 'D16', 'D18', 'D20']. You passed in db4.

You still get an error, but at least it tells you what you should be doing.
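
The pattern behind the fix is simple enough to be worth spelling out. Something along these lines (a sketch of the idea with a hypothetical helper name, not the exact mahotas code):

_daubechies_codes = ['D2', 'D4', 'D6', 'D8', 'D10',
                     'D12', 'D14', 'D16', 'D18', 'D20']

def _daubechies_code_index(code):
    # validate first, so the user sees the valid options
    # instead of a bare "'db4' is not in list"
    if code not in _daubechies_codes:
        raise ValueError(
            'mahotas.convolve: Known daubechies codes are {0}. '
            'You passed in {1}.'.format(_daubechies_codes, code))
    return _daubechies_codes.index(code)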

Good software works well. Excellent software fails well too.