Building Machine Learning Systems with Python

I wrote a book. Well, only in part. Willi Richert and I wrote a book.

It is called Building Machine Learning Systems with Python and is now available from Amazon, although it has already been partially available directly from the publisher for a while (in a form where you get chapters as their editing is finished).


The book is an introduction to using machine learning in Python.

We mostly rely on scikit-learn, which is the most complete package for machine learning in Python. I do prefer my own code for my own projects, but milk is not as complete. Milk has some things that scikit-learn does not (and some that scikit-learn has, correctly, appropriated).

We try to cover all the major modes in machine learning and, in particular, have:

  1. classification
  2. regression
  3. clustering
  4. dimensionality reduction
  5. topic modeling

and also, towards the end, three more applied chapters:

  1. classification of music
  2. pattern recognition in images
  3. using jug for parallel processing (including in the cloud).


The approach is tutorial-like, without much math but lots of code examples.

This should get people started and will be more than enough if the problem is easy (and there are still many easy problems out there). With good features (which are problem-specific, anyway) knowing how to run an SVM will very often be enough.
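As a rough illustration of what "knowing how to run an SVM" amounts to, here is a minimal sketch using scikit-learn. It is not taken from the book: the iris dataset, the 30% test split, and the default SVM parameters are all assumptions chosen for brevity, and the imports assume a recent scikit-learn release (where the splitting helpers live in `sklearn.model_selection`).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Iris stands in for "a problem with good features"; it is an assumption,
# not an example from the book.
X, y = load_iris(return_X_y=True)

# Hold out a test set before doing anything else with the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale the features, then fit an RBF-kernel SVM with default parameters.
# The scaler is fitted on the training data only, because it is part of
# the pipeline.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# Accuracy on data the model has never seen.
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps every fitted step on the training side of the split.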

Lest you fear we are giving people just enough knowledge to be dangerous, we stress correct evaluation of the results throughout the book. We warn repeatedly against mixing up your training and testing data. This simple principle is, unfortunately, still often disregarded in scientific publications. [1]


There is an aspect that I really enjoyed about this whole process:

Before starting the book, I had already submitted two papers, neither of which is out yet (even though, after some revisions, both have been accepted). In the meantime, the book has been written and edited (only a few minor issues are still pending), and people have been able to buy parts of it for a few months now.

I now have renewed confidence in my choice to stay in science (also because I moved from a place where things are completely absurd to a place where they work very well). But the delay in publication that is common in the life sciences is an emotional drag. In some cases, the bulk of the work was finished a few years before the paper finally came out.

Update (July 26 2013): Amazon is now shipping the book! I changed the wording above to reflect this.

[1] It is rare to see somebody just report training accuracy and claim their algorithm does well. In fact, I have never seen it in a recent paper. However, performing feature selection or parameter tuning on the whole data, and only then cross-validating on the selected features with the tuned parameters, is still pretty common today (there are other sins of evaluation too: “we used multiple parameters and report the best”). This leads to inflated results all around. One of the problems is that, if you do things correctly in this environment, you risk that reviewers of your work will say “looks great, but so-and-so got better results” because so-and-so tuned on the testing set and seems to have “beaten” you. (Yes, I’ve had this happen, multiple times; but that is a rant for another day.)
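The feature-selection version of this mistake can be sketched in a few lines. This is an illustration under stated assumptions, not code from the book: the data are pure noise (so no classifier can honestly do better than chance), and the sample size, the number of features, `k=20`, and the linear SVM are arbitrary choices. The imports assume a recent scikit-learn release.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Pure noise: 60 samples, 2000 features, labels unrelated to the features.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 2000))
y = np.repeat([0, 1], 30)

# Wrong: select features using ALL the data (test folds included),
# then cross-validate on the already-selected features.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(SVC(kernel="linear"), X_leaky, y, cv=5).mean()

# Right: selection is a pipeline step, so it is refitted inside each
# training fold and never sees the corresponding test fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), SVC(kernel="linear"))
honest = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"selection on the whole data: {leaky:.2f}")
print(f"selection inside each fold:  {honest:.2f}")
```

Since there is nothing to learn here, any accuracy well above 50% in the first number comes from the leak alone. The same reasoning applies to parameter tuning: a search such as `GridSearchCV` needs to sit inside the outer cross-validation loop rather than run once on the whole data.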

28 thoughts on “Building Machine Learning Systems with Python”

  1. We have a lot of data about user activity on a website. We have a hard time, however, drawing conclusions from the data, apart from some obvious metrics.

    Could machine learning, and pattern recognition in particular, be a tool to extract new insights from data if you *don’t* know exactly what you’re looking for?

  2. Is there any other book you can recommend to learn and apply machine learning? I am a noob doing a CS degree. Can you give me some suggestions so that I can learn and implement ML in real life?


    1. I believe that Bishop’s Pattern Recognition and Machine Learning is one of the most used books during CS studies (although I might be wrong); it is rather math heavy.

  3. When will the book be available on amazon? Do you know yet? I’d love to get my hands on it as soon as possible!

      1. That sounds awesome. If you need any help, please let me know! I have been a full time python programmer for the past year and my master thesis was about natural language processing.

  4. How do I get the RAW version? I click “Add to Cart” for ebook or print, but that doesn’t work. Also, I’m having major issues registering for an account on packtpub. Can you please advise, or can I simply PayPal you some money for the chapters that are already written?

      1. Any update on this? I also bought the RAW version and would like to download the source code and data.

      2. I have also bought the raw version and am wondering where the code and datasets file can be downloaded. I would like to walk through the examples in the book but without the example data files it is quite difficult. Did you ever get an answer regarding how to get these files?

      3. Hi, any update on this? I’ve bought the book and can’t wait to try the examples. However, there appear to be no support files for the book on the packtpub website. Where can I get them?
