Python’s Weak Performance Matters

Here is an argument I used to make, but now disagree with:

Just to add another perspective, I find many “performance” problems in
the real world can often be attributed to factors other than the raw
speed of the CPython interpreter. Yes, I’d love it if the interpreter
were faster, but in my experience a lot of other things dominate. At
least they do provide low hanging fruit to attack first.

[…]

But there’s something else that’s very important to consider, which
rarely comes up in these discussions, and that’s the developer’s
productivity and programming experience.[…]

This is often undervalued, but shouldn’t be! Moore’s Law doesn’t apply
to humans, and you can’t effectively or cost efficiently scale up by
throwing more bodies at a project. Python is one of the best languages
(and ecosystems!) that make the development experience fun, high
quality, and very efficient.

(from Barry Warsaw)

I used to make this argument. Some of it is just a form of utilitarian programming: having a program that runs 1 minute faster but takes 50 extra hours to write is not worth it unless you run it more than 3,000 times, and for code written as part of data analysis, that is rarely the case. But I no longer find this argument as strong as I once did. I now believe that the fact that CPython (the only widely used Python implementation) is slow is a major disadvantage of the language, not just a small tradeoff for faster development time.
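
To make the arithmetic explicit, here is the break-even calculation as a sketch (the numbers are the ones from the paragraph above):

```python
# Utilitarian break-even: when does a faster-but-harder-to-write program pay off?
extra_dev_minutes = 50 * 60   # 50 extra hours of development time
saved_minutes_per_run = 1     # each run finishes 1 minute sooner

break_even_runs = extra_dev_minutes / saved_minutes_per_run
print(break_even_runs)        # 3000.0: only worth it if you run it more often than this
```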

What changed in my reasoning?

First of all, I’m working on other problems. Whereas I used to do a lot of work that was easy to map to numpy operations (which are fast, as they run compiled code), I now write a lot of code that is not straight numerics. And if I have to write it in standard Python, it is slow as molasses. I don’t mean slow in the sense of “wait a couple of seconds”; I mean “wait several hours instead of 2 minutes.”
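
As a toy illustration of the gap (my own example, not the actual analysis code): the same reduction written as a pure-Python loop and as a numpy call.

```python
import numpy as np

data = np.random.rand(10_000_000)

def sum_of_squares_loop(xs):
    # Interpreted, per-element loop: the "slow as molasses" path
    total = 0.0
    for x in xs:
        total += x * x
    return total

# The same computation inside numpy's compiled code, typically orders of
# magnitude faster than the loop above
result = np.dot(data, data)
```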

At the same time, data keeps getting bigger and computers come with more and more cores (which Python cannot easily take advantage of), while single-core performance is only slowly getting better. Thus, Python is a worse and worse solution, performance-wise.
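
The usual workaround is to fork worker processes, which sidesteps the GIL at the cost of serializing data between processes; a minimal sketch (the squaring task is a stand-in for real CPU-bound work):

```python
from multiprocessing import Pool

def work(chunk):
    # Runs in a separate worker process, so the parent interpreter's
    # lock is not a bottleneck for this CPU-bound task
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    chunks = [range(i, i + 1_000_000) for i in range(0, 8_000_000, 1_000_000)]
    with Pool() as pool:
        print(sum(pool.map(work, chunks)))
```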

Other languages have also demonstrated that it is possible to get good performance with high-level code (using JIT compilation or very aggressive compile-time optimizations). Looking from afar, the core Python development group seems uninterested in these ideas. They regularly pop up in side projects: Psyco, Unladen Swallow, Stackless, Shed Skin, and PyPy, the last being the only one still in active development; but for all the buzz they generate, they never make it into CPython, which still uses the same basic bytecode stack-machine strategy it used 20 years ago. Yes, optimizing a very dynamic language is not a trivial problem, but JavaScript is at least as dynamic as Python and it has several JIT-based implementations.

It is true that programmer time is more valuable than computer time, but waiting for results to finish computing is also a waste of my time (I suppose I could do something else in the meanwhile, but context switches are such a killer of my performance that I often just wait).

I have also sometimes found that, in order to make something fast in Python, I end up with complex, almost unreadable code. See this function for an example. The first time we wrote it, it was a loop-based function, directly translating the formula it computes. It took hours on a medium-sized problem (it would take weeks on the real-life problems we want to tackle!). Now it runs in a few seconds, but unless you are much smarter than me, it is not trivial to read the underlying formula out of the code.
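
As a hypothetical stand-in for that function (the linked code is more involved), compare a direct translation of a formula with the kind of vectorized version you end up with:

```python
import numpy as np

def pairwise_sq_dists_loop(X):
    # Direct translation of d[i, j] = ||x_i - x_j||^2: easy to read, very slow
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sum((X[i] - X[j]) ** 2)
    return D

def pairwise_sq_dists_fast(X):
    # Same formula expanded to ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j so the inner
    # loops become one matrix product: orders of magnitude faster, but the
    # original formula is no longer obvious from the code
    sq = (X * X).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
```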

The result is that I find myself doing more and more things in Haskell, which lets me write high-level code with decent performance (still slower than what I get if I go all the way down to C++, but with very good libraries). In fact, part of the reason that NGLess is written in Haskell and not Python is performance. I still use Jug (Python-based) to glue it all together, but it is calling Haskell code to do all the actual work.

I now sometimes prototype in Python, then do a kind of race: I start running the analysis on the main dataset, while at the same time reimplementing the whole thing in Haskell. Then, I start the Haskell version and try to make it finish before the Python-analysis completes. Many times, the Haskell version wins (even counting development time!).

Update: Here is a “fun” Python performance bug that I ran into the other day: deleting a set of 1 billion strings takes >12 hours. Obviously, this particular instance can be fixed, but this is exactly the sort of thing that I would never have attempted a few years ago. A billion strings seemed like a lot back then, but now we regularly discuss multiple terabytes of input data as “not a big deal”. This may not apply to your setting, but it does to mine.
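
The effect is easy to reproduce at a much smaller scale; this micro-benchmark is mine, not the code from the bug report:

```python
import time

s = {"key_%d" % i for i in range(10_000_000)}  # 10^7 strings, not 10^9

t0 = time.perf_counter()
del s  # dropping the last reference deallocates every string in the set
print(f"deletion took {time.perf_counter() - t0:.2f}s")
```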

Update 2: Based on a comment I made on Hacker News, this is how I summarize my views:

The main motivation is to minimize total time, which is TimeToWriteCode + TimeToRunCode.

Python has the lowest TimeToWriteCode, but a very high TimeToRunCode. TimeToWriteCode is fixed, as it is a human factor (after the initial learning curve, I am not getting that much smarter). However, as datasets grow and single-core performance does not improve, TimeToRunCode keeps increasing, so it is more and more worth spending extra time writing code to decrease TimeToRunCode. C++ would give me the lowest TimeToRunCode, but at too high a cost in TimeToWriteCode (not so much the language itself as the lack of decent libraries and package management). Haskell is (for me) a good tradeoff.
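
The tradeoff can be written down directly; the constants below are made up for illustration:

```python
def total_time(time_to_write_code, run_time_per_gb, dataset_gb):
    # TimeToWriteCode is a fixed human cost; TimeToRunCode scales with data
    return time_to_write_code + run_time_per_gb * dataset_gb

# With a large enough dataset, the slower-to-write language wins:
print(total_time(time_to_write_code=10, run_time_per_gb=1.00, dataset_gb=500))  # 510.0
print(total_time(time_to_write_code=40, run_time_per_gb=0.05, dataset_gb=500))  # 65.0
```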

This is applicable to my work, where we do use large datasets as inputs. YMMV.

45 thoughts on “Python’s Weak Performance Matters”

    1. I’ve used it. For the things where it fits, it’s wonderful. Numpy-type stuff, in particular, is very nice. You get less benefit with other types of code, though.

      1. For example, here is a typical “fast in Haskell, slow in Python” use that I don’t think Cython would do a lot of good for:

        1. read a file of IDs (a few million IDs), this is your set of interest
        2. read another file (say a 1TB file), where each line has an ID + metadata. If the ID is in the set of interest, then perform some computation with the metadata.

        This is pretty trivial to write in Python (see the sketch below), but has god-awful performance, and I don’t think Cython is very helpful, as it still uses all the dynamic machinery that makes Python slow.
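
        A minimal sketch of that version (assuming tab-separated lines; the file paths and the per-record computation are placeholders of my own):

        ```python
        def load_ids(path):
            # The set of interest: a few million IDs fit comfortably in memory
            with open(path) as f:
                return {line.strip() for line in f}

        def handle(metadata):
            pass  # stand-in for "some computation with the metadata"

        def process(big_path, interesting):
            with open(big_path) as f:
                for line in f:  # streams the ~1TB file line by line
                    ident, _, metadata = line.rstrip("\n").partition("\t")
                    if ident in interesting:
                        handle(metadata)
        ```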

      2. Yes, you’re right.
        On the other hand, you could also write some C functions for file reading and use them in Python through Cython (of course not as fast as pure C, due to overhead). That’s a question of how much effort you want to invest for this case; it would not be very useful if you never use them again.

  1. I think the real issue here is “the core Python development group seems uninterested in these ideas”. I know there is a multitude of compatibility issues that might prevent the core group from even beginning to think about a better runtime. But I am still amazed that there seems to be almost a resistance from the core group to working with/on this.

    For me a natural path would be to replace the old runtime with PyPy; it’s even written in “Python” 🙂 Why no one seems to be working on this is beyond me.

    Please note that I am also “looking from afar”, but I try to follow this issue as closely as possible.

    1. I don’t want to get into this without knowing too much, but I have been disappointed by some of the comments Guido has made with regard to performance efforts (for example, his repeated dismissal of Nuitka, and the tone in which he does it, have really changed my perception of him).

      That said, I would trust your judgement over mine if you say you follow this issue closely.

      1. I too fail to understand why Guido seems almost to oppose looking into a better VM. Clearly he knows things we don’t, but VM performance would be a nice theme for a version of Python (as list comprehensions, generators, and async/await were at some point).

      2. Though I really think type annotations are a great thing, especially for public APIs, at least for now there is no sign that they will be used to improve speed.

      3. I am much more worried about Guido’s remark that he hasn’t really tried out PyPy (EuroPython keynote, 2016 I think). Maybe there was irony in his statement that I didn’t pick up, but I remember him saying that he downloaded it once and played with it for a minute. I just found it stunning, considering his role.

      4. I am also puzzled by Guido’s “lack of interest” in PyPy (and maybe in the performance of the Python VM in general).

        PyPy seems like a perfect step towards a better Python VM: a *LOT* of time and brainpower has been invested in it, it’s written in Python, it’s much more modern than CPython, and it’s supported by some brilliant people like Antonio Cuni, Maciej Fijalkowski, Armin Rigo and others…

        I have no doubt that Guido is a busy man, and being the BDFL probably means watching out not to get caught up in anything too time-consuming and focusing on reading PEPs.

        But still, a better-performing VM should be somewhere near the top of the priority list.

    2. There is a lot of effort in a project called Graal to bring huge performance benefits to interpreted languages like Ruby and Python. It is quite close to what PyPy does, but more general. It is very ambitious and the numbers are great. You should definitely take some time to play with it. For reference, the Wikipedia entry has good general references about the project: https://en.wikipedia.org/wiki/Graal_(compiler)

  2. Ever looked at PyPy for increasing performance? Python itself is not much optimized for very large datasets, but libraries like NumPy, SciPy & Pandas are specifically designed for them. Maybe when you get into the region of billions of things it’s worth checking out those libraries, or ditching the language entirely.

    1. I have to confess that I haven’t tried it recently. The last time I tried it there were some Python 2/3 issues + it was not easy to install with conda, so eventually I gave up.

      1. Another problem is all the comments going “have you tried X, Y, Z” (runtime, language, etc.) as solutions. They all point away from standard Python, which is not what I want.

        I love Python and would like it to have a modern VM, so that all these other projects were not necessary: Nuitka, Unladen Swallow, Gilectomy, PyPy, Pyston, Stackless, etc. Imagine all those hours and all that talent poured into “alternative” solutions being somehow coordinated into a next-generation standard Python runtime.

  3. I wrote Python for 13 years (mostly Twisted), then after a 5-year stint with Node have recently moved to Golang. I have to say I find Golang has the same ‘spirit’ as Python: it’s a minimal language with a few powerful primitives, it’s ‘wide and not deep’ as I used to say of Python, and I find all of the Zen of Python (except the part about being Dutch) applies just as well to Golang.

      1. But with Cython you can compile Python to machine code… Add type info and “write Cython” and I don’t think the differences should be too great. You can also use OpenMP for parallelism…

    1. I love Python (more than Go), but obviously Python is NOT a “minimal language”. There’s a difference between easy/hard and simple/complicated. Python is (very) easy + complicated; and I mean complicated in terms of the amount of things (about the language) programmers need to learn to write good enough code. Go is moderately easy + simple.

  4. I know it’s not as popular and doesn’t have the libraries that Python has, but F# is a great language for data science.

    It has an R type provider which helps bridge the gap somewhat.

    I port things I need but I’m not in the pure data science field.

  5. I enjoy using Python, but I share your frustration when a single data-processing step isn’t covered by an existing function from pandas/dask/whatever.
    I’ve been programming with Julia for a few years now, and this has opened up a whole new world for me. It feels like Python with some sweet elements from MATLAB (and metaprogramming à la Haskell), but it runs like C. It’s still a bit (too) rough around some edges, but v1.0 should come out any week now. I believe it will become the go-to language for (data) scientists in the next decade (unless something even better comes along, of course 😉

  6. Python’s great, but by no means is it a one-size-fits-all solution. I’m glad you’ve found a good balance of ease-of-use and performance with Haskell.

  7. This is right. The argument that language slowness does not matter forgets what happens if the program actually succeeds. If you make a slow prototype and it succeeds, and it’s the language itself that’s slow, then you need to port the program to something faster to get out of your local maximum.

    So, write it in a slow language if there is a good chance that it’s a disposable prototype. But if you have already done a prototype (or are otherwise working from well-known code) and will be keeping the code, then writing it in a slow language means you are not designing for the case of success. I am constantly dealing with this issue with inefficient HTTP code that stuffs whole requests and responses into memory, where the successful program is immediately applied to large files and that strategy is doomed to fail (bad performance, such as running out of memory under non-trivial concurrency).
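
    As a hedged sketch of the streaming alternative (in Python, to match the rest of the post, rather than the commenter’s Go; the hashing task is just a stand-in):

    ```python
    import hashlib

    def hash_stream(fileobj, chunk_size=1 << 20):
        # Process the body 1 MiB at a time (fileobj opened in binary mode):
        # memory use stays constant no matter how large the request or file is
        h = hashlib.sha256()
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
        return h.hexdigest()
    ```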

    So, I write Go with the stdlib and make sure to properly stream data in those cases; it’s at least asymptotically like the optimized code, so there is a path to making it as fast as it can be without a rewrite.

  8. I think part of the frustration is also that Python clearly does not need to be (so) slow; just look at e.g. PyPy (and several other faster implementations).

    If each line of Python executed were sent to a central AI, which analysed the code and sent back a super-optimised line based on thousands of users’ knowledge (or something like that), then you would understand and accept that things were a little slow, and you could say “but, hey, look at the optimisation…” 🙂

    But Python is slow because it’s running on an old VM. I know it’s proven, stable, backwards compatible and a reference implementation, and all that is very important. But I would still love to see the Python VM arrive in this century… 🙂

  9. My opinion is that your problem probably needs an infrastructure-level solution rather than a single-machine one. A single-machine solution will only end up as a bottleneck in the long run, while a networked one brings plenty of advantages, even if in the beginning it may carry too much communication overhead.

  10. Many CPython improvements come from user reports, suggestions, and patches. After confirming that something seems anomalous with string-set deletion, I opened a tracker issue: https://bugs.python.org/issue32846. Anyone who knows of a fix that does not break something else can suggest it there.

    Writing a replacement VM that runs all of Python and makes quantum leaps in speed is much harder than the current incremental improvements. Guido accepted Unladen Swallow in principle, but the project was cancelled when its hypotheses did not pan out.

  11. I just remembered your blog post while I was having some issues parsing a binary file with Python. It wasn’t really performance; rather, the Python APIs just didn’t seem sufficient/ergonomic for the task. So I thought: well, you have played around often enough with Haskell, and if there is one thing Haskell does well, it’s neat APIs for parsing. After two hours, I was set up and had implemented a bunch of parsing logic that parsed one record. Then I made a change to the code base that introduced one function (`many1`) so that the code would parse a sequence of records, and all of a sudden the program wouldn’t terminate any more. Typechecker okay and all.

    And then I got reminded that issues like this bite me every time I use Haskell. The lazy evaluation just isn’t for me, it seems.

    And I do miss Monads when I use any other language (other than C# which has them, kind of neat).
