Why Python is Better than Matlab for Scientific Software

This is an argument I made at EuBIAS when arguing for the use of mahotas and the Python numpy-stack for bioimage informatics. I had a few conversations around this and decided to write a longer post summarizing the whole argument.

0. My argument mostly applies to new projects

If you have legacy code in MATLAB, then it may make sense to just continue using it. If it works, don’t fix it. However, if your Matlab code keeps causing you pain, Python might be a solution.

Note too that porting code is not the same as writing from scratch. You can often convert code from MATLAB to Python in a small fraction of the time it would take you to start from scratch.

1. Python has caught up with Matlab and is in the process of overtaking it.

This is my main argument: the momentum is in Python’s direction. Even two or three years ago, Python was behind. Now, it’s sailing past Matlab.

[Figure: lines of code over time in numpy, matplotlib, mahotas, skimage, and sklearn]

This graph shows the number of lines of code in some important projects for bioimage informatics/science in general (numpy, matplotlib, mahotas, skimage, and sklearn). As you can see, the base projects on the top (numpy and matplotlib) have been stable for some years, while the more applied packages at the bottom have exploded in recent years.

Depending on what you are doing, Python may even support it better. It is now Matlab which is playing catch-up with open source software (for example, Matlab is now introducing its own version of Dataframe, which Python has through Pandas [itself a version of R’s Dataframe object]).

The Python projects are also newer and tend, therefore, to be programmed in a more modern way: it is typical to find automated testing, excellent and complete documentation, a dedicated peer-reviewed publication, &c. This ain’t your grandfather’s open source with a dump on sourceforge and a single README file full of typos.

As an example of the large amount of activity going on in the Python world, just this week, Yhat released ggplot for Python [1]. So, while last week I was still pointing to plotting as one of the weaknesses of Python, it might no longer be true.

2. Python is a real programming language

Matlab is not; it is a linear algebra package. This means that if you need to add some non-numerical capabilities to your application, it gets hairy very fast.
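As a small illustration of what "non-numerical capabilities" can mean in practice, here is a minimal sketch that mixes text parsing with a little arithmetic, using only the Python standard library. The file-naming scheme, the function names, and the measurement values are all made up for the example.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical naming scheme: "<condition>_rep<number>.tif"
FILENAME = re.compile(r"^(?P<condition>[a-z]+)_rep(?P<rep>\d+)\.tif$")


def group_by_condition(measurements):
    """Group (filename, value) pairs by the condition parsed from the name."""
    groups = defaultdict(list)
    for name, value in measurements:
        match = FILENAME.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        groups[match.group("condition")].append(value)
    return dict(groups)


def mean_per_condition(measurements):
    """Return the mean value for each condition."""
    grouped = group_by_condition(measurements)
    return {cond: statistics.mean(vals) for cond, vals in grouped.items()}


# Made-up measurements, including one file that should be ignored.
data = [
    ("control_rep1.tif", 10.0),
    ("control_rep2.tif", 12.0),
    ("treated_rep1.tif", 20.0),
    ("notes.txt", 0.0),
]
print(mean_per_condition(data))
```

Regular expressions, dictionaries, and basic statistics all live in the same language here, so the bookkeeping around the numerical work does not need a second tool.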

For scientific purposes, when writing a small specialized script, Python may often be the second best choice: for linear algebra, Matlab may have nicer syntax; for statistics, R is probably nicer; for heavy regular expression usage, Perl (ugh) might still be nicer; if you want speed, Fortran or C(++) may be a better choice. To design a webpage, perhaps you want node.js. Python is not perfect for any of these, but is acceptable for all of them.

In every area, specialized languages are the best choice, but Python is the second best in more areas [2].

3. Python can easily interface with other languages

Python can interface with any language which can be interacted with through C, which is most languages. There is a missing link to some important Java software, but some work is being done to address that too. Technically, the same is true of Matlab.

However, the Python community and especially the scientific Python community has been extremely active in developing tools to make this as easy as you’d like (e.g., Cython).  Therefore, many tools/libraries in C-based languages are already wrapped in Python for you. I semi-routinely get sent little snippets of R code to do something. I will often just import rpy2 and use it from Python without having to change the rest of my code.
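To give a flavour of what interfacing through C looks like at the lowest level, here is a minimal sketch using the standard library's ctypes module to call strlen from the C standard library. It assumes a Linux or macOS system where the C library can be located (it will not work as written on Windows), and the wrapper name c_strlen is made up for the example. Tools like Cython or rpy2 hide this kind of plumbing behind a much nicer interface.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If it cannot be found by name, fall back
# to the symbols already loaded into the running process (POSIX only).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the length in bytes of `text`, as computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("mahotas"))
```

Declaring argtypes and restype is optional in ctypes, but it makes the call safer because arguments are checked and converted before they reach C.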

4. With Python, you can have a full open-source stack

This means that you are allowed to, for example, ship a whole virtual machine image with your paper. You can also look at all of the code that your computation depends on. No black boxes.

5. Matlab Licensing issues are a pain. And expensive.

Note that I left the word expensive to the end, although in some contexts it may be crucial. Besides the basic MATLAB licenses, you will often need to buy a few more licenses for specific toolboxes. If you need to run the code on a cluster, often that will mean more licenses.

However, even when you do have the money, this does not make the problem go away: now you need admin for the licensing. When I was at CMU, we had campus-wide licenses and, yet, it took a while to configure the software on every new user’s computer (with some annoying minor issues like the fact that the username on your private computer needed to match the username you had on campus), you couldn’t run it outside the network (unless you set up a VPN, but this still means you need network access to run a piece of software), &c. Every so often, the license server would go down and stop everybody’s work. These secondary costs can be as large as the licensing costs.

Furthermore, using Python means you can more easily collaborate with people who don’t have access to Matlab, even with a future version of yourself who has decided to economize on Matlab licenses. And if the number of users shrinks and your institution decides to drop the campus licensing, you will not be one of the few groups now forced to buy it out-of-pocket.

By the way, if you do want support, there are plenty of options for purchasing it [3]: larger companies as well as individual consultants available in any city. The issue of support is orthogonal to the licensing of the software itself.

§

Python does have weaknesses, of course; but that’s for another post.

[1] Yhat is a commercial company releasing open-source software, by the way; for those keeping track at home.
[2] I remember people saying this about C++ back in the day.
[3] However, because science is a third-world economy, it may be easier to spend 10k on Matlab licenses which come with phone support than to spend 1k on a local Python consultant.

42 thoughts on “Why Python is Better than Matlab for Scientific Software”

  1. While I agree with most of your arguments (I use Python myself), I think your plot weakens your point as it is dead ugly. 😦
    Ever since starting to use matplotlib I’ve been disappointed by the default styling and looks which is not at all professional, so do us the favor and use at least plt.tight_layout() to position the subplots without overlapping labels. Some mpltools won’t hurt either… https://github.com/tonysyu/mpltools

    1. 🙂 Fair enough.

      I updated it to use mpltools.style.use('ggplot').

      I agree that plotting is still a problem in Python, partly because of matplotlib’s insistence on copying the ugly Matlab colours.

    2. Any elaboration on “dead ugly” beyond the overlapping labels? I’ve recently come across a number of (mostly R) users who find matplotlib ugly, but it seems like they have a very different set of tastes than I do. Personally I find a lot of the output that comes from ggplot unprofessional because the lines are too thin and the text too small.

    3. Plotting options have also kept up with the times…
      Have you considered trying out PyQtGraph?
      Excellent surface and volumetric plots. Wayyy faster than matplotlib while dealing with both 3D, and large data sets.
      Shoutout to Luke Campagnola

  2. For my PhD I also used Matlab + Comsol heavily. In addition to your items, there were three other issues that were a big problem for me.

    1) Scaling. You want to scale your app to Amazon EC2? No, the license does not allow it. Recently, they enabled it, but for >4-digit $ costs.

    2) Future use. Academic licenses are rather cheap, but if you want to use it in industry everything is 4-5 times more expensive. So, I could not transfer my code base to a new company, and hence I started with Python as well.

    3) Standalone software. It is very difficult to create standalone software for several operating systems, or, god forbid, to open source and release your software. In the end, Matlab turns out to be a rather closed environment.

    Thanks for the post.

    1. I guess that you replaced the MATLAB part with Python rather than COMSOL, right? There is still a big gap between COMSOL and its open alternatives.

      1. Correct. I replaced anything not dealing with Comsol with Python then, e.g. large-scale parts, but Comsol is bound to Matlab. At that time the alternatives to Comsol were Ansys and OpenFOAM et al., on which I could have spent my poor life tweaking/learning/hacking with little to show for it. Later, I moved to a new field (oncology) that has nothing to do with finite elements; life is too short to deal with them. In bioinformatics, the toolsets are smaller and manageable, and the approach is very agile: small pieces of software glued together and developed in short increments, thanks to the possibilities Python and R provide.

  3. It is funny. I recently gave almost the same arguments in a talk to foster the use of Python for instrument control. Instead of Matlab, I said LabVIEW but the rest was basically the same.
    We do not have such a complete stack but we are making quick progress.

    Thanks for your post.

    1. For job interviews I was talking to people developing instruments, and apparently Windows is the de facto standard and people only want to hear about C derivatives or Java. Python/PyQt was not an option. I can imagine that nanosecond-scale instrument queries would be difficult with Python compared to C++ routines, and hence there is still a rather long way to go there. I will keep watching Numba/PyPy until they become practical. Until then, C++/C#/Java it is.

      1. But Windows being the de facto standard is big trouble for many small companies. For example, when Microsoft stopped selling Windows XP licenses, many companies had trouble distributing their systems and had to go through an expensive and time-consuming port to Windows Vista/7.
        Also, speed is not really very relevant when you do instrumentation. Talking to an instrument is a slow, IO-bound task which is perfect for Python.
        And when you want “nanosecond-scale” instrument queries, you typically use real-time, dedicated hardware (FPGA, etc). You cannot trust Windows in time-critical operation as you never know if it is going to start updating something!

    2. Hi Hernan,
      Could you please point me to your talk/slides? I would love to know about some of the potential Python-based alternatives to LabVIEW and other instrument control software.

      Thanks.

  4. I agree with the points mentioned in the original post. However, Python still needs to get past its self-made obstacle: the compatibility horror of 2.x to 3.x versions. Personally, I was really sceptical as I delved into Python after working mainly with Java before. Some of the object-oriented concepts felt really ugly. The easier handling of not too big lists, vectors, etc. had to be learned first, but is nice, as always in higher-level languages (R drives this design element even further). Bigger Python code can be confusing sometimes because the strict code design limits appropriate design and readability decisions in special cases, especially when growing big. Graphical capabilities are totally messed up for the current, modern versions of Python. Even when I follow exactly what the instructions are telling me, I don’t get a single one (of the currently sparse) graphical libraries running. As long as it stays this way, Python is no beginning-to-end solution for my scientific programs. It starts so well with its organization of data sets and easily implemented calculations, but it simply loses touch when trying to present the results. At the moment, I am using Python for smaller scripts, Java for bigger projects and R for graphs, maps and, of course, statistics too.

    1. Florian, I have seen people encounter the problems you mention, specifically the “mess” associated with Python in comparison to something like MATLAB or, *cough*, IDL. The latter two deliver a lot of scientific computing power directly to their users in a convenient, easily managed package. With Python, if you don’t use Linux, or one of the packaged Python distributions like Canopy (commercial), Anaconda (commercial), or Python(x,y) (community) you are forced to navigate the Python ecosystem to find functionality you need. The way I think of this is that Python, and other open source languages (Java too), has much more “surface area”. The commercial languages manage that complexity but at the expense of reduced capabilities outside of their niche.

      For example, in Python, to accomplish something significant, you are forced to maintain a fairly complex environment of third party packages, properly track that environment and communicate it to others if you intend to share your work. The other languages accomplish more with their core libraries at the expense of capability outside their niche.

      As for the original post, this was nicely explained, good work!

      1. Thank you for mentioning Anaconda. I would just emphasize that Anaconda is more community than commercial. Anaconda Server and Anaconda add-ons are commercial, but Anaconda itself is completely free and its package management tool, conda, is 100% free, open source and community driven at http://conda.pydata.org

        One big purpose of Anaconda (and really conda at the core) is to make it easier for *everybody* to navigate the complexity of packages and dependencies.

  5. From a developer’s point of view, I’d say that arguments 0 and 2 are very correct (well, the other arguments too). To arg 0 could be added the question of who you are developing for: if they use Matlab and want an all-in-one, user-friendly toolbox, you might still want to go for Matlab (sorry :-/).

    @Florian, this post is about Python vs. Matlab, not Java. On that one, I think there are two issues to be discussed.
    1) Python is an interpreted and highly dynamic language (everything can be changed at run time – even class inheritance!). The point is that it is really a different coding philosophy. This issue has been discussed a lot and it has been emphasized that it can be less suited to large-scale collaborative projects than “constrained” languages like Java. From what I know, the main answers are (unit) testing and abstract base classes (ABC) – and good coding manners of course.
    2) The Python development environment can look messy. Having been developed through (initially) small open-source projects, it requires you to install a lot of external libraries to get a complete environment. Moreover, for each need, there are usually several potential libraries to choose from. As Austin said, often your best choice (especially on non-Linux OSes) is to use a “Python distribution”.
    Two scattered comments on this issue: the maintenance of libraries is done following the users’ needs and by the users, so when you use one, you should expect to participate a little. In some ways, Python 2 and 3 can be considered different languages: you develop for one of them. If you want to support both, think of it as an easier task than porting Python (2 or 3) code to Java or C++, for example.

    Finally I’d just like to conclude that, as arg 1 shows, we can expect things to get better.

  6. In my opinion Julia > Python > Matlab based on their merits as programming languages, but using the right tool / ecosystem for the right job always comes first.

  7. I come from Colombia, where our universities don’t provide licences for any kind of software for students, so the students must use pirated licences. I took the decision to write Python tutorials for my students and “teach” them Python. The tutorials are under Creative Commons and cover the following topics:

    Tutorial of Python 3
    Tutorial of matplotlib
    Tutorial of NumPy
    Tutorial of IPython

    My twitter is @nervencid

    Sorry for my english

  8. Nice post about using the Python scientific ecosystem as an alternative to Matlab. Although I agree with all of your comments in general, currently, if you are an electrical engineering student (I can only speak from my perspective), life is much easier in the Matlab world. However, I think it is going to change (at least I strongly hope so), as you too have mentioned in your post.

    With regard to Matplotlib, it is true that the default plots are mostly not “good-looking”. I guess that to some extent it was designed that way to allow flexibility to the user (if I correctly understood what the late John Hunter explained in this tutorial http://www.youtube.com/watch?v=DNRJwENqEUY). It takes effort to make them presentable and sometimes it is really very frustrating when you have to do that for every plot. The biggest problem/weakness of matplotlib, though, is the 3D plotting capability. It just doesn’t work if the matrices are just a little too big (for example, it gives up if you are trying to plot a surface on a 1000 by 1000 grid), which Matlab can handle quite comfortably. Currently I am using Mayavi for creating 3D plots, which isn’t bad, but I need to use two very different packages for my plots.

    In summary, I am very happy that I made the jump to Python, and I don’t regret it. Python has its own set of advantages and disadvantages, but I hope that with a little more time and community effort, things can only get better from here.

  9. Hi,
    I love Python, but I work on big data and…

    Spyder crashes with direct import.
    Python is really much slower than Matlab…

    1. “Python is really much slower than Matlab…”

      How do you come to this result?
      I think this really depends on programming skills… Using NumPy, my experience shows that algorithms run way faster in Python. And if this is not enough, use Numba and Cython. You can’t directly compare vanilla Python with Matlab.

    2. You shouldn’t even be using spyder for big data…. your code is slow because of your (in)ability to code.

  10. Nice article. I was searching for python memory management and use for high computational work.

    Luckily I used Matlab during Engineering for Digital image processing. Agreed totally, pain of licence is huge for individual users. Pirated copies go around. I never used Matlab after that. I feel, that’s more specific to processing of graphs, images, speech, etc.

    Python seems more mature for large distributed computing setups, especially for big data activities. Its ease of availability, open source nature, easy setup, ease of development/execution in bash/shell script style (unlike Java), simple syntax and defined indentation give it a head start.
    Still learning Python at a deeper level.

    Thanks.

  11. Tired of reading discussions about MATLAB and Python. Can somebody tell me which I should go for as an IT student? My programming skills are not that good; I know only a few basics. What should I learn, MATLAB or Python, for a better future?
    An early reply to this post will be highly appreciated.
    Thank You

  12. Go with Matlab if you want to learn more deeply about any topic related to numerical computing. The reason is that for just about any new algorithm published in the literature, whether in machine learning, statistics, linear algebra, signal processing, computational economics or whatever field, one can always find the authors’ code made available on the net. Numerical computing is very difficult, and reading someone’s paper is hard enough, wading through the pseudocode and mathematical derivations, if one wants to implement the algorithm from the ground up. If one programs in Matlab, that is unnecessary because there’s a high chance that the authors have already done the implementation in Matlab and made their code available. This means that the student/user can experiment quickly with new algorithms, if they know Matlab, by grabbing the authors’ code and exploring it. If you’re a Python or R user, then you have to implement the code from the ground up (or port it from Matlab into Python or R, etc.) based on the authors’ paper. Why reinvent the code in Python or R when the authors have already made their code available in Matlab? It saves time and also makes learning fast because the user gets to try out new algorithms that would not be available to him/her if he/she were not a Matlab programmer.

  13. 1. “Lines of Code” is “Lying with Statistics”.

    As you can clearly see, linux improvement directly correlates with the amount of cursing that goes on, ergo windows is obsolete.

    Python “MAY” better support it. Again shows a lack of an argument. Distro kernels “MAY” be compiled from a secondary code branch that leaks information to the NSA.

    “Modern” programming has no guarantee of being BETTER programming. In fact, “Modern” programming tends to care less about speed and resources and places emphasis on task completion. Python is a result of “Modern” programming where optimization is shunned and low level tweaking is impossible with the standard libraries.

    And what is MORE is that you’re using third party libraries while excluding that octave/matlab ALSO has thousands of third party libraries many of which also are open-source or free-licensing for non-commercial ventures.

    2. Is Octave/Matlab Turing complete? Yep. Ergo Octave/Matlab is a “real language.”

    Python can easily compile itself, Octave/Matlab can easily compile Octave/Matlab. So long as you have an IO system and the ability to write any ASCII character, you can make a compiler.

    Octave/Matlab is designed to make heavy use of vectorized math and supplies multiple ways of optimizing and parallelizing (though matlab more) data processing.

    3. “Technically, the same is true of Matlab. BUT, I don’t like matlab ergo I’ll use weasel words to dismiss the capability.”

    4. “Python is open-source, and hence is better”, “Octave is open source matlab, ergo Octave is better”

    Few people actually dig into the source enough to know how the processing is really working. You may push crud patches, but it doesn’t mean squat as to knowing the low level code behind your code.

    Quite frankly, the Zen of Python is AGAINST people thinking about the low level code or any of that “distracting” stuff. With python3 going forth and fully removing ints in favor of a standardized infinite precision variable (which causes other problems while subsequently requiring deeper pointer chains to get to the data) saying “Oh, but you can know what it is doing” becomes meaningless.

    5. “I don’t believe in paying for software”
    Again, not an argument. Free does not mean better, OS X proves that very well. Free only means free and Octave is free.

  14. To note: My push with octave only is a result of your insistence that Python is better ONLY because it is open source. That is the sum of your argument; which means an open-source Matlab clone puts you in a very bad position because you have to make a REAL argument instead of saying “I like Python hence Matlab sucks” and using the tired open source shield to defend your claim.

  15. If you are doing natural sciences (physics, chemistry) or maths then don’t waste your time with Python; Matlab and Mathematica are less of a headache. They both have a friendly learning curve if you are a scientist or engineer without prior programming experience. Mostly, I prefer Matlab over Python because the maths syntax is more natural, unlike Python where the maths seems gimmicky.
