Python “lists” are not lists. A history

Python “lists” are not lists. They are arrays.

§

This seems like the sort of rookie mistake that starting scientists make: they use terms imprecisely, they write like journalists not scientists.

I can totally imagine myself telling a student starting out: a list is a list and an array is an array, these are well defined computer science terms, you cannot use them interchangibly (and I can imagine their mental reaction: this guy is a pedantic prick [1]).

I tell the students to think like a politician: everything you write must still be defensible even taken out of context[2]

Calling an array a list is wrong.

§

I wondered whether they had been lists in the past, so I went back and checked the earliest version of Python available, which happens to be Python 1.0.1

After a minor bug fix [3], it actually compiled and ran on my Ubuntu 13.04 machine! It is somewhat crashy on errors, but otherwise works.

§

First find: already in Python 1.0.1, lists were arrays! This was already year old code base, so maybe at an even earlier time point, men were men and lists were lists. But by 1994, lists were arrays.

§

A surprising thing about the code: if you’re familiar with recent Python code, this will feel very familiar. Here is how you set a list member:

int
setlistitem(op, i, newitem)
    register object *op;
    register int i;
    register object *newitem;
{
    register object *olditem;
    if (!is_listobject(op)) {
        XDECREF(newitem);
        err_badcall();
        return -1;
    }
    if (i < 0 || i >= ((listobject *)op) -> ob_size) {
        XDECREF(newitem);
        err_setstr(IndexError, "list assignment index out of range");
        return -1;
    }
    olditem = ((listobject *)op) -> ob_item[i];
    ((listobject *)op) -> ob_item[i] = newitem;
    XDECREF(olditem);
    return 0;
}

This feels very similar to recent Python.

§

Even more surpring, here is the inner loop:

switch (opcode) {

case POP_TOP:
    v = POP();
    DECREF(v);
    break;

case ROT_TWO:
    v = POP();
    w = POP();
    PUSH(v);
    PUSH(w);
    break;

and so on… This is almost exactly the same as the most recent Python. We are still using fundamentally the same implementation of Python that was out 20 years ago.

Update: See the next post for a response to the notion that I just didn’t get what a list refers to.

[1] Student: I meant list as a general one-thing-after-another-in-order, Wise older researcher You mean a sequence Student: Yeah, that. Wise older researcher: then you should write sequence. At this point, the student attains enlightment and starts applying for jobs in industry.
[2] A common complaint I have heard after several MS thesis defenses is the jury member took this one sentence from my introduction out of context and made a big deal out of it. It wasn’t even that related to my work.
[3] There is a function getline() in the standard library which was not there in the early 1990s. So, Python’s use of an internal function with the same name gets the compiler confused. Renaming the internal function to python_getline fixes it.
Advertisements

10 thoughts on “Python “lists” are not lists. A history

  1. Pingback: I’m Doubling Down on a Mistake | Meta Rabbit

  2. Pingback: Python “lists” are not lists. A history | Enjoying The Moment

  3. Indeed, very few programming languages support lists as a basic data type or even one of the default library collections, because of their O(n) access and lack of memory locality, which are inefficient for various use cases.

    Of course, for immutable functional programming, lists are one of the most basic types (although arrays are also very important, for parallelism).

  4. I have to disagree. Perhaps because I don’t really understand what a list actually is.

    I’d say that the implementation of lists (in python and any other programming language) is irrelevant. A list is just a linear data structure with following operations, where B is a base type and S is the type list:

    nil : S
    cons : B x S –> S
    null : S –> Bool
    head : S –> B
    tail : S –> S

    List axioms:

    null(nil) = True
    null(cons(x, s)) = False
    head(cons(x, s)) = x
    tail(cons(x, s)) = s

    All of these properties are fulfilled in python’s list. List is dynamic data structure and can be implemented using e.g. dynamic array or linked list.

    • Indeed. The internal implementation is precisely that, is an implementation detail. The important part is the interface and the time-complexity of the different operations.

      Lists do not possess two properties commonly associated with “arrays” in computer science: memory locality and static size. So while indexing is O(1) for both Python lists and Python arrays, thay are not the same.

    • I consider the time and space complexity of operations to be part of the definition of what a list is. We don’t currently have programming languages to specify time complexity in the type system, but conceptually, pretend that we can.

      The implementation is a hidden detail, as long as it conforms to the intended time and space cost semantics.

  5. Python lists are so named because append is amortized O(1). Appending an item to an ordinary array is O(N). That is e.g. the case for array.array and numpy.ndarray, and arrays on other languages (Java, C, Fortran). If you compare with C++, a Python list behaves like std::vector but differently from an array. Python lists are nok linked lists, but they are still lists because of the amortized O(1) append. Python lists are intended to be used as lists, because they have the same time complexity as a linked list for the common operations of appending items to the back and popping items off the back.

    Because lists are not linked lists, they are inefficient as fifos. Therefore, Python has a different data structure for that purpose, which is collections.deque.

    A property of arrays is memory locality. Python lists are internally arrays of pointers (PyObject*), not arrays of objects (PyObject), and they therefore lack the memory locality normally assosicated with the data structure called “array” in computer science. That is the raison d’etre for array.array and numpy.ndarray, which store their data with memory locality.

    >>> [1.0, 2.0, 3.0] # no memory locality between items
    >>> numpy.array([1.0, 2.0, 3.0]) # memory locality between items

    It is incorrect to call Python lists arrays. They store an array of pointers as an internal implementation detail, but they are very different from arrays. Those guys who named the Python list knew computer science well enough to avoid calling them lists.

  6. Pingback: Year in Review: Blogposts | Meta Rabbit

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s