Article 1 of 3 in the series Python's super() explained
By Austin Bingham

Python's super() is one of those aspects of the language that many developers use without really understanding what it does or how it works. [1] To many people, super() is simply how you access your base-class's implementation of a method. And while this is true, it's far from the full story.

In this series I want to look at the mechanics and some of the theory behind super(). I want to show that, far from just letting you access your base-class, Python's super() is the key to some interesting and elegant design options that promote composability, separation of concerns, and long-term maintainability. In this first article I'll introduce a small set of classes, and in these classes we'll find a bit of a mystery. In subsequent articles we'll investigate this mystery by seeing how Python's super() really works, looking at topics like method resolution order, the C3 algorithm, and proxy objects.

In the end, you'll find that super() is both more complex than you probably expected, yet also surprisingly elegant and easy-to-understand. This improved understanding of super() will help you understand and appreciate Python on at a deeper level, and it will give you new tools for developing your own Python code.

A note on Python versions

This series is written using Python 3, and some of the examples and concepts don't apply completely to Python 2. In particular, this series assumes that classes are "new-style" classes. ((For a discussion of the difference between "old-style" and "new-style" classes, see the Python wiki.)) In Python 3 all classes are new-style, while in Python 2 you have to explicitly inherit from object to be a new-style class.

For example, where we might use the following Python 3 code in this series:

class IntList:
    . . .

the equivalent Python 2 code would be:

class IntList(object):
    . . .

Also, throughout this series we'll call super() with no arguments. This is only supported in Python 3, but it has an equivalent form in Python 2. In general, when you see a method like this:

class IntList:
    def add(self, x):
        super().add(x)

the equivalent Python 2 code has to pass the class name and self to super(), like this:

class IntList:
    def add(self, x):
        super(IntList, self).add(x)

If any other Python2/3 differences occur in the series, I'll be sure to point them out. [2]

The Mystery of the SortedIntList

To begin, we're going to define a small family of classes that implement some constrained list-like functionality. These classes form a diamond inheritance graph like this:

Inheritance graph

At the root of these classes is SimpleList:

class SimpleList:
    def __init__(self, items):
        self._items = list(items)

    def add(self, item):
        self._items.append(item)

    def __getitem__(self, index):
        return self._items[index]

    def sort(self):
        self._items.sort()

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return "{}({!r})".format(
            self.__class__.__name__,
            self._items)

SimpleList uses a standard list internally, and it provides a smaller, more limited API for interacting with the list data. This may not be a very realistic class from a practical point of view, but, as you'll see, it let's us explore some interesting aspects of inheritance relationships in Python.

Next let's create a subclass of SimpleList which keeps the list contents sorted. We'll call this class SortedList:

class SortedList(SimpleList):
    def __init__(self, items=()):
        super().__init__(items)
        self.sort()

    def add(self, item):
        super().add(item)
        self.sort()

The initializer calls SimpleList's initializer and then immediately uses SimpleList.sort() to sort the contents. SortedList also overrides the add method on SimpleList to ensure that the list always remains sorted.

In SortedList we already see a call to super(), and the intention of this code is pretty clear. In SortedList.add(), for example, super() is used to call SimpleList.add() - that is, deferring to the base-class - before sorting the list contents. There's nothing mysterious going on...yet.

Next let's define IntList, a SimpleList subclass which only allows integer elements. This list subclass prevents the insertion of non-integer elements, and it does so by using the isinstance() function:

class IntList(SimpleList):
    def __init__(self, items=()):
        for item in items: self._validate(item)
        super().__init__(items)

    @classmethod
    def _validate(cls, item):
        if not isinstance(item, int):
            raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))

    def add(self, item):
        self._validate(item)
        super().add(item)

You'll immediately notice that IntList is structurally similar to SortedList. It provides its own initializer and, like SortedList, overrides the add() method to perform extra checks. In this case, IntList calls its _validate() method on every item that goes into the list. _validate() uses isinstance() to check the type of the candidates, and if a candidate is not an instance of int, _validate() raises a TypeError.

Chances are, neither SortedList nor IntList are particularly surprising. They both use super() for the job that most people find most natural: calling base-class implementations. With that in mind, then, let's introduce one more class, SortedIntList, which inherits from both SortedList and IntList, and which enforces both constraints at once:

class SortedIntList(IntList, SortedList):
    pass

It doesn't look like much, does it? We've simply defined a new class and given it two base classes. In fact, we haven't added any new implementation code at all. But if we go to the REPL we can see that it works as we expect. The initializer sorts the input sequence:

>>> sil = SortedIntList([42, 23, 2])
>>> sil
SortedIntList([2, 23, 42])

but rejects non-integer values:

>>> SortedIntList([3, 2, '1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sorted_int_list.py", line 43, in __init__
    for x in items: self._validate(x)
  File "sorted_int_list.py", line 49, in _validate
    raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))
TypeError: SortedIntList only supports integer values.

Likewise, add() maintains both the sorting and type constraints defined by the base classes:

>>> sil.add(-1234)
>>> sil
SortedIntList([-1234, 2, 23, 42])
>>> sil.add('the smallest uninteresting number')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sorted_int_list.py", line 45, in add
    self._validate(item)
  File "sorted_int_list.py", line 42, in _validate
    raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))
TypeError: SortedIntList only supports integer values.

You should spend some time playing with SortedIntList to convince yourself that it works as expected. You can get the code here.

It may not be immediately apparent how all of this works, though. After all, both IntList and SortedList define add(). How does Python know which add() to call? More importantly, since both the sorting and type constraints are being enforced by SortedIntList, how does Python seem to know to call both of them? This is the mystery that this series will unravel, and the answers to these questions have to do with the method resolution order we mentioned earlier, along with the details of how super() really works. So stay tuned!

[1]If you do know how it works, then congratulations! This series probably isn't for you. But believe me, lots of people don't.
[2]For a detailed look at the differences between the language versions, see Guido's list of differences.
Category: misc
Tags: Python