Article 2 of 3 in the series Python's super() explained
By Austin Bingham

In the previous article in this series we looked at a seemingly simple class graph with some surprising behavior. The central mystery was how a class with two bases can seem to invoke two different method implementations with just a single invocation of super(). In order to understand how that works, we need to delve into the details of how super() works, and this involves understanding some design details of the Python language itself.

Method Resolution Order

The first detail we need to understand is the notion of method resolution order or simply MRO. Put simply a method resolution order is the ordering of an inheritance graph for the purposes of deciding which implementation to use when a method is invoked on an object. Let's look at that definition a bit more closely.

First, we said that an MRO is an "ordering of an inheritance graph". Consider a simple diamond class structure like this:

>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...

The MRO for these classes could be, in principle, any ordering of the classes A, B, C, and D (and object, the ultimate base class of all classes in Python.) Python, of course, doesn't just pick the order randomly, and we'll cover how it picks the order in a later section. For now, let's examine the MROs for our classes using the mro() class method:

>>> A.mro()
[<class '__main__.A'>,
<class 'object'>]
>>> B.mro()
[<class '__main__.B'>,
<class '__main__.A'>,
<class 'object'>]
>>> C.mro()
[<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]
>>> D.mro()
[<class '__main__.D'>,
<class '__main__.B'>,
<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]

We can see that all of our classes have an MRO. But what is it used for? The second half of our definition said "for the purposes of deciding which implementation to use when a method is invoked on an object". What this means is that Python looks at a class's MRO when a method is invoked on an instance of that class. Starting at the head of the MRO, Python examines each class in order looking for the first one which implements the invoked method. That implementation is the one that gets used.

For example, let's augment our simple example with a method implemented in multiple locations:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     def foo(self):
...         print('B.foo')
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...

What will happen if we invoke foo() on an instance of D? Remember that the MRO of D was [D, B, C, A, object]. Since the first class in that sequence to support foo() is B, we would expect to see "B.foo" printed, and indeed that is exactly what happens:

>>> D().foo()
B.foo

What if remove the implementation in B? We would expect to see "C.foo", which again is what happens:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...
>>> D().foo()
C.foo

To reiterate, method resolution order is nothing more than some ordering of the inheritance graph that Python uses to find method implementations. It's a relatively simple concept, but it's one that many developers understand only intuitively and partially. But how does Python calculate an MRO? We hinted earlier – and you probably suspected – that it's not just any random ordering, and in the next section we'll look at precisely how Python does this.

C3 superclass linearization

The short answer to the question of how Python determines MRO is "C3 superclass linearization", or simply C3. C3 is an algorithm initially developed for the Dylan programming language [1], and it has since been adopted by several prominent programming languages including Perl, Parrot, and of course Python. [2] We won't go into great detail on how C3 works, though there is plenty of information on the web that can tell you everything you need to know. [3]

What's important to know about C3 is that it guarantees three important features:

  1. Subclasses appear before base classes
  2. Base class declaration order is preserved
  3. For all classes in an inheritance graph, the relative orderings guaranteed by 1 and 2 are preserved at all points in the graph.

In other words, by rule 1, you will never see an MRO where a class is preceded by one of its base classes. If you have this:

>>> class Foo(Fred, Jim, Shiela):
...     pass
...

you will never see an MRO where Foo comes after Fred, Jim, or Shiela. This, again, is because Fred, Jim, and Shiela are all base classes of Foo, and C3 puts base classes after subclasses.

Likewise, by rule 2, you will never see an MRO where the base classes specified to the class keyword are in a different relative order than that definition. Given the same code above, this means that you will never see and MRO with Fred after either Jim or Shiela. Nor will you see an MRO with Jim after Shiela. This is because the base class declaration order is preserved by C3.

The third constraint guaranteed by C3 simply means that the relative orderings determined by one class in an inheritance graph – i.e. the ordering constraints based on one class's base class declarations – will not be violated in any MRO for any class in that graph.

C3 limits your inheritance options

One interesting side-effect of the use of C3 is that not all inheritance graphs are legal. It's possible to construct inheritance graphs which make it impossible to meet all of the constraints of C3. When this happens, Python raises an exception and prevents the creation of the invalid class:

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     pass
...
>>> class D(B, A, C):
...     pass
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C

In this case, we've asked for D to inherit from B, A, and C, in that order. Unfortunately, C3 wants to enforce two incompatible constraints in this case:

  1. It wants to put C before A because A is a base class of C
  2. It wants to put A before C because of D's base class ordering

Since these are obviously mutually exclusive states, C3 rejects the inheritance graph and Python raises a TypeError.

That's about it, really. These rules provide a consistent, predictable basis for calculating MROs. Understanding C3, or even just knowing that it exists, is perhaps not important for day-to-day Python development, but it's an interesting tidbit for those interested in the details of language design.

Super proxies

The third detail we need to understand in order to resolve our mystery is the notion of a "super proxy". When you invoke super() in Python [4], what actually happens is that you construct an object of type super. In other words, super is a class, not a keyword or some other construct. You can see this in a REPL:

>>> s = super(C)
>>> type(s)
<class 'super'>
>>> dir(s)
['__class__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__',
'__get__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__',
'__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__self__',
'__self_class__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__thisclass__']

In most cases, super objects have two important pieces of information: an MRO and a class in that MRO. [5] When you use an invocation like this:

super(a_type, obj)

then the MRO is that of the type of obj, and the class within that MRO is a_type. [6]

Likewise, when you use an invocation like this:

super(type1, type2)

the MRO is that of type2 and the class within that MRO is type1. [7]

Given that, what exactly does super() do? It's hard to put it in a succinct, pithy, or memorable form, but here's my best attempt so far. Given a method resolution order and a class C in that MRO, super() gives you an object which resolves methods using only the part of the MRO which comes after C.

In other words, rather than resolving method invocation using the full MRO like normal, super uses only the tail of an MRO. In all other ways, though, method resolution is occurring exactly as it normally would.

For example, suppose I have an MRO like this:

[A, B, C, D, E, object]

and further suppose that I have a super object using this MRO and the class C in this MRO. In that case, the super instance would only resolve to methods implemented in D, E, or object (in that order.) In other words, a call like this:

super(C, A).foo()

would only resolve to an implementation of foo() in D or E. [8]

Why the name super-proxy

You might wonder why we've been using the name super-proxy when discussing super instances. The reason is that instances of super are designed to respond to any method name, resolving the actual method implementation based on their MRO and class configuration. In this way, super instances act as proxies for all objects. They simply pass arguments through to an underlying implementation.

The mystery is almost resolved!

We now know everything we need to know to resolve the mystery described in the first article in this series. You can (and probably should) see if you can figure it out for yourself at this point. By applying the concepts we discussed in this article - method resolution order, C3, and super proxies - you should be able to see how SortedIntList is able to enforce the constraints of IntList and SortedList even though it only makes a single call to super().

If you'd rather wait, though, the third article in this series will lay it all out for you. Stay tuned!

[1]The Dylan programming language
[2]Presumably starting with the letter "P" is not actually a requirement for using C3 in a language.
[3]Python's introduction of C3 in version 2.3 includes a great description. You can also track down the original research describing C3.
[4]With zero or more arguments. I'm using zero here to keep things simple.
[5]In the case of an unbound super object you don't have the MRO, but that's a detail we can ignore for this article.
[6]Remember that this form of super() requires that isinstance(obj, a_type) be true.
[7]Remember that this form of super() requires that issubclass(type2, type1) be true.
[8]Since object does not (yet...though I'm working on a PEP) implement foo().
Category: misc