Good with Computers

The Sixty North Blog

A More Full-Featured Emacs company-mode Backend

In the first article in this series we looked at how to define the simplest company-mode backend.1 This backend drew completion candidates from a predefined list of options, and allowed you to do completion in buffers in fundamental mode. The main purpose of that article was to introduce the essential plumbing of a company-mode backend.

In this article we’ll expand upon the work of the first, adding some useful UI elements like annotations and metadata. We’ll also implement a rough form of fuzzy matching, wherein candidates will be presented to the user when they mostly match the prefix. After this article you’ll know almost everything you need to know about writing company-mode backends, and you’ll be in a great position to learn the rest on your own.

Most of what we’ll be doing in the article revolves around handling completion candidate “metadata”, data associated in some way with our completion candidates. In practice this kind of data covers things like documentation strings, function signatures, symbols types, and so forth, but for our purposes we’ll simply associate some biographical data with the names in our completion set sample-completions.

company-mode provides affordances for displaying metadata as part of the completion process. For example, if your backend is showing completions for function names, you could display the currently-selected function’s signature in the echo area. We’ll develop a backend that displays a sentence about the selected candidate in the echo area, and we’ll also display their initials as an annotation in the candidate selection popup menu.

Adding more data to our completion candidates

First we need to add some metadata to our existing completion candidates. To do this we’ll use Emacs text properties.2 For each completion candidate we define an :initials property containing their initials and a :summary property containing a one-sentence summary of the candidate.3 To add these properties, update sample-completions to look like this:

(defconst sample-completions
  '(#("alan" 0 1
      (:initials "AMT"
       :summary (concat "Alan Mathison Turing, OBE, FRS (/?tj??r??/ "
			"tewr-ing; 23 June 1912 ? 7 June 1954) was a "
			"British mathematician, logician, cryptanalyst, "
			"philosopher, pioneering computer scientist, "
			"mathematical biologist, and marathon and ultra "
			"distance runner.")))
    #("john" 0 1
      (:initials "JVN"
       :summary (concat "John von Neumann (/v?n ?n??m?n/; December 28, "
			"1903 ? February 8, 1957) was a Hungarian and "
			"American pure and applied mathematician, physicist, "
			"inventor and polymath.")))
    #("ada" 0 1
      (:initials "AAK"
       :summary (concat "Augusta Ada King, Countess of Lovelace (10 December "
			"1815 ? 27 November 1852), born Augusta Ada Byron "
			"and now commonly known as Ada Lovelace, was an "
			"English mathematician and writer chiefly known for "
			"her work on Charles Babbage's early mechanical "
			"general-purpose computer, the Analytical Engine.")))
    #("don" 0 1
      (:initials "DEK"
       :summary (concat "Donald Ervin Knuth (/k??nu??/[1] k?-nooth; born "
			"January 10, 1938) is an American computer "
			"scientist, mathematician, and Professor Emeritus "
			"at Stanford University.")))))

Attaching properties like this is a very convenient way to store metadata for completion candidates. Of course in a real backend you probably wouldn’t have a hard-coded list of candidates, and you’d be fetching them dynamically from a server, database, or external process. In that case, you’d need to also dynamically fetch the metadata you want and attach it to the candidate strings you serve through your backend. In the end, text properties work well in this context because they transparently transport the metadata – which company-mode doesn’t know about – with the completion strings that company-mode definitely knows about.

Adding completion menu annotations

This change by itself doesn’t really do anything, of course. All we’ve done is add properties to some strings, and we need to instruct company-mode on how to actually use them for display. The first way we’ll use this metadata, then, is to add a small annotation to each entry in the popup menu used for candidate selection. To add this annotation, we need to update company-sample-backend to respond to the annotation command. This command should resolve to the annotation you want to use for the given candidate. Typically this means calling a function taking the completion candidate string arg and returning the annotation string.

First let’s define a function that takes a completion candidate string and returns an annotation. Remember that our candidate strings store their metadata as text properties, so fundamentally this function simply needs to extract a property. For the annotation, we’ll extract the :initials property and return it (prefixed with a blank.) That function looks like this:

(defun sample-annotation (s)
  (format " [%s]" (get-text-property 0 :initials s)))

Next we need to update our backend to respond to the annotation command like this:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
                 (company-grab-symbol)))
    (candidates
     (remove-if-not
      (lambda (c) (string-prefix-p arg c))
      sample-completions))
    (annotation (sample-annotation arg))))

In the last line we tell the backend to call sample-annotation with the candidate string to produce an annotation.

Now when we do completion we see the candidates’ initials in the popup menu:
Screen Shot 2014-11-03 at 11.47.33 AM

Displaying metadata in the echo area

Where the annotation command adds a small annotation to the completion popup menu, the meta backend command produces text to display in the echo area.4 The process for producing the metadata string is almost exactly like that of producing the annotation string. First we write a function that extracts the string from the candidate text properties. Then we wire that function into the backend through the meta command.

As you’ve probably guessed, the function for extracting the metadata string will simply read the :summary property from a candidate string. It looks like this:

(defun sample-meta (s)
  (get-text-property 0 :summary s))

The changes to the backend look like this:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
                 (company-grab-symbol)))
    (candidates
     (remove-if-not
      (lambda (c) (string-prefix-p arg c))
      sample-completions))
    (annotation (sample-annotation arg))
    (meta (sample-meta arg))))

As before, in the last line we associate the meta command with our sample-meta function.

Here’s how the metadata looks when displayed in the echo area:
Screen Shot 2014-11-03 at 12.02.10 PM

Fuzzy matching

As a final improvement to our backend, let’s add support for fuzzy matching. This will let us do completion on prefixes which don’t exactly match a candidate, but which are close enough.5 For our purposes we’ll implement a very crude form of fuzzy matching wherein a prefix matches a candidate if the set of letters in the prefix is a subset of the set of letters in the candidate. The function for performing fuzzy matching looks like this:

(defun sample-fuzzy-match (prefix candidate)
  (cl-subsetp (string-to-list prefix)
              (string-to-list candidate)))

Now we just need to modify our backend a bit. First we need to modify our response to the candidates command to use our new fuzzy matcher. Then we need to respond to the no-cache command by returning true.6 Here’s how that looks:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
                 (company-grab-symbol)))
    (candidates
     (remove-if-not
      (lambda (c) (sample-fuzzy-match arg c))
      sample-completions))
    (annotation (sample-annotation arg))
    (meta (sample-meta arg))
    (no-cache 't)))

As you can see, we’ve replace string-prefix-p in the candidates response with sample-fuzzy-match, and we’ve added (no-cache 't).

Here’s how our fuzzy matching looks in action:
Screen Shot 2014-11-03 at 12.23.45 PM

That’s all, folks

We’ve seen how to use Emacs’ text properties to attach metadata to candidate strings. This is a really useful technique to use when developing company-mode backends, and one that you’ll see used in real-world backends. With that metadata in place, we’ve also seen that it’s very straightforward to tell your backend to display annotations in popup menus and metadata in the echo area. Once you’ve got the basic techniques under your belt, you can display anything you want as part of completion.

There are still more aspects to developing company-mode backends, but with what we’ve covered in this series you can get very far. More importantly, you know the main concepts and infrastructure for the backends, so you can learn the rest on your own. If you want to delve into all of the gory details, you’ll need to read the company-mode source code, and specifically the documentation for company-backends.7

For an example of a fairly full-featured backend implementation that’s currently in use (and under active development), you can see the emacs-ycmd project.8 Happy hacking!


  1. The first article in this series. 

  2. Text properties allow you to associate arbitrary data with strings. You can read about them here. Specifically, we use the special read syntax for text properties

  3. The summaries are simply the first sentences of the respective Wikipedia articles. 

  4. The Emacs manual entry on the echo area

  5. Fuzzy matching is commonly used for completion tools because it addresses common cases where users transpose characters, accidentally leave characters out, or consciously leverage fuzzy matching for increased speed.  

  6. The details of why this is the case are murky, but the company-mode source code specifically states this. The same source also says that we technically should be implementing a response to match, but that doesn’t seem to affect this implementation. 

  7. The company-mode project page

  8. The emacs-ycmd project is on github. In particular, see company-ycmd.el

Predictive Models of Development Teams and the Systems They Build

In 1968 Melvin Conway pointed out a seemingly inevitable symmetry between organisations and the software systems they construct. Organisations today are more fluid than 40 years ago, with short developer tenure, and frequent migration of individuals between projects and employers. In this article we’ll examine data on the tenure and productivity of programmers and use this to gain insight into codebases, by simulating their growth with simple stochastic models. From such models, we can make important predictions about the maintainability and long-term viability of software systems, with implications for how we approach software design, documentation and how we assemble teams.

Legacy systems

I’ve always been interested in legacy software systems, primarily because legacy software systems are those which have proven to be valuable over time. The reason they become legacy – and get old – is because they continue to be useful.

I have an interest in that as a software engineer, having worked on legacy systems as an employee for various software product companies, and more recently as a consultant with Sixty North, helping out with the problems that such systems inevitably raise.

I called myself a “software engineer”, although I use the term somewhat loosely. To call what many developers do “engineering” is a bit of a stretch. Engineer or not, my academic training was as a scientist, which is perhaps reflected in the content of this article. Most readers will be familar with the structure of the scientific method: We ask questions. We formulate hypotheses which propose answers to those questions. We design experiments to test our hypotheses. We collect data from the experiments. And we draw conclusions from the data. This done, having learned something about how our world works, we go round again.

I would like to be able to apply this powerful tool to what we do as “software engineers” or developers. Unfortunately for our industry, it’s very difficult to do experimental science – still less randomised controlled trials – on the process of software development, for a whole host of reasons: Developers don’t like to be watched. We can’t eliminate extraneous factors. The toy problems we use in experiments aren’t realistic. No two projects are the same. The subjects are often students who have little experience.
experiment

Even on the rare occasions we do perform experiments, there are many threats to validity of such experiments, so the results tend not be to taken very seriously. Addressing the weaknesses of the experimental design would be prohibitively expensive, if possible at all.

The role of models

Fortunately, there’s another way of doing science, which doesn’t rely on the version of the scientific method just mentioned. It’s the same type of science we do in astronomy, or geology where we can’t run experiments because we don’t have enough time, we don’t have enough money, or we just don’t have anywhere big enough to perform the experiment. Experimentally colliding galaxies, or experimenting with the initiation of plate tectonics are simply in the realms of science fiction on the money, time and space axes.

In such cases, we have to switch to a slightly different version of the scientific method, which looks like this: We make a prediction about how the universe works, where our ‘universe’ could be galactic collisions, or the more prosaic world of software development. We then make a model of that situation either through physical analogy or in a computer. By executing this model we can predict the outcome based on the details of a specific scenario. Lastly, we compare the results from the model with reality and either reject the model completely if it just doesn’t work, or tune the model by refining, updating or tweaking it, until we have a model that is a good match for reality.

The aim here, is to come up with a useful model. Models are by their very nature simplifications or abstractions of reality. So long as we bear this in mind, even a simple (although not simplistic) model can have predictive power.

Essentially, all models are wrong, but some are useful

George E. P. Box1

A model of software development

A key factor in modelling the development of legacy software systems is the fact that although such systems may endure for decades – we developers tend not to endure them for decades. In other words, the tenure of software developers is typically much shorter than the life span of software systems.

But how much shorter?

Whenever I speak publically on this topic with an audience of developers, I like to perform a simple experiment with my audience. My assumption is that the turnover of developers can be modelled as if developers have a half-life within organizations. The related concept of residence time2 is probably a better approach, but most people have a grasp of half-life, and it avoids a tedious digression into explaining something that is ultimately tangiential to the main discussion. In any case, a catchy hook is important when you’re going for audience participation, so half-life it is.

I start by asking everyone in the audience who has moved from working on one codebase – for example a product – to another (including the transition to working on their first codebase), at any time in the preceding 32 years to raise their hands. This starting point is intended to catch the vast majority of typical tech conference audience members, and indeed it does, although arguably in the spirit of inclusiveness I should start with 64 years. Now all of the audience have raised hands.

Next I ask the audience to keep their hands raised if this is still true for 16 years: Most of the hands are still raised. Now eight years: Some hands are going down. Now four years: A small majority of hands are still raised. Then two years: At this point, a large minority still have raised hands, but we have crossed the half-way threshold and established that the ‘half-life’ of developers is somewhere between two and four years. This result has been consistent on over 90% of the occasions I’ve performed this experiment. The only notable deviation was a large Swedish software consultancy where we established the half-life was around six months!

In fact, what little research there has been into developer tenure indicates that the half-life across the industry is about 3.2 years, which fits nicely with what I see out in the field.

One way to think about this result concretely is as follows: If you work on a team that numbers ten developers in total, you can expect half of them – five in this case – to leave at some point in the next 3.2 years. Obviously, if the team size is to remain constant, they will need to be replaced.

Note that saying that turnover of half of a population of developers will take 3.2 years is not the same as claiming that the average tenure of a developer is 3.2 years. In fact, mean tenure will be \(3.2 / \ln 2\) which is about 4.6 years. You might want to compare that figure against your own career so far.

If you’re concerned that developers don’t behave very much like radionucleides then rest assured that the notion of half-life follows directly from an assumption that the decay of a particle (or departure of a developer) follows exponential decay, which again follows from the notion of constant probability density with respect to time. All we’re saying is that in a given time interval there is a fixed probability that are particle will decay (or a developer will depart), so it is actually a very simple model.

Notably, the half-life of developers is shorter than the half-life of almost anything else in our industry, including CEOs, lines of code, megacorps or classes.

Half-lives in years of various entities in and around the software industry. Developers are one of the most short-lived.

Half-lives in years of various entities in and around the software industry. Developers are one of the most short-lived.

Productivity

If we’re going to have a stab a modelling software developers as part of the software development process, we’re going to need some measure of productivity. I’m going to use – and you can save your outrage for later – lines of code. To repurpose a famous phrase by Winston Churchill: “Lines of code is the worst programmer productivty metric, except for all the others”. Now I know, as well as you do, that what ultimately matters to the customers of software systems is value for money, return on investment, and all those other good things. The problem is, that it’s notoriously hard to tie any of those things back in a rigourous way to what individual developers do on a day-to-day basis, which should be design and code software systems based on an understanding of the problem at hand. On the other hand, I think I’m on fairly safe ground in assuming that software systems with zero lines of code deliver no value, and proportionally more complex problems can be solved (and hence more value delivered) by larger software systems.

Furthermore, there’s some evidence that the number of lines of code cut by a particular developer per day is fairly constant irrespective of which programming language they’re working in. So five lines of F# might do twice as much ‘work’ as 10 lines of Python or 20 lines of C++. This is an alternative phrasing of the notion of ‘expressiveness’ in programming languages. This is why we tend to feel that expressiveness – or semantic density – is important in programming languages. We can often deliver as much value with 5 lines of F# as with 20 lines of C++, yet it will take a quarter of the effort to put together.

Now, however you choose to measure productivity, not all developers are equally productive on the same code base, and the same developer will demonstrate different productivity on different code bases, even if they are in the same programming language. In fact, as I’m sure many of us have experienced, the principle control on our productivity is simply the size of the code base at hand. Writing a hundred lines of code for a small script is usually much less work than adding 100 lines to a one million line system.

We can capture this variance by looking to what little literature there is on the topic3, and using this albeit sparse data to build some simple developer productivity distributions.

For example, we know that on a small 10 000 line code base, the least productive developer will produce about 2000 lines of debugged and working code in a year, the most productive developer will produce about 29 000 lines of code in a year, and the typical (or average) developer will produce about 3200 lines of code in a year. Notice that the distribution is highly skewed toward the low productivity end, and the multiple between the typical and most productive developers corresponds to the fabled 10x programmer.

Given only these three numbers and in the absence of any more information on the shape of the distribution, we’ll follow a well-trodden path and use them to erect a triangular probability density function (PDF) characterised by the minimum, modal and maximum productivity values. Based on this PDF it’s straightforward to compute the corresponding cumulative distribution function (CDF) which we can use to construct simulated “teams” of developers, by using the CDF to transform uniformly distributed samples on the cumulative probability axis into samples on the producivity axis. In a real simulation where we wanted to generate many typical teams, we would generate uniform random numbers between zero and one and transform them into productivity values using the CDF, although for clarity in the illustration that follows, I’ve used evenly distributed samples from which to generate the productivity values.

Programmer productivity distribution

Programmer productivity in lines of code per year for a team of ten developers on a 10000 line project.

As you can see the resulting productivity values for a team of ten developers cluster around the modal productity value, with comparitavely few developers of very high productivity.

Perhaps more intuitively, software development of teams comprising ten developers look like this:

A typical team of ten developers would look like this, if their contributions in lines of code were represented as circular areas.

A typical team of ten developers would look like this, if their contributions in lines of code were represented as circular areas.

This typical team has a only a couple of people being responsible for the majority of the output. Again, it might be interesting to compare this to your own situation. At the very least, it shows how the ‘right’ team of two developers can be competitive with a much larger team; a phenomenon you may have witnessed for yourselves.

Overall, this team produces about 90 000 lines of code in a year.

Incorporating growth of software

Of course, the story doesn’t end there. Once our team has written 90 000 lines of code, they’re no longer working on a 10 000 line code base, they’re working on a 100 000 line code base! This causes their productivity to drop, so we now have a modified description of their productivities and a new distribution from which to draw a productivity if somebody new joins the team. But more of that in a moment. We don’t have much in the way of published data for productivity on different sizes of code base, but we can interpolate between and extrapolate from the data we do have, without any of the assumptions involved in such extrapolation looking too outlandish. As you can see, we can put three straight line through the minimums, modes and maximums respectively to facilitate determination of a productivity distribution for any code size up to about 10 million lines of code. (Note that we shouldn’t infer from these straight lines anything about the mathematical nature of how productivity declines with increasing code size – there be dragons!4)

Productivity is a function of codebase size. Developers are dramatically less productive on larger bodies of code.

Productivity is a function of codebase size. Developers are dramatically less productive on larger bodies of code.

When performing a simulation of growth of software in the computer, we can get more accurate results by reducing the time-step on which we adjust programmer productivity downwards from once per year as in the example above, to just once per day: At the end of every simulated day, we know how much code we have, so we can predict the productivity of our developers on the following day, and so on.

Incorporating turnover

We’ve already stated our assumption that the probability of a developer departing is constant per unit time, together with our half-life figure of 3.2 years. Given this, it’s straightforward to compute the probability of a developer leaving on any given day, which is about 0.001, or once in every thousand days. As we all know, when a particular developer leaves an organisation and is replaced by a new recruit, there’s no guarantee that their replacement will have the same level of productivity. In this event, our simulation will draw a new developer at random from the distribution of developer productivity for the current code base size, so it’s likely that a very low productivity developer will be replaced with a higher productivity developer and that a very high productivity developer will be replaced with a lower productivity developer; an example of regression to mediocrity.5

Simulating a project

With components of variance in developer productivity, its relationship to code base size and a simple model of developer turnover we’re ready to run a simulation of a project. To do so, we initialize the model with the number of developers in the development team, and set it running. The simulator starts by randomly drawing a team of developers of the required size from the productivity distribution for a zero-size code base, and computes how much code they will have produced after one day. At the end of the time step, the developer productivities are updated to the new distribution; each developer’s quantile within the distribution remains fixed, but the underlying CDF is updated to yield a new productivity value. The next time step for day two then begins, with each developer producing a little less code than on the previous day.

On each day, there is a fixed probability that a developer will leave the team. When this occurs, they are immediately replaced the following day by a new hire whose productivity will be drawn anew from the productivity distribution. For small teams, this case shift the overall team productivity significantly and more often than not towards the mean.

Let’s look at an example: If we configure a simulation with a team of seven developers, and let it run for five years, we get something like this:

Streamed code contributions of a team of seven developers over five years. A total of 19 people contribute to this codebase.

Streamed code contributions of a team of seven developers over five years. A total of 19 people contribute to this codebase.

This figure has time running from left to right, and the coloured streams show the growing contributions over time of individual developers. We start on the left with no code and the original seven developers in the team, from top to bottom sporting the colours brown, orange, green, red, blue, purple and yellow. The code base grows quickly at first, but soon slows. About 180 days into the project the purple developer quits, indicated by a black, vertical bar across their stream. From this point on, their contribution remains static and is shown in a lighter shade. Vertically below this terminator we see a new stream beginning, coloured pink, which represents the contribution of the recruit who is purple’s replacement. As you can see, they are about three times more productive (measured in lines of code at least), than their predecessor, although pink only sticks around for around 200 days before moving on and being replaced by the upper blue stream.

In this particular scenario, at the end of the five year period, we find our team of seven has churned through a grand total of 19 developers. In fact the majority of the extant code was written by people no longer with the organisation; only 37% of the code was programmed by people still present at the end. This is perhaps motivation for getting documentation in place as systems are developed, while the people who are doing the development are still around, rather than at the end of the effort – if at all – as is all to common.

Monte Carlo simulation

Being drawn randomly, each scenario such as the one outlines above is different, although in aggregate they vary in a predictable way according to the distributions we are using. The above scenario was typical, insofar as it produced, compared to all identically configured simulations, an average amount of code, although it did happen to get through a rather high number of developers. Of course, individual scenarios such as this, although interesting, can never be indicative of what will actually happen. For that, we need to turn to Monte Carlo modelling: Run many thousands of simulations – all with configurations drawn randomly from identical distributions – and look at the results in aggregate either graphically or using various statistical tools.

When we run 1000 simulations of a seven person project run over three years, the following statistics emerge: We can expect our team of seven to see four people leave and be replaced during the project. In fact, the total number of contributors will be 11 ± 2 at one standard deviation (1σ). The total body of code produced in three years will be 157 000 ± 23 000 @ 1σ. The proportion of the code written by contributors present at the end will be 70% ± 14% @ 1σ.

Perhaps a more useful question might be to ask “How long is it likely to take to produce 100 000 lines of code?” By answering this question for each simulation, we can build a histogram (actually we use a kernel density estimate here, to give a smooth, rather than binned, result).

How long does it take a team of seven to deliver one-hundred thousand lines of code?

How long does it take a team of seven to deliver one-hundred thousand lines of code?

Although this gives a good intuitive sense of when the team will reach the 100 k threshold, a more useful chart is the cumulative distribution of finishing time, which allows us to easily recognise that while there is a probability of 20% of finishing in 330 days, for a much more secure 80% probability, we should allow for 470 days – some 42% longer and correspondingly more costly.

Cumulative distribution function showing probability of delivery of one-hundred thousand lines of code before a particular day. Based on 10 000 simulations.

Cumulative distribution function showing probability of delivery of one-hundred thousand lines of code before a particular day. Based on 10 000 simulations.

Finally, looking at the proportion of the code base that was, at any time, written by the current team, we see an exponential decline in this fraction, leaving us with a headline figure of 20% after 20 years.

The proportion of code written by the current team. In other words, for how much of your code can you easily talk to the author?  Blue and green lines show plus and minus one standard deviation around the mean. Based on 10 000 simulations.

The proportion of code written by the current team. In other words, for how much of your code can you easily talk to the author? Blue and green lines show plus and minus one standard deviation around the mean. Based on 10 000 simulations.

That’s right, on a 20 year old code base only one fifth of the code will have been created by the current team. This resonates with my own experience, and quantitatively explains why working on large legacy systems can be a lonely, disorienting and confusing experience.

A refinement of Conway’s Law?

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure

Melvin Conway6

This remark, which has become known as Conway’s Law, was later interpreted, a little more playfully, by Eric Raymond as “If you have four groups working on a compiler, you’ll get a 4-pass compiler”. My own experience is that Conway’s Law rings true, and I’ve often said that “Conway’s Law is the one thing we know about software engineering that will still be true 1000 years from now”.

However, over the long term development efforts which lead to large, legacy sofware systems the structure and organisation of the system isn’t necessarily congruent with the organisation at present. After all, we all know that reorganisations of people are all too frequent compared to major reorganisation of software! Rather, the state of a system reflects not only the organisation, but the organisational history and the flow of people through those organisations over the long term. What I mean is that the structure of the software reflects the organisational structure integrated over time.

Simulations such as those presented in this article allow to to get a sense of how large software systems as we see them today are the fossilised footprints of developers past. Perhaps we can use this improved, and quantitative, understanding to improve planning, costing and ongoing guidance of large software projects.


  1. Box, G. E. P., and Draper, N. R., (1987), Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, NY. 

  2. Residence time article on Wikipedia 

  3. COCOMO II COnstructive COst MOdel II 

  4. Contrary to popular conception, straight lines on log-log plots don’t necessarily indicate power-law relationships. See the excellent So You Think You Have a Power Law — Well Isn’t That Special? 

  5. Francis Galton (1886). “Regression towards mediocrity in hereditary stature”. The Journal of the Anthropological Institute of Great Britain and Ireland (The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15) 15: 246–263. 

  6. Melvin Conway (1968) How do Committees Invent? 

Writing the Simplest Emacs company-mode Backend

In Emacs, company-mode (short for “complete anything”) is a framework for performing completion in buffers.1 It’s an alternative to the popular auto-complete-mode. company-mode supports extension via backends which provide the framework with lists of possible completions in various contexts. So, for example, there’s a backend that provides completion support for Emacs lisp and one that does the same for Python. Backends can use very different technologies as long as they conform to the backend interface specified by the mode.

I recently decided to write a company-mode backend for ycmd, a completion server for languages including C/C++/Objective-C and Python.2 All in all it was a relatively pain-free experience, but the process isn’t as well documented as I would have liked. So I want to use this series to describe how it’s done with the hope of making it easier for others and of helping me remember how to do it in the future.

I won’t be covering all of the details of company-mode backends (partially because I don’t know them all), but this series should tell you what you need to know to create your own fully-armed and operational backend.3 In this article we’ll define the simplest possible backend in order to familiarize you with the concepts and infrastructure involved. In the next article we’ll add some sophistication to that backend to improve the user experience.

The simplest possible backend

For our example we need to define a source of completion candidates. Ultimately, any completion source is just a sequence of strings that meet some criteria. Examples might include:

  • A list of English words starting with some prefix
  • Methods for a particular object in Java
  • Modules available for import in Python program

company-mode doesn’t care about the nature of these strings. It just takes them and makes it easy for the user to select from the available options.

In this case, we’ll just define a fixed list of strings:

(defconst sample-completions
    '("alan" "john" "ada" "don"))

That’s it.4 Completion sources don’t need to (though they generally will) be more complex than that.

Defining the backend

Backends take the form of a function which takes a command as its first argument. This command can take any of a number of values, and backends are required to respond to a handful of them. Before we get into those details, let’s look at our very basic backend implementation:

 1 (require 'cl-lib)
 2 (require 'company)
 3 
 4 (defun company-sample-backend (command &optional arg &rest ignored)
 5   (interactive (list 'interactive))
 6 
 7   (cl-case command
 8     (interactive (company-begin-backend 'company-sample-backend))
 9     (prefix (and (eq major-mode 'fundamental-mode)
10                  (company-grab-symbol)))
11     (candidates
12      (cl-remove-if-not
13       (lambda (c) (string-prefix-p arg c))
14       sample-completions))))

The signature of this function is mandated by company-mode. Line 5 makes the function interactive so that you can easily drive your backend without invoking company-mode, something we’ll do in a bit. The cl-case statement on line 7 is where we decide what to do based on command. In this case, we respond to interactive, prefix, and candidates.

The interactive command is passed once, before the other commands are used, and it is used to initialize the company-mode infrastructure. All you need to do as a backend developer is pass your backend to company-begin-backend as in this example.

The prefix command

The prefix command is probably the most complex command to handle. This command should return the text that is to be completed. Determining this text can be complex depending on what you’re trying to complete, but company-grab-symbol often does “the right thing” if your completion context is space-delimited.

If the prefix command returns nil, this tells company-mode that the backend is not suitable for doing completion on this context. On line 9 of our example we check to see if we’re in fundamental-mode and, if not, return nil. In other words, we’re saying here that our backend only applies to fundamental-mode. Programming language-oriented backends can make a similar check for their specific modes. When a backend responds to prefix with nil, other backends are given a chance to do the completion.

On the other hand, if a backend is appropriate for the current completion but it can’t provide any completions for some reason, the backend should return 'stop. This tells company-mode that no other backends should be used for this completion.

So our backend is effectively saying that it can do completion for anything in fundamental mode. There are more details to prefix, but that’s covers the important parts.

The candidates commands

The response to the candidates command is where you actually generate a list of possible completions at a point in a buffer. When this command is passed in, the arg argument holds the value returned by prefix. In other words, you construct your candidates based on the text that you previously indicated was to be completed.

In our case, the prefix we indicated was whatever came before point in the buffer. To calculate our possible completions, we filter the sample-completions values with that prefix using remove-if-not, returning only those candidates which begin with the prefix.

As with prefix calculations, real candidate calculations can be much more complex. But if you understand how the data is piped around, then constructing these complex candidate lists should be fairly straightforward.

Test-driving the backend

To test out our backend, first enter all of the code into a buffer and evaluate it (e.g. with M-x eval-buffer.) Then create a new buffer and run M-x fundamental-mode and M-x company-mode.5

In this new buffer enter the single character “a” and then, with the cursor immediately after the “a”, run M-x company-sample-backend. This should give you completion options something like this:
Screen Shot 2014-08-28 at 7.17.11 PM

If that works correctly, then you’ve done almost everything you need to for a fully working backend.

Plugging the backend into company-mode

The final thing you need to do to make your backend available to company-mode is to add it the list company-backends. One simple way to do that is with add-to-list list this:

(add-to-list 'company-backends 'company-sample-backend)

Once you’ve done this, you can use the command company-complete to do completions, and your new backend will be used in concert with all of the other backends in that list. Generally speaking, company-complete is the command you’ll use for completion with company-mode, and it’ll often be bound to a simple keystroke.

A complete company-mode backend

That’s all there is to writing a basic company-mode backend. In the next article in this series we’ll look at adding a few more details to what we have already.

Here’s a complete listing of the code used in this article:

(require 'company)

(defconst sample-completions
  '("alan" "john" "ada" "don"))

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
                 (company-grab-symbol)))
    (candidates
     (remove-if-not
      (lambda (c) (string-prefix-p arg c))
      sample-completions))))

(add-to-list 'company-backends 'company-sample-backend)


  1. company-mode project site 

  2. The ycmd github repository and my Emacs client

  3. Sorry, I couldn’t resist the Star Wars reference

  4. We’ll filter the strings later based on context. 

  5. This puts your buffer in major mode “fundamental” and minor mode “company”. 

The super() Mystery Resolved

In the previous articles in this series1 we uncovered a small mystery regarding how Python’s super() works, and we looked at some of the underlying mechanics of how super() really works. In this article we’ll see how those details work together to resolve the mystery.

The mystery revisited

As you’ll recall, the mystery we uncovered has to do with how a single use of super() seemed to invoke two separate implementations of a method in our SortedIntList example. As a reminder, our class diagram looks like this:

Inheritance graph

Inheritance graph

SimpleList defines a simplified list API, and IntList and SortedList each enforce specific constraints on the contents of the list. The class SortedIntList inherits from both IntList and SortedList and enforces the constraints of both base classes. Interestingly, though, SortedIntList has a trivial implementation:

class SortedIntList(IntList, SortedList):
    pass

How does SortedIntList do what it does? From the code it seems that SortedIntList does no coordination between its base classes, and certainly the base classes don’t know anything about each other. Yet somehow SortedIntList manages to invoke both implementations of add(), thereby enforcing all of the necessary constraints.

super() with no arguments

We’ve already looked at method resolution order, C3 linearization, and the general behavior of super instances. The final bit of information we need in order to resolve the mystery is how super() behaves when called with no arguments.2 Both IntList and SortedList use super() this way, in both their initializers and in add().

For instance methods such as these, calling super() with no arguments is the same as calling super() with the method’s class as the first argument and self as the second. In other words, it constructs a super proxy that uses the MRO from self starting at the class implementing the method. Knowing this, it’s easy to see how, in simple cases, using super() is equivalent to “calling the base class implementation”. In these cases, type(self) is the same as the class implementing the method, so the method is resolved using everything in that class’s MRO except the class itself. The next entry in the MRO will, of course, be the class’s first base class, so simple uses of super() are equivalent to invoking a method on the first base class.

A key point to notice here is that type(self) will not always be the same as the class implementing a specific method. If you invoke super() in a method on a class with a subclass, then type(self) may well be that subclass. You can see this in a trivial example:

>>> class Base:
...     def f(self):
...         print('Type of self in Base.f() =', type(self))
...
>>> class Sub(Base):
...     pass
...
>>> Sub().f()
Type of self in Base.f() = <class '__main__.Sub'>

Understanding this point is the final key to seeing how SortedIntList works. If type(self) in a base class is not necessarily the type of the class implementing the method, then the MRO that gets used by super() is not necessarily the MRO of the class implementing the method…it may be that of the subclass. Since the entries in type(self).mro() may include entries that are not in the MRO for the implementing class, calls to super() in a base class may resolve to implementations that are not in the base class’s own MRO. In other words, Python’s method resolution is – as you might have guessed – extremely dynamic and depends not just on a class’s base classes but its subclasses as well.

Method resolution order to the rescue

With that in mind, let’s finally see how all of the elements – MRO, no-argument super(), and multiple inheritance – coordinate to make SortedIntList work. As we’ve just seen, when super() is used with no arguments in an instance method, the proxy uses the MRO of self which, of course, will be that of SortedIntList in our example. So the first thing to look at is the MRO of SortedIntList itself:

>>> SortedIntList.mro()
[<class 'SortedIntList'>, 
 <class 'IntList'>, 
 <class 'SortedList'>, 
 <class 'SimpleList'>, 
 <class 'object'>]

A critical point here is that the MRO contains IntList and SortedList, meaning that both of them can contribute implementations when super() is used.

Next, let’s examine the behavior of SortedIntList when we call its add() method.3 Because IntList is the first class in the MRO which implements add(), a call to add() on a SortedIntList resolves to IntList.add():

class IntList(SimpleList):
    # ...

    def add(self, item):
        self._validate(item)
        super().add(item)

And this is where things get interesting!

In IntList.add() there is a call to super().add(item), and because of how no-argument super() calls work, this is equivalent to super(IntList, self).add(item). Since type(self) == SortedIntList, this call to super() uses the MRO for SortedIntList and not just IntList. As a result, even though IntList doesn’t really “know” anything about SortedList, it can access SortedList methods via a subclass’s MRO.

In the end, the call to super().add(item) in IntList.add() takes the MRO of SortedIntList, find’s IntList in that MRO, and uses everything after IntList in the MRO to resolve the method invocation. Since that MRO-tail looks like this:

[<class 'SortedList'>, <class 'SimpleList'>, <class 'object'>]

and since method resolution uses the first class in an MRO that implements a method, the call resolves to SortedList.add() which, of course, enforces the sorting constraint.

So by including both of its base classes in its MRO4 – and because IntList and SortedList use super() in a cooperative way – SortedIntList ensures that both the sorting and type constraint are enforced.

No more mystery

We’ve seen that a subclass can leverage MRO and super() to do some pretty interesting things. It can create entirely new method resolutions for its base classes, resolutions that aren’t apparent in the base class definitions and are entirely dependent on runtime context.

Used properly, this can lead to some really powerful designs. Our SortedIntList example is just one instance of what can be done. At the same time, if used naively, super() can have some surprising and unexpected effects, so it pays to think deeply about the consequences of super() when you use it. For example, if you really do want to just call a specific base class implementation, you might be better off calling it directly rather than leaving the resolution open to the whims of subclass developers. It may be cliche, but it’s true: with great power comes great responsibility.

For more information on this topic, you can always see the Inheritance and Subtype Polymorphism module of our Python: Beyond the Basics course on PluralSight.5


  1. Part 1 and Part 2 

  2. Note that calling super() with no arguments is only supported in Python 3. Python 2 users will need to use the longer explicit form. 

  3. The same logic applies to it’s __init__() which also involves calls to super()

  4. Thanks, of course, to how C3 works 

  5. Python: Beyond the Basics on PluralSight 

Method Resolution Order, C3, and Super Proxies

In the previous article in this series we looked at a seemingly simple class graph with some surprising behavior. The central mystery was how a class with two bases can seem to invoke two different method implementations with just a single invocation of super(). In order to understand how that works, we need to delve into the details of how super() works, and this involves understanding some design details of the Python language itself.

Method Resolution Order

The first detail we need to understand is the notion of method resolution order or simply MRO. Put simply a method resolution order is the ordering of an inheritance graph for the purposes of deciding which implementation to use when a method is invoked on an object. Let’s look at that definition a bit more closely.

First, we said that an MRO is an “ordering of an inheritance graph”. Consider a simple diamond class structure like this:

>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...

The MRO for these classes could be, in principle, any ordering of the classes A, B, C, and D (and object, the ultimate base class of all classes in Python.) Python, of course, doesn’t just pick the order randomly, and we’ll cover how it picks the order in a later section. For now, let’s examine the MROs for our classes using the mro() class method:

>>> A.mro()
[<class '__main__.A'>,
 <class 'object'>]
>>> B.mro()
[<class '__main__.B'>,
 <class '__main__.A'>,
 <class 'object'>]
>>> C.mro()
[<class '__main__.C'>,
 <class '__main__.A'>,
 <class 'object'>]
>>> D.mro()
[<class '__main__.D'>,
 <class '__main__.B'>,
 <class '__main__.C'>,
 <class '__main__.A'>,
 <class 'object'>]

We can see that all of our classes have an MRO. But what is it used for? The second half of our definition said “for the purposes of deciding which implementation to use when a method is invoked on an object”. What this means is that Python looks at a class’s MRO when a method is invoked on an instance of that class. Starting at the head of the MRO, Python examines each class in order looking for the first one which implements the invoked method. That implementation is the one that gets used.

For example, let’s augment our simple example with a method implemented in multiple locations:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     def foo(self):
...         print('B.foo')
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...

What will happen if we invoke foo() on an instance of D? Remember that the MRO of D was [D, B, C, A, object]. Since the first class in that sequence to support foo() is B, we would expect to see “B.foo" printed, and indeed that is exactly what happens:

>>> D().foo()
B.foo

What if remove the implementation in B? We would expect to see "C.foo", which again is what happens:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...
>>> D().foo()
C.foo

To reiterate, method resolution order is nothing more than some ordering of the inheritance graph that Python uses to find method implementations. It's a relatively simple concept, but it's one that many developers understand only intuitively and partially. But how does Python calculate an MRO? We hinted earlier – and you probably suspected – that it's not just any random ordering, and in the next section we'll look at precisely how Python does this.

C3 superclass linearization

The short answer to the question of how Python determines MRO is "C3 superclass linearization", or simply C3. C3 is an algorithm initially developed for the Dylan programming language,1 and it has since been adopted by several prominent programming languages including Perl, Parrot, and of course Python.2 We won't go into great detail on how C3 works, though there is plenty of information on the web that can tell you everything you need to know.3

What's important to know about C3 is that it guarantees three important features:

  1. Subclasses appear before base classes
  2. Base class declaration order is preserved
  3. For all classes in an inheritance graph, the relative orderings guaranteed by 1 and 2 are preserved at all points in the graph.

In other words, by rule 1, you will never see an MRO where a class is preceded by one of its base classes. If you have this:

>>> class Foo(Fred, Jim, Shiela):
...     pass
...

you will never see an MRO where Foo comes after Fred, Jim, or Shiela. This, again, is because Fred, Jim, and Shiela are all base classes of Foo, and C3 puts base classes after subclasses.

Likewise, by rule 2, you will never see an MRO where the base classes specified to the class keyword are in a different relative order than that definition. Given the same code above, this means that you will never see and MRO with Fred after either Jim or Shiela. Nor will you see an MRO with Jim after Shiela. This is because the base class declaration order is preserved by C3.

The third constraint guaranteed by C3 simply means that the relative orderings determined by one class in an inheritance graph – i.e. the ordering constraints based on one class's base class declarations – will not be violated in any MRO for any class in that graph.

C3 limits your inheritance options

One interesting side-effect of the use of C3 is that not all inheritance graphs are legal. It's possible to construct inheritance graphs which make it impossible to meet all of the constraints of C3. When this happens, Python raises an exception and prevents the creation of the invalid class:

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     pass
...
>>> class D(B, A, C):
...     pass
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C

In this case, we've asked for D to inherit from B, A, and C, in that order. Unfortunately, C3 wants to enforce two incompatible constraints in this case:

  1. It wants to put C before A because A is a base class of C
  2. It wants to put A before C because of D's base class ordering

Since these are obviously mutually exclusive states, C3 rejects the inheritance graph and Python raises a TypeError.

That's about it, really. These rules provide a consistent, predictable basis for calculating MROs. Understanding C3, or even just knowing that it exists, is perhaps not important for day-to-day Python development, but it's an interesting tidbit for those interested in the details of language design.

Super proxies

The third detail we need to understand in order to resolve our mystery is the notion of a "super proxy". When you invoke super() in Python,4 what actually happens is that you construct an object of type super. In other words, super is a class, not a keyword or some other construct. You can see this in a REPL:

>>> s = super(C)
>>> type(s)
<class 'super'>
>>> dir(s)
['__class__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__',
'__get__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__',
'__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__self__',
'__self_class__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__thisclass__']

In most cases, super objects have two important pieces of information: an MRO and a class in that MRO.5 When you use an invocation like this:

super(a_type, obj)

then the MRO is that of the type of obj, and the class within that MRO is a_type.6

Likewise, when you use an invocation like this:

super(type1, type2)

the MRO is that of type2 and the class within that MRO is type1.7

Given that, what exactly does super() do? It's hard to put it in a succinct, pithy, or memorable form, but here's my best attempt so far. Given a method resolution order and a class C in that MRO, super() gives you an object which resolves methods using only the part of the MRO which comes after C.

In other words, rather than resolving method invocation using the full MRO like normal, super uses only the tail of an MRO. In all other ways, though, method resolution is occurring exactly as it normally would.

For example, suppose I have an MRO like this:

[A, B, C, D, E, object]

and further suppose that I have a super object using this MRO and the class C in this MRO. In that case, the super instance would only resolve to methods implemented in D, E, or object (in that order.) In other words, a call like this:

super(C, A).foo()

would only resolve to an implementation of foo() in D or E.8

Why the name super-proxy

You might wonder why we've been using the name super-proxy when discussing super instances. The reason is that instances of super are designed to respond to any method name, resolving the actual method implementation based on their MRO and class configuration. In this way, super instances act as proxies for all objects. They simply pass arguments through to an underlying implementation.

The mystery is almost resolved!

We now know everything we need to know to resolve the mystery described in the first article in this series. You can (and probably should) see if you can figure it out for yourself at this point. By applying the concepts we discussed in this article - method resolution order, C3, and super proxies - you should be able to see how SortedIntList is able to enforce the constraints of IntList and SortedList even though it only makes a single call to super().

If you'd rather wait, though, the third article in this series will lay it all out for you. Stay tuned!


  1. The Dylan programming language 

  2. Presumably starting with the letter "P" is not actually a requirement for using C3 in a language. 

  3. Python's introduction of C3 in version 2.3 includes a great description. You can also track down the original research describing C3. 

  4. With zero or more arguments. I'm using zero here to keep things simple. 

  5. In the case of an unbound super object you don't have the MRO, but that's a detail we can ignore for this article. 

  6. Remember that this form of super() requires that isinstance(obj, a_type) be true. 

  7. Remember that this form of super() requires that issubclass(type2, type1) be true. 

  8. Since object does not (yet...though I'm working on a PEP) implement foo()

Stay in Touch

Our business hours are 08:00 to 16:00 CET/CEST.