The Subtext and Metatext of Code

Posted on 01 November 2022

Programming relies heavily on implicit information, probably even more than we would like to admit. When we come across foreign code, we make bold assumptions and rely on conventions to try to build a mental model of the code. If that's easy to do, we call the code "clear", but that term is too vague. Instead, I would like to propose that we borrow some concepts from linguistics: subtext and metatext.

If we want our code to be understood by others, we need to explain the intent and context behind the code to the reader. We can do this explicitly via comments or implicitly, by including signals that hint toward the function of the code (e.g. choosing clear variable names). I call these hints the subtext of the code. Similarly, I call the direct explanation via comments metatext, because they are a part of the code that explains the code and are therefore self-referential.

Text, subtext & metatext

Let's make these definitions a bit more formal.

The text is all the source code for a library or application.
The subtext is the context, intent and reasoning behind the code implied by the text.
The metatext is anything that directly explains the code. For example, comments and documentation.

The code below is basically devoid of subtext and metatext. It is a function calculating the factorial of a natural number, but nothing in the code hints to that functionality.

def a(b):
    if b == 0:
        return 1
    else:
        return b * a(b-1)

Our goal is to change the text such that it includes hints to the intent, that is, to add the subtext. We can start doing this by choosing better names. The factorial name explains that this defines a function that computes a factorial. n implies by convention that the input should be a natural number. The result is much clearer!

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

Finally, we can add metatext with comments and type hints. Note that in statically typed languages, types are a part of the text, not the metatext, but in this post, I'm using untypechecked Python where type hints are nothing more than comments in disguise.

In the metatext, we can explain anything not immediately obvious from the code. We explicitly state that n should be an int and that n should be a natural number (i.e. greater than zero). We also state the limitations of the code in the metatext.

def factorial(n: int) -> int:
    """
    Computes the factorial of a natural number `n`

    Warning: this function will not terminate if `n` is negative.
    """
    # TODO: Rewrite without recursion for better performance.

    # We calculate the factorial by recursion. The base case of
    # the recursion comes from the fact that 0! = 1.
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

It should be obvious that good code has both good subtext and good metatext. In most cases, neither is sufficient to convey the intent efficiently. However, your opinion on what constitutes good sub- or metatext might differ from my opinion and that's okay. It also depends on the context and purpose of the code. For example, the requirements change depending on whether it is production code or code written for educational purposes.

Basic subtext

Now, let's explore how we can use subtext to our advantage. We start with this example of some intentionally bad Python code:

x = [4, 10, 2]
z = 0
for y in x:
    z += x

The code above computes the sum of 3 numbers, but you have to read the full code to understand it, because the subtext is missing. We can add subtext with better variable names:

numbers = [4, 10, 2]
total_sum = 0
for number in numbers:
    total_sum += number

Now, when you see the variables numbers and total_sum, you can already guess what the rest of the code is going to do and you only have to check that assumption. Of course, subtext can also imply incorrect information, for example, if the variable names do not match the actual computation¹:

numbers = [4, 10, 2]
total_product = 0
for number in numbers:
    total_product += number

Changing variable names is not all we can do to improve subtext. To show our intent even more clearly, we can call the sum function instead of using a for loop. The difference is that a for loop is a general construct for all kinds of loops, but that sum can only be used for, well, summing things. So sum carries the subtext that our intent is to sum things.

numbers = [4, 10, 2]
total_sum = sum(numbers)

In this final version, the code is immediately obvious, because the subtext supports the text. When people say that code should be "self-documenting", this is what they mean: that the subtext resolves most questions about the code.

Note that there are also multiple pieces of the code signalling that we are summing the numbers (the variable and function name). If there is consistency between multiple signals, that will help the reader to confidently build a mental model of the code.

Code organization is also part of the subtext. Putting functions next to each other in a file can implies that they are linked in some way. Similarly, if a function is defined far away from where it is called it is taken out of its own context and therefore harder to understand. This is often the case when there is some catch-all utils module, which tend to contain a bunch of unrelated functions.

Basic subtext guidelines

Give variables, functions and types descriptive names.
Give source files descriptive names.
Break complicated expressions up into several steps (with descriptive names).
Use specialized functions and language constructs instead general functions and language constructs.
Organize the source files in a logical way.

Conventional subtext

Subtext is also about convention, both within a codebase and within the larger programming community. For example, n implies that a variable holds an integer and f is often used for functions. Using other names, like b and k, could be confusing to other programmers. Similarly, using f for an integer would also be confusing.

Python has two main constructs for applying some function to the elements of a list: map and list comprehensions.

def double(x: int) -> int:
    return 2 * x

numbers = [4, 10, 2]

# map
doubled = list(map(double, numbers))

# list comprehension
doubled = [double(n) for n in numbers]

In my experience, most python programmers prefer the list comprehension and will use that instinctively. So, the list comprehension should be used, even if you do not agree with that preference (e.g. if you have a background in functional programming), because it carries subtext by convention established in the Python community.

However, if map, filter and fold are already used a lot in the codebase, then the map might be more appropriate, because it carries subtext by convention established in the codebase itself, overriding the conventions from the larger Python community.

Conventions are usually difficult to pick up for programmers that are new to a language or framework, because you usually pick them up over time. Relying too much on conventional subtext is therefore a pitfall. If you're learning a new technology or codebase it's therefore a good idea to study existing code to figure out the conventions, so you can apply them (in moderation) to your own code.

Conventional subtext guidelines

Study the conventions of languages, frameworks and codebases.
Establish and maintain clear conventions (e.g. by using a consistent naming scheme).
Do not break with convention without a good reason.
Be aware that newcomers might have a hard time understanding code that relies solely on conventional subtext.

Metatext

Subtext is often not enough to explain the full context and intent of the code. For instance, it is really difficult to express assumptions and edge cases in text and subtext alone. In these cases, we need to resort to metatext. Metatext is the text in the code that directly explains the code. Some examples are comments, docstrings and type hints. None of these have any influence on what the code does, but they exist to explain what the code does and why.

Metatext is constantly in danger of going out of sync with the text, even more so than subtext. So it's important to update documentation along with changes to the code.

Because comments appear alongside the code, it is only useful when it doesn't repeat information that is already easily gathered from the text or subtext. If a comment is necessary to explain the basic operation, you should first consider improving the subtext. But that doesn't mean that all code can (or should) be entirely self-documenting, because context is hard to convey via subtext. Below is an example of a bad and good comment.

# Bad: states the obvious
# Divide x by 100
x /= 100

# Good: gives reason for division
# Convert percentage into fraction
x /= 100

On the other hand, docstrings should state some things that are obvious from the code, because we cannot assume that a user of the function will look at the function body. An ideal docstring contains everything that a caller of a function might want to know and nothing more.

Metatext guidelines

Use comments mostly for information that cannot be put into subtext (context, reasoning, etc.).
Docstrings should contain all that a caller needs to know.
Keep metatext up to date with the code.

Conclusion

Clear code is code where the text, subtext and metatext all support each other to convey the intent, reasoning and context behind a computation. For every change you make to code, you should consider how you can update and improve the subtext and metatext such that your change becomes clear to others.

One might call this code "ironic", because the text and subtext do not match, but that concept does not seem particularly useful in the context of programming. If you know an application for ironic code, please let me know!