Laziness

Jun 7, 2024

When researchers talk about the differences between programming languages, we have a number of axes on which to formally compare them. Tony Hoare gave an early account of these formal differences in his paper, “The Varieties of Programming Language”1:

To lighten the task, this paper concentrates on the following varieties of programming language.

  1. Deterministic, like LISP or PASCAL: the possible result of executing each program is fully controllable by its environment or user.
  2. Non-deterministic, like occam or Dijkstra’s language: the result of execution of a program may be affected by circumstances beyond the knowledge or control of its environment or user.
  3. Strict, like LISP or Dijkstra’s language: each stage of a calculation must be completed successfully before the start of the next stage which will use its results.
  4. Lazy, like KRC or Miranda: a calculation is started only when its result is found to be needed.
  5. Interactive, like CSP or occam: the program communicates with its environment during execution.
  6. Non-interactive, like the original FORTRAN: all the data required by the program may in principle be accumulated before the program starts; and all the results may be accumulated during the calculation for eventual output if and when the program terminates successfully.

I enjoy sharing this piece of history even though it appears quite obscure now. LISP and FORTRAN are the only languages that remain immediately recognizable even if they bring to mind ancient gray-haired programmers wearing pocket protectors in front of UNIX terminals. Perhaps there is a Lindy Effect in play.

The paper’s retro vibe—featuring as examples only passé languages—only reinforces that Hoare’s insights stand the test of time: languages come and go, but the algebraic varieties that formally underpin their semantic behavior can be generalized and remain applicable to modern languages.

The three axes highlighted here are clear: determinism vs. non-determinism, interactivity vs. non-interactivity, and strictness vs. laziness. Let us focus on the third axis. By way of updated example, strict languages include C++, Python, JavaScript, Go, and OCaml, along with pretty much everything else you’ve ever heard of or used. Lazy languages include Haskell.

Semantics of Strictness

We bind variables in Python like so:

>>> x = 10
>>> x
10

The left side of the = is the name to which the value of the right side is bound. The right side may be a complex expression, in which case it is evaluated before the variable is assigned the resulting value:

>>> x = 2 * 3 + 4
>>> x
10

We can also write functions in Python to associate a particular computation with a name.

>>> def add(x, y):
...     return x + y
>>> add(1, 2)
3
>>> add(1, 2 * 3)
7
>>> add(add(2, 3), add(4, 5))
14

def stands for “define”. add is the name we give to the function. x and y are arguments to the function which can be used as variables in the scope of the function body. The function returns their sum. We call the function by providing concrete values in place of the variables. These concrete values can of course be complex subexpressions.

A property of Python is that arguments to functions (like 1 and 2 in the expression add(1,2)) are evaluated before entering the function body. This property is known as strictness.

Hoare’s paper explains the fundamental property of strictness in terms of what is called bottom, typically written as \(\bot\). We can informally think of \(\bot\) as “a program that always fails.” If \(\bot\) is evaluated, the program crashes and comes to a halt, or goes into an infinite loop—either way, no useful result is produced. In strict languages, since arguments are always evaluated, programs with \(\bot\) in them—even if never used—evaluate to \(\bot\).

We can define \(\bot\) in Python like this:

def bottom():
    raise Exception("Bottom always fails")

For example:

>>> x = bottom()
Exception: Bottom always fails

Note that the error occurs at assignment. We evaluate the right side, so the program fails before the assignment can be completed successfully.

Consider this Python function:

def foo(x):
    return 0

Notice that x is an argument to foo, but it is not used. What happens when we call foo(bottom())? Since the arguments are evaluated even if they are not used, this causes an error. The whole program fails (produces \(\bot\)) because there was a \(\bot\) in it.
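For example, with the definitions of foo and bottom above loaded:

>>> foo(bottom())
Exception: Bottom always fails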

The exception to this rule is branching conditionals.

>>> if True:
...     print("Hello, World!")
... else:
...     bottom()

Hello, World!

if statements only evaluate the branch that is taken after scrutinizing the value of the condition.

There is similar behavior in the logical and and or operators:

>>> True or bottom()
True
>>> bottom() or True
Exception: Bottom always fails
>>> False and bottom()
False
>>> bottom() and False
Exception: Bottom always fails

These operators “short-circuit” and return a value without evaluating the second argument if they can determine the result from the first argument alone. In this way, they behave like if statements. You can spell out their logic as ordinary functions (and and or are reserved keywords in Python, so the names get a trailing underscore):

def and_(p, q):
    if p:
        return q
    else:
        return False

def or_(p, q):
    if p:
        return True
    else:
        return q

q only gets used in half of the possible cases inside each body. Note, however, that because Python evaluates function arguments before the call, these definitions reproduce the truth tables but not the short-circuiting itself.
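You can check that caveat with the bottom helper from before; unlike the built-in False and bottom() above, which returns False without touching bottom, the function version evaluates its arguments first:

>>> and_(False, bottom())
Exception: Bottom always fails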

Semantics of Laziness

Lazy languages, by contrast, evaluate expressions and function arguments only when needed. As a result, they only produce \(\bot\) in programs where \(\bot\) is used.

In Haskell, \(\bot\) is written as undefined. You can write:

>>> x = undefined
>>> x
*** Exception: undefined

Notice that the exception only happens when we try to evaluate x. The definition of x in the previous line is accepted, despite having a bottom value. This differs from Python because Haskell has non-strict evaluation.

Unused arguments are never evaluated in Haskell:

>>> foo x = 0
>>> foo undefined
0

Self-Reference

Python and Haskell differ in another way, related to laziness. Python does not permit self-referential variables, but Haskell does, as a consequence of lazy evaluation.

>>> # Python
>>> x = x
NameError: name 'x' is not defined
>>> -- Haskell
>>> x = x

The Python program is rejected because the right side is evaluated before the name x is added to the environment. You can think of this as “defending” against an infinite loop: if x were added to the environment before evaluating the right side, it would endlessly look up its own definition, leading nowhere. You can also think of assignment as a function where the right side is the argument and the effect of the function is to add a new binding to the environment.

The Haskell program is valid because the right side is not evaluated. The name is added to the environment bound not to a value, but to a thunk. A thunk is a delayed computation. If we request the value of x, it will enter an infinite loop:

>>> x = x
>>> x

[...infinite loop...]

The program hangs there without returning anything, and can only be stopped by sending an interrupt signal to the process. In the old days before fancy operating systems, you would have to restart the computer.

Recursion

While x = x is universally nonsense, there exist useful self-referential (also called recursive) programs. It is often sensible to define a function in terms of itself so long as there is a base case and inductive behavior. For example, a rather silly add function in Python could look like this:

def silly_add(x, y):
    if y == 0:
        return x
    else:
        return silly_add(x + 1, y - 1)

This works because functions in Python, unlike values, can be self-referential, as their bodies are evaluated not when defined but when called. In this way functions in strict languages are sort of like expressions in lazy languages, except that functions are recomputed each time they are called and thunks are only evaluated once (after that, the computed value remains bound to the variable name in the environment).

For completeness’s sake, here is the same thing in Haskell syntax:

sillyAdd x y =
  if y == 0
  then x
  else sillyAdd (x + 1) (y - 1)

Haskell syntax differs a bit from Python’s; function arguments are juxtaposed with the function instead of listed inside of parentheses. Here the parentheses are necessary to disambiguate the order of operations.

Recursion is useful for working with data structures of arbitrary size. It is often more natural to express programs recursively as opposed to iteratively, especially when dealing with trees.
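Before moving on, the earlier claim that function bodies are re-run on every call while thunks are evaluated only once can be made visible with trace from GHC’s Debug.Trace module. This is a minimal sketch; slowTen is a made-up name, and trace simply prints a message at the moment the surrounding expression is forced:

import Debug.Trace (trace)

-- A value whose evaluation we can observe: the message appears
-- the moment the thunk is forced.
slowTen :: Int
slowTen = trace "evaluating slowTen" (2 * 3 + 4)

main :: IO ()
main = print (slowTen + slowTen)
-- Prints "evaluating slowTen" once and then 20: the first use forces
-- the thunk, and the second use reads the already-computed value.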

Circular Data Structures

We have seen that lazy evaluation lets values refer to themselves, and that self-referential functions (recursion) are genuinely useful. We have not yet seen why you would want values to be recursive. Typical finite data structures resembling DAGs are not good candidates for recursive definition. But sometimes data structures contain cycles, like a circular linked list:

Regular linked list:

     +-+     +-+     +-+
     |1| --> |2| --> |3| --> []
     +-+     +-+     +-+
  
  [] is a null value.
  
Circular linked list:

       +-+     +-+     +-+
  +--> |1| --> |2| --> |3| ---+
  |    +-+     +-+     +-+    |
  |                           |
  +---------------------------+

We can construct a linked list in Python quite easily:

>>> class Node:
...    def __init__(self, data, next):
...        self.data = data
...        self.next = next
...    def __repr__(self):
...        return str(self.data) + " -> " + str(self.next)
>>> c = Node(3, None)
>>> b = Node(2, c)
>>> a = Node(1, b)
>>> a
1 -> 2 -> 3 -> None

To create a circular list, all we have to do is modify the link at the tail after the nodes are created.

>>> c.next = a

Now we cannot print the data structure without doing some cycle detection, because it will loop infinitely as it follows the links in a circle. But it is fun to see the arrows work out, so let us write a rudimentary version that assumes each data value is unique:

>>> def cyclic_print(node):
...     def loop(node, visited):
...         if node.data in visited:
...            return str(node.data)
...         else:
...            return str(node.data) + ' -> ' + loop(node.next, visited | {node.data})
...     return loop(node, set())
>>> cyclic_print(a)
1 -> 2 -> 3 -> 1

Haskell, besides being lazy, has the property of being pure, meaning it is not possible to modify existing bindings. Instead, we make use of laziness to establish the circular linked list through mutual recursion.

>>> -- Definition of a node:
>>> --   contains an integer and a pointer to another node.
>>> data Node = Node Int Node | End deriving (Show, Eq)
>>> cyclicPrint node =
...   loop node []
...   where loop End _ = "[]"
...         loop node@(Node d n) seen =
...           if d `elem` seen
...           then show d
...           else show d ++ " -> " ++ loop n (d : seen)
>>> let a = Node 1 b
...     b = Node 2 c
...     c = Node 3 a
... in putStrLn $ cyclicPrint a
1 -> 2 -> 3 -> 1

We are able to reference b in the definition of a before b is defined. This is okay because by the time we use a, a thunk for b has been created too, and these thunks are evaluated in cascade when their values are demanded.

Laziness helped us create a circular data structure without mutation. In that sense, the whole motivation for laziness can be construed as a solution to the problem of creating circular data structures in pure languages.

Infinite Data Structures

I have mentioned a few times now that thunks are only evaluated when called. The previous example revealed that thunks can contain thunks, and these nested thunks are evaluated outside-in, so you do not have to evaluate all the thunks in a data structure when the outermost thunk is used. This leads to a neat consequence of laziness:

>>> numbers = 0 : (map (+1) numbers)
>>> take 10 numbers
[0,1,2,3,4,5,6,7,8,9]
>>> take 20 numbers
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
>>> -- and so on...

It is easy to define infinite data structures in the presence of laziness. x = x was a lazy self-referential definition, but not a very useful one. Here we see a list of all the natural numbers. The list is infinite, but so long as we only request to see a finite piece of it, Haskell will comply and lazily evaluate only what is needed.

If you attempt to evaluate the whole infinite list, however, or take the last element, the computation will go on forever until interrupted:

>>> last numbers

[...infinite loop...]

>>> numbers

[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100

[...and so on until you stop it...]

After evaluating a piece of the list, there remains a thunk at the end. For instance, after take 5 numbers, the data structure under the hood looks like numbers = 0 : 1 : 2 : 3 : 4 : <thunk>, where the thunk is the not-yet-consumed remainder of map (+1) numbers.
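If you are following along in GHCi, its :sprint command prints a binding without forcing it, showing _ for any thunk that has not been evaluated yet. One caveat: the :: [Int] annotation below keeps the binding monomorphic, so GHCi actually shares the evaluated prefix between commands:

>>> numbers = 0 : map (+1) numbers :: [Int]
>>> take 5 numbers
[0,1,2,3,4]
>>> :sprint numbers
numbers = 0 : 1 : 2 : 3 : 4 : _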

One of the cuter definitions in Haskell is the infinite Fibonacci sequence. In math, we define it like so:

\[\begin{aligned}F_0 &= 0 \\ F_1 &= 1 \\ F_n &= F_{n-1} + F_{n-2}\end{aligned}\]

This could be written as a recursive function of the sequence index in Haskell or Python, but the naive implementation is infamously inefficient (exponentially so), since it repeats work by recomputing all previous elements of the sequence at every recursive call site. As I mentioned previously, function bodies are recomputed when called; thunks are only evaluated once. Instead of working out some clever way to memoize the intermediate values in the function definition, Haskell’s laziness permits us to write an implementation that closely follows the mathematical definition:

>>> fib = 0 : 1 : zipWith (+) fib (tail fib)
>>> take 10 fib
[0,1,1,2,3,5,8,13,21,34]
>>> -- Indexing is efficient since all the previous elements remain evaluated!
>>> fib !! 11
89

The definition is a bit obscure, so here is an alternative definition that is perhaps more pedagogical:

fib = 0 : 1 : [a + b | (a, b) <- zip fib (drop 1 fib)]

It may surprise you that the same thing is possible in Python using a feature called generators. They are not as ergonomic, but the feature exists.
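For instance, a generator-based sketch of the same infinite Fibonacci sequence might look like this (fib here is a hypothetical Python counterpart, unrelated to the Haskell definition above):

>>> def fib():
...     a, b = 0, 1
...     while True:
...         yield a          # a value is produced only when the consumer asks for it
...         a, b = b, a + b
>>> from itertools import islice
>>> list(islice(fib(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]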

Expressivity

It is important to note at this point that lazy and strict languages are equally powerful. That is, there is no problem that can be solved in one that cannot be solved in the other. The differences are entirely in the realm of expressivity, and occasionally the evaluation order affects performance (how long it takes a program to run, and other related metrics).

The advantage of Haskell here is that laziness is built into the language at a fundamental level, so every feature interacts with it the way you would expect, and programs that exploit lazy evaluation are simple to write. Replicating laziness in Python would require rewriting lots of basic functionality, and it probably would not be very performant. There are Python libraries that have already done some of this work, but Python is still not a naturally lazy language.
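To get a flavor of what that rewriting would involve, here is a minimal, hypothetical Thunk class that delays a computation and caches the result after the first use (the name and interface are made up for illustration):

class Thunk:
    # A delayed computation that is evaluated at most once.
    def __init__(self, compute):
        self._compute = compute
        self._evaluated = False
        self._value = None

    def force(self):
        if not self._evaluated:
            self._value = self._compute()
            self._evaluated = True
        return self._value

x = Thunk(lambda: 2 * 3 + 4)   # nothing is computed yet
x.force()                      # computes and caches 10
x.force()                      # returns the cached 10 without recomputing

The catch is that nothing else in Python knows to call force, so every consumer has to be rewritten to expect a Thunk, which is exactly why laziness works best when it is built into the language.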

While it would be cute to argue that the same applies in reverse to Haskell—that Haskell is no good for strict evaluation—it is not true. Haskell has built-in features for granularly controlling evaluation behavior. You can force strict evaluation where you want it without heavy boilerplate or special libraries. Haskell is a lazy language, but with convenient access to strictness when necessary.
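For instance, the standard seq function and the $! operator force evaluation on demand; reusing foo and undefined from earlier, a GHCi session might look like this:

>>> undefined `seq` 0        -- seq forces its first argument before returning the second
*** Exception: undefined
>>> foo undefined            -- ordinary lazy application: the unused argument is never forced
0
>>> foo $! undefined         -- strict application: the argument is forced before the call
*** Exception: undefined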

Who Cares?

While it is pedagogically interesting to find the “right” way to introduce the basics of laziness, nothing here is terribly deep or surprising. The feature might be unfamiliar at first, but once you are writing Haskell it quickly becomes intuitive and therefore uninteresting.

The “interesting” aspects of laziness are:

At the time of writing I am not knowledgeable about any of these topics.


  1. Hoare, C.A.R. (1989). The varieties of programming language. TAPSOFT ’89. Lecture Notes in Computer Science, vol 351. Springer. https://doi.org/10.1007/3-540-50939-9_121. ↩︎