My friend Benja Fallenstein got carried away in the comment section of my latest post on recursion and wrote a tutorial on how it all relates to denotational semantics. It’s a shame it is buried somewhere in the comments, so I am reposting the comment here with Benja’s permission.
The tagline is,
Denotational semantics tutorials. It’s the new monad tutorials!
Benja Fallenstein writes:
Having read and understood [the] entire post, I can now summarize Antti-Juhani’s complaint in a way that programmers with no theoretical background will understand:
“The definition is an infinite recursion. Sure, it’s self-referential, but since it recurses forever, it doesn’t actually define what ‘recursion’ is!”
That said, I find it kind of a pity to introduce all this theory without talking about why it nicely models the real world. Antti-Juhani argues that this pretty much the definition of “popularized science,” but I have to admit that, as definitions go, I like “recursion: see recursion” better.
So let me have a quick go. [UPDATE: Didn’t turn out to be quick at all. This is, in fact, as long as Antti-Juhani’s original post. Proceed at your own peril.]
All of this comes from denotational semantics, which is concerned with what expressions in a programming language “mean” — and which ones mean the same thing. For example, “7″ ‘denotes’ the number seven, “7+2″ denotes the number nine, and “5+4″ also denotes the number nine.
Now, here’s the knotty problem. If we have
f(x) = if x = 0 then 1 else x*f(x-1)
then f(4) denotes the number twenty-four, but what does f(-2) denote?
So, denotational semantics introduces a new, special value, “recurses forever.” An expression of type ‘integer’ either denotes an integer, or, like f(-2), it denotes “recurses forever.”
Now, about these partial orders. (Reminder: A partial order says, for every pair (x,y) of elements of a certain set, that x is smaller than y, equal to y, larger than y, or “incomparable” to y.) Here is Benja’s guide to what the partial orders of denotational semantics mean:
Let’s say that you write a program where, in one place, you can plug in either x or y. If you can write such a program that prints “foo” when you plug in x and prints “bar” when plug in y, then x and y are “incomparable.” If you can’t do that, but you can write a program that prints “foo” when you plug in y, and recurses forever when you plug in x, then x is smaller than y. If every program you can write will do the same thing whether you plug in x or y, then x and y are equal.
(We ignore details like out-of-memory errors and *how long* a program takes to do something.)
- 5 and 7 are incomparable. Proof: if ___ == 5 then print “foo” else print “bar”.
- f(-2) < f(2). Proof: if ___ == 2 then print “foo” else print “foo” — if you plug in f(-2), this will loop forever, if you plug in f(2), it’ll print “foo”.
- (5+2) = 7. Whatever program I write, it will do the same thing, no matter which of these expressions I plug in.
Now, with expressions of type ‘integer,’ the resulting partial order is: “Recurses forever” is smaller than everything else. All other values are incomparable.
Recall from Antti-Juhani’s post that the bottom element (⊥) is defined to be the smallest element of a mathematical universe (if such a beast exists). Since “recurses forever” is smaller than everything else, it’s the bottom element of this universe. That’s what denotational semantics calls it. (Haskell, too.)
Of course, this partial order is kind of boring (the technical term is ‘flat’). It gets more interesting when we consider functions.
In general, we can talk about functions from values of type S to values of type T, where we already know the partial orders of S and T. In particular, we can talk about functions from integers to integers.
Now, if you can find an x and a program that prints “foo” when you plug in f(x) and “bar” if you plug in g(x), then you can write a program that prints “foo” when you plug in f, and “bar” when you plug in g. (Exercise! ) Further, denotational semantics wants applying a function to a parameter and then doing something with the result to be the *only* way you can get information about a function. No f.get_function_name, please. (If your programming language has that, you can’t map the functions of your programming language 1:1 to the functions of denotational semantics.)
By my rule above, this implies that f <= g if and only if f(x) <= g(x) for all x. Since formal theories can’t have fuzzy stuff like my rule, denotational semantics takes this as the *definition* of the partial order on functions.
The bottom element of this mathematical universe, then, is the function that always returns bottom, no matter what it is given — or in other words, a function that recurses forever, no matter what parameter you pass to it.
(Side note. Denotational semantics allows only “monotone” functions — that is, if x <= y, then f(x) <= f(y) for all f, x, and y. That’s an example of how all these definitions allow us to make formal statements about the real world. In the real world, if you can find a function f and a program that prints “foo” when given f(x) and “bar” when given f(y), then you have a program that prints “foo” when given x and “bar” when given y — so if f(x) and f(y) are incomparable, then x and y must be incomparable, and the other way around, if x and y are comparable, then f(x) and f(y) must be comparable, too. (This reasoning doesn’t exclude the case that x <= y and f(x) >= f(y). If you really care, prove it yourself. ) This ends the side note.)
Recall, now, Antti-Juhani’s definition of recursion: An equation of the form x = some expression (which may contain x), interpreted as a recursive definition, defines x to be the smallest value satisfying this equation. The models constructed in denotational semantics have the property that for any equation of this form, the set of values satisfying it does have a smallest element, so ‘x’ is well-defined no matter what the equation is.
It’s important to note that this does not mean that x has to be “bottom,” because there are equations to which “bottom” is not a solution. Let’s look at a slightly simpler example than the one Antti-Juhani gave.
f(x) = if x = 0 then 0 else f(x-1)
We have f(0) = 1, but bottom(0) = bottom, so f can’t be bottom.
The equation has more than one solution, though. It is satisfied by a function that returns 0 for x >= 0 and “returns bottom” (recurses forever) for x < 0; but it is also satisfied by a function that returns 0 for every input. And by an infinite number of functions that return 0 for some negative inputs and bottom for others.
We “want” the first of these solutions, of course, because that’s what we get if we interpret the equation as a computer program, and computer programs are what we’re trying to model. But we’re in luck: The first solution is smaller than all the other ones, because for all inputs, they either return the same value, or the solution we want returns ‘bottom’ and the solution we don’t want returns 0 (and bottom < 0).
The point of it all is, of course, that when we interpret an equation as a computer program, we always get the least value satisfying the equation (if we use a partial order that follows my rule, above). And that, my friend, is denotational semantics.