**Note:** This entry has been edited a couple of times (and might be edited in the future) to remove errors. Thanks to Benja Fallenstein for critique.

My previous post about recursion drew some comments. This is a topic that frequently confuses me, and there is a lot of confusion (some of it mine) in the blog post and its comments. On reflection, it seems prudent to examine what recursion really is.

Consider the following recursive definition: `0` is an expression; if T is an expression, then `s`T is an expression; nothing is an expression that doesn’t have one of these two forms. Now, it is clear that `0` is an expression. It is equally clear that `sssssK` is not an expression.

In some applications, it makes sense to consider infinite expressions like “…`ssssss0`“; that is, an infinite number of `s`s before the final `0`, and in others it doesn’t. Is that infinite utterance and expression? It does have one of the forms given in the definitions, so it is not forbidden from being an expression, but neither is there anything in the definition requiring that it is.

It is helpful to do this a bit more formally. Let us translate the self-referential definition into a set equation: The set of expressions is the set E that satisfies the equation

E = {

`0`} ∪ {

`s`e | e ∈ E }

But this definition is bogus! There are more than one set E that satisfies the equation, as remarked above, and thus there is no “the set E”!

We need to fix the definition in a way that makes the set E unique. The key question is, do we want to regard the infinite utterance discussed above as an expression. It is simplest to just say “the set of expressions is the *smallest* set E that …” if we want it to not be an expression, and if we want it to be an expression we should say “the set of expressions is the *largest* set E that …”

Let us make the following definitions:

Consider the set equation S = … S …, in which the set S is alone on the left-hand side and occurs at least once on the right-hand side. This equation is a

recursivedefinition of the smallest set S satisfying this equation, and acorecursivedefinition of the largest set S satisfying this equation.

Thus, a definition like

E = {

`0`} ∪ {

`s`e | e ∈ E }

is not recursive because of its form; it is

**self-referential**. By itself, it is an ambiguous and thus bogus definition. We use the words “recursive” and “corecursive” to indicate the intended method of disambiguation.

A rather nice example of this can be seen in the treatment of algebraic data types in strict (ML-style) and nonstrict (Haskell-style) functional programming languages. Consider the declaration (using Haskell syntax)

data IntList = Nil | Cons Int IntList

This can be read as a set equation as follows:

IntList = { Nil } ∪ { Cons n x | n ∈ Int ∧ x ∈ IntList }

Now, as a recursive definition, it is restricted to finite lists as found in ML, but as a corecursive definition, it also allows infinite lists as found in Haskell. (It unfortunately fails to address the question of bottom-terminated lists, which also can be found in Haskell.)

As it happens, the technique is not limited to set definitions; but to generalise it, we need to discuss a bit what we mean by the terms “smallest” and “largest”.

In a set context, a set is smaller than another set if it is the other’s subset. Notice how this makes the ordering partial, that is, there are sets that are not comparable (for example { 1 } and { 2 }; neither is smaller than the other, but neither are they equal). A collection of sets does not necessarily have a “smallest” set (that is, one of the sets that is a subset of all the others). However, we can always find the lowest upper bound (“lub” or “join”) of a collection of sets as the union of the sets, and the greatest lower bound (“glb” or “meet”) as the intersection of the sets, and if the lub or glb is itself a member of the collection, it is the smallest or largest set of the collection.

Haskellists will find things instantly familiar when I say that the smallest element of a mathematical universe (if it exists) is called **bottom** and is denoted by ⊥; somewhat less familiar will be that the greatest element (if it exists) is called **top** and is denoted by ⊤.

A partial function is any function that needs not be defined everywhere. For example, f(x) = 1/x is a partial function on the reals (it is not defined for x = 0). We can form a partial ordering of (partial) functions having the same domain and codomain by specifying that f is smaller than g if f(x) = g(x) for all such x that f(x) is defined. Sometimes it will make sense to specify that f is smaller than g if f(x) is smaller than g(x) for all such x that f(x) is defined. Now, it happens that all collections of functions have a lub; and in particular, the partial function that is defined nowhere is the smallest function (the bottom function).

Now, with this definition of “ordering” of functions we can apply the definitions of recursion and corecursion to functions. For example, consider f(x) = f(x) + 1. The bottom function (defined nowhere) qualifies, since undefined + 1 is undefined; thus it is the smallest solution to the equation and thus this equation is a recursive definition of bottom. However, consider f(x) = if x = 0 then 1 else x*f(x-1): any solution to this equation must have f(0) = 1 and thus the bottom function is disqualified. In fact, the smallest solution is the factorial function, and thus the equation is a recursive definition of factorial.

In the comments to my previous entry, I was asked whether the Haskell definition “ones = 1 : ones” is recursive. Of course, the question in itself is nonsensical; whether it is a recursive definition or something else (such as corecursive) depends on what the programmer intends. I will interpret this question as whether the equation is a recursive definition of the intended result, the infinite list of all ones.

First, observe that we can order Haskell lists (including the special “undefined” list) partially as follows: l1 is smaller than l2 if l1 is undefined, if they both are empty lists or if a) the first element of l1 is smaller than the first element of l2 and b) the tail (the list with the first element removed) of l1 is smaller than the tail of l2.

The undefined list does not qualify as a solution for “ones = 1 : ones”. No finite list qualifies either. The infinite list of all ones does qualify, and it is the only solution; hence, the equation is a recursive definition of the infinite list of all ones; it is also a corecursive definition of the same list.

Finally, let us consider the question that started it all: what should we make of the definition “recursion: see recursion”. First, note that a equation x = x admits all elements of the universe as a solution; hence, the proper reading of “recursion: see recursion” as a definition is that it is ambiguous and hence bad. It is a recursive definition of the bottom element of the universe (whatever that might be for a dictionary), and a corecursive definition of the top element of the universe (whatever that might be). In any case, it does not define recursion (unless of course, the concept of recursion is the smallest or the largest concept in the universe of concepts).

I don’t agree that “…ssssss0“; could be seen as an expression:

an expression is “0″ or “s” followed by a expression, and “…sss0″ is neither. but “sssss……….” could be an expression (well a coinductive one of course)

Note also that I don’t believe that there is such a thing as a larger set satisfying the equation: one could look at expression that have uncountable many “s”, and still see that as an expression… But I don’t believe that your article (or Haskell for the matter) look at such a strange mathematical construct.

You’re confused : what you describe as an infinite sequence of ‘s’ followed by the ’0′ symbol isn’t even a valid expression.

If you aren’t convinced about it, may I remind you that 0 and 1=0+ and 2=1+ and 3=2+, etc is the Peano construction of the integers, and that such integers are the basis for defining the notion of “finite” (vs “infinite”)?

Julien, you may be right about me being confused, but the argument in your second paragraph is simply irrelevant.

Rémi, you are probably right about the infinite utterance; however, it seems it’s a trivial error and does not invalidate my argument.

I’m not sure what you are talking about in your second paragraph. Can you elaborate?

Having read and understood this entire post, I can now summarize Antti-Juhani’s complaint in a way that programmers with no theoretical background will understand:

“The definition is an infinite recursion. Sure, it’s self-referential, but since it recurses forever, it

doesn’t actually define what ‘recursion’ is!”That said, I find it kind of a pity to introduce all this theory without talking about why it nicely models the real world. Antti-Juhani argues that this pretty much the definition of “popularized science,” but I have to admit that, as definitions go, I like “recursion: see recursion” better.

So let me have a quick go. [UPDATE: Didn't turn out to be quick at all. This is, in fact, as long as Antti-Juhani's original post. Proceed at your own peril.]

All of this comes from denotational semantics, which is concerned with what expressions in a programming language “mean” — and which ones mean the same thing. For example, “7″ ‘denotes’ the number

seven, “7+2″ denotes the numbernine, and “5+4″ also denotes the number nine.Now, here’s the knotty problem. If we have

f(x) = if x = 0 then 1 else x*f(x-1)

then f(4) denotes the number twenty-four, but what does f(-2) denote?

So, denotational semantics introduces a new, special value, “recurses forever.” An expression of type ‘integer’ either denotes an integer, or, like f(-2), it denotes “recurses forever.”

Now, about these partial orders. (Reminder: A partial order says, for every pair (x,y) of elements of a certain set, that x is smaller than y, equal to y, larger than y, or “incomparable” to y.) Here is Benja’s guide to what the partial orders of denotational semantics mean:

Let’s say that you write a program where, in one place, you can plug in either x or y. If you can write such a program that prints “foo” when you plug in x and prints “bar” when plug in y, then x and y are “incomparable.” If you can’t do that, but you

canwrite a program that prints “foo” when you plug in y, andrecurses foreverwhen you plug in x, then x is smaller than y. If every program you can write will do the same thing whether you plug in x or y, then x and y are equal.(We ignore details like out-of-memory errors and *how long* a program takes to do something.)

Examples:

- 5 and 7 are incomparable. Proof: if ___ == 5 then print “foo” else print “bar”.

- f(-2) < f(2). Proof: if ___ == 2 then print “foo” else print “foo” — if you plug in f(-2), this will loop forever, if you plug in f(2), it’ll print “foo”.

- (5+2) = 7. Whatever program I write, it will do the same thing, no matter which of these expressions I plug in.

Now, with expressions of type ‘integer,’ the resulting partial order is: “Recurses forever” is smaller than everything else. All other values are incomparable.

Recall from Antti-Juhani’s post that the bottom element (⊥) is defined to be the smallest element of a mathematical universe (if such a beast exists). Since “recurses forever” is smaller than everything else, it’s the bottom element of this universe. That’s what denotational semantics calls it. (Haskell, too.)

Of course, this partial order is kind of boring (the technical term is ‘flat’). It gets more interesting when we consider functions.

In general, we can talk about functions from values of type S to values of type T, where we already know the partial orders of S and T. In particular, we can talk about functions from integers to integers.

Now, if you can find an x and a program that prints “foo” when you plug in f(x) and “bar” if you plug in g(x), then you can write a program that prints “foo” when you plug in f, and “bar” when you plug in g. (Exercise! ;-)) Further, denotational semantics wants applying a function to a parameter and then doing something with the result to be the *only* way you can get information about a function. No f.get_function_name, please. (If your programming language has that, you can’t map the functions of your programming language 1:1 to the functions of denotational semantics.)

By my rule above, this implies that f <= g if and only if f(x) <= g(x) for all x. Since formal theories can’t have fuzzy stuff like my rule, denotational semantics takes this as the *definition* of the partial order on functions.

The bottom element of this mathematical universe, then, is the function that always returns bottom, no matter what it is given — or in other words, a function that recurses forever, no matter what parameter you pass to it.

(Side note. Denotational semantics allows only “monotone” functions — that is, if x <= y, then f(x) <= f(y) for all f, x, and y. That’s an example of how all these definitions allow us to make formal statements about the real world. In the real world, if you can find a function f and a program that prints “foo” when given f(x) and “bar” when given f(y), then you have a program that prints “foo” when given x and “bar” when given y — so if f(x) and f(y) are incomparable, then x and y must be incomparable, and the other way around, if x and y are comparable, then f(x) and f(y) must be comparable, too. (This reasoning doesn’t exclude the case that x <= y and f(x) >= f(y). If you really care, prove it yourself. :)) This ends the side note.)

Recall, now, Antti-Juhani’s definition of recursion: An equation of the form x = some expression (which may contain x), interpreted as a recursive definition, defines x to be the

smallestvalue satisfying this equation. The models constructed in denotational semantics have the property that for any equation of this form, the set of values satisfying it does have a smallest element, so ‘x’ is well-defined no matter what the equation is.It’s important to note that this does

notmean that x has to be “bottom,” because there are equations to which “bottom” is not a solution. Let’s look at a slightly simpler example than the one Antti-Juhani gave.f(x) = if x = 0 then 0 else f(x-1)

We have f(0) = 1, but bottom(0) = bottom, so f can’t be bottom.

The equation has more than one solution, though. It is satisfied by a function that returns 0 for x >= 0 and “returns bottom” (recurses forever) for x < 0; but it is

alsosatisfied by a function that returns 0 foreveryinput. And by an infinite number of functions that return 0 for some negative inputs and bottom for others.We “want” the first of these solutions, of course, because that’s what we get if we interpret the equation as a computer program, and computer programs are what we’re trying to model. But we’re in luck: The first solution is smaller than all the other ones, because for all inputs, they either return the same value, or the solution we want returns ‘bottom’ and the solution we don’t want returns 0 (and bottom < 0).

The point of it all is, of course, that when we interpret an equation as a computer program, we

alwaysget the least value satisfying the equation (if we use a partial order that follows my rule, above). And that, my friend, is denotational semantics.Denotational semantics tutorials. It’s the new monad tutorials!

Actually, Benja, I disagree about this all coming from denotational semantics. Order theory (what most of my posts is concerned with) precedes denotational semantics both intellectually and chronologically, not vice versa.

Though I do agree that the denotational semantics angle does enrich ones understanding of what’s happening here

My second paragraph is relevant, since you define your expressions as being 0 or s(expression) in the same way that Peano integers are defined by being 0 or (integer)+ !

Do you mean that order theory precedes denotational semantics in the application to the concept of non-termination and to the term “recursive” as in recursive function theory? If so, out of curiosity, would you have a reference?

(If you just mean that the recursive definition of

setsusing orders and least fixed points predates denotational semantics, sure, but I don’t think that’s all there is to it in a computer science context =-))Benja, what I mean is that the key insights of order theory (lattices, fixpoints) are needed before one can develop denotational semantics. Their application to computation is certainly due to Dana Scott and friends, but my post was mostly about the concept of recursion in general, not just in computational context

Ok, point taken

Julien, Rémi, what Antti-Juhani is doing is extending the concept of “expression” to something that we wouldn’t normally consider an expression. The particular extension in question isn’t well-defined. It probably doesn’t have a useful interpretation I can see if we interpret the language as a language of natural numbers; nor does it match the kind of infinite data structures we would get if we implemented natural numbers in a language like Haskell:

data Nat = Zero | Succ Nat

This is what Rémi has in mind, I believe.

I do think it’s a confusing example, because it doesn’t have an obvious interpretation, and because it isn’t well-defined. We

cangive it a clear formal definition, though; I believe that what Antti-Juhani has in mind is what the following formalization gives us:A “string of symbols” is a partial function from the positive natural numbers to the set { s, 0 }, such that if f(n+1) is defined, then f(n) is defined. In interpretation, f(n) is nth symbol from the

endof the string, if the string is at least n symbols long, and undefined otherwise.A string of symbols is an expression if it is in the set E that is the largest fixed point of the set equation Antti-Juhani gave.

This is well-defined. It is also a completely formal extension of the notion of ‘expression,’ without a particular useful interpretation. I do agree that this makes it a bad example, because something that has an intuitive interpretation would be easier to understand. But, Rémi, it doesn’t really make sense to me to argue about whether this is

the right wayora way that makes senseto extend the notion of expressions to infinite structures; it’s just one way to do so, and it’s not even trying to be a particularly useful way of doing it. And, Julien, it doesn’t make sense to me to argue about whether the infinite expression of the example is “really” an expression; it is not, of course, in the usual sense of “expression,” but this is an (arbitrary) extension of this usual notion.Well, to restate my argument on why the notion of “larger set satisfying the equation” is not well define, more in a “here is a part demonstration” and less in a “you have to use telepathy to understood me” way.

The fact is that one could look at an expression that would look like “s…. s….” where both … denote an infinity of “s” (so that this word is made of two infinite sub-word, one following the others). Then by adding this strange word to my set of expression, and still be possible solution of the equation, just one that is larger than what you describe as the “larger set satisfying the equation”…

Worst than that, I could build such a strange word for every ordinal there is… And then, as there is no set of all ordinal (for similar reason that there is no set of all set), one cannot build such a larger set.

Ps: I don’t have the courage to define ordinal here, just say that this is one of the generalization of the integer when one want to add infinite number to it.

Ps2: haskeller and denotational semantic goes for the “larger set”

because they made unstated assumption that such strange word are not

to be look at, because they are not really useful in computer science.

Ps3: I did not speak about word with an uncountable many letter here

because i don’t need it. Just the idea is that if you agree with the

choice axiom, then you can build an ordinal that have as many element

than any set, so take an uncountable set (say the set of real), and

you have you very strange word…

Ah

Sure, I’ve cut some corners here. The aim was to do some popularizing of the science, so there are some details that were unimportant to the main point missing.

Corecursion is well defined only when “the largest set” exists and is well defined. Knaster and Tarski famously proved that one sufficient condition is that all sets of sets have a lub (in jargon: we are dealing with complete lattices). This is not true in the universe of sets, but there are other interesting cases where it is true. There are also some other, more expressive sufficient conditions given in the literature.

The Knaster–Tarski theorem was published (in a more general form) by Tarski in 1955. I’m too lazy to dig up the exact reference.