Ramblings inspired by Feyerabend’s Against Method, Part II: My preliminary take

Paul Feyerabend’s Against Method is a scandal. Most people who have heard of it know its tagline: “Anything goes”. As I mentioned in my previous post, my impression of the book from secondary sources was that Feyerabend was a madman and the book is sacrilege. Now, having read the book myself, I find myself impressed by the depth and clarity of his arguments and by his insight.

His key claim is that a successful (general) method of science is impossible and that trying to impose a (general) method is harmful.

In Feyerabend’s terminology, a method must “contain[] firm, unchanging, and absolutely binding principles for conducting the business of science” (p. 7 of the Fourth Edition, Verso 2010). To be counted as a success, such a method must “remain[] valid under all circumstances and [… be an] agency to which appeal can always be made” (p. 161).

I agree that all such general methods in the literature that I have been exposed to are failures by Feyerabend’s standard. Neither of the theories I adopt in my doctoral dissertation (pending a public defense), the Bayesian approach to epistemology and Imre Lakatos’s theory of research programs, satisfies this test, and I freely admit it; both are very permissive, and neither gives objective and precise decision rules for judging the merit of a scientific hypothesis or theory, so neither counts as a method under Feyerabend. And Feyerabend is quite correct (assuming his historical research is sound, which I am not qualified to judge) in his conclusion that no existing method (as the term is defined here) could have allowed certain key historical developments, and therefore none of them succeeds.

For example, Popper’s falsificationism fails for one of two reasons, depending on how it is read. If we suppose that it is a method (under Feyerabend’s definition of a method), then it must be followed literally in all cases, but then it fails the test case of Galileo, as discussed extensively in the book. Popper can also be read metaphorically, as general guidelines not to be taken as a literal method, in which case it can be understood to be consistent with the Galileo case; but then it is, by assumption, not a method. Either way, it is not a successful method.

I also agree that it is probably impossible to come up with a successful method, by that standard. The history of philosophy is full of expounded theories, all of which seem to fail for some reason or other. It is very easy to move from this to a general scepticism: there can be no such successful method. It seems to me that it is also the correct (though defeasible) conclusion.

Further, if (as I have conceded) it is impossible to devise a successful method, then trying to impose a method is certainly harmful. I accept this.

The catch is this: the conclusion must be read with Feyerabend’s definitions firmly in mind. It is a misunderstanding of Feyerabend to conclude, further, that he denies the value of scientific methods. The singular in the title is a conscious choice, and very significant: Feyerabend does not oppose methods; he opposes a unified, one-size-fits-all method, singular.

Where Kuhn talks about paradigms and Lakatos about research programs, Feyerabend talks about traditions. Within a tradition, Feyerabend acknowledges there to be quite a bit of value in binding rules, and within a tradition there can be a successful method. Feyerabend’s “anything goes” is not a license to forget consistency requirements in a single piece of work or when working within a tradition:

Admitting velocities larger than the velocity of light into relativity and leaving everything else unchanged gives us some rather puzzling results such as imaginary masses and velocities. […] Admitting contradictions into a system of ideas allegedly connected by the laws of standard logic and leaving everything else unchanged makes us assert every statement. Obviously we shall have to make some further changes [… which] remove[] the problems and research can proceed as planned. (p. 246)

One of Feyerabend’s key conclusions is that traditions can only be evaluated from within a tradition, whether itself or some other one; an objective, meta-traditional evaluation is impossible. I will concede that his argument looks plausible (I will not review it here). Once this is accepted, it easily follows that no tradition is objectively better than all others. (I note that it is theoretically possible that some tradition appears superior to all other traditions viewed from any tradition; but certainly, no existing tradition has that rather remarkable property. Usually all other traditions appear weaker than the one being used as the vantage point.)

Feyerabend goes even further. He claims that traditions are incommensurable: a tradition involves a whole world-view, and there is no lossless conversion of ideas, thoughts and claims from one tradition to another. The only way to truly understand a tradition is to first become its adherent.

This conclusion seems rather absurd to anyone who has been educated in one tradition and has never made a leap from one tradition to another. However, the truth of this claim seems quite plausible from my own experience trying to read both quantitative and qualitative methodological literature: the former typically dismisses the latter as unscientific, and the latter typically dismisses the former as “positivistic” (that this label is a misnomer makes no difference). It is even more plausible to me having discussed methodology with both quantitative and qualitative researchers, and having observed discussions of methodology between quantitative and qualitative researchers. Usually, they talk past each other, each hearing nonsense from the other. It’s like they’re using different languages even though all are using ordinary scientific English.

Yet, I cannot accept incommensurability as a binding constraint. I hope to be able to transcend several traditions, to be able to work in them and hopefully function as a bridge of sorts. Maybe I am a lunatic in that hope; I do not know.

Finally, Feyerabend claims that because traditions are incommensurable and an objective comparison of them is impossible, there is no good reason why science should have priority in politics over any other set of traditions. Feyerabend died in 1994, before the evidence-based movement became the force it is today, but I suspect he would have loudly protested ideas like evidence-based policy (or more commonly nowadays, evidence-informed policy). He makes a forceful claim that basing policy on science is a form of slavery.

I can see his point, but I am also in violent opposition. A lot of scientific activity is truly traditional: things are done in a particular way just because they have always been done that way (though the “always” can often be as short a period as a couple of years), and when one examines the history of that particular way, it often turns out to have been an accident, adopted for no good rational reason, and sometimes even adopted despite good reasons to abandon it. In these cases, adopting a scientific consensus position just because it is one is folly. And in general, where a decision affects only the person making it, it is better left to individual freedom, with no outside rule imposed, whether scientific or otherwise. But there are quite many problems where a decision has to be made collectively, and I will vote for such decisions to be evidence-informed; if I prevail, the only form of slavery involved is the tyranny of the majority.

In conclusion, it seems to me Feyerabend is more right than not, and that he has been mostly misunderstood (like he claims himself). I would recommend this book as a guide for anyone interested in multitradition (or mixed methods, as it is often called) work in the sciences. I would not recommend it as a methodology itself – and neither would Feyerabend:

‘anything goes’ is not a ‘principle’ I hold – I do not think that ‘principles’ can be used and fruitfully discussed outside the concrete research situation they are supposed to affect – but the terrified exclamation of a rationalist who takes a closer look at history. (p. xvii)

Let me repeat this: there is no Feyerabend method, and to conduct research “following Feyerabend” is to misunderstand him. (At least to the extent one considers only Against Method; I have not read Feyerabend’s other work.) He preaches tolerance, but one should look for methodological guidance elsewhere.

Ramblings inspired by Feyerabend’s Against Method, Part I: My road to Feyerabend

About 15 years ago, I studied for an exam on the philosophy of science, a required (and very much anticipated) part of my minor in philosophy. I must have learned about Karl Popper and his falsificationism, which did not really appeal to me. There was one thing that hit me hard, one that remained stuck in my mind for more than a decade: the idea of paradigms, consisting of a hard core (usually immutable) and a protective belt (easily changed to account for discovered anomalies), and of scientific revolutions that occasionally happen (usually through a generational shift), replacing one hard core with another.

About a decade later, I started debating the methodology and foundations of science with my colleague Ville Isomöttönen. At some point I suggested we both read an introductory textbook on the topic to inform our discussions. I believe we read Peter Godfrey-Smith’s excellent Theory and Reality.

In this book, I learned again about paradigms, and noticed that I had conflated several philosophers. Thomas Kuhn talked about paradigms, but the idea of hard cores and protective belts comes from Imre Lakatos, who did not talk about paradigms but used his own term, the research programme. Then there was Paul Feyerabend, who was basically crazy. Or that’s how I remember my reaction to reading about him in Godfrey-Smith’s textbook.

This was around the time I started working on the research that became my licentiate thesis. Very early on, one of my advisors, Dr. Vesa Lappalainen, asked me to explain what evidence is. That turned out to be a very difficult question to answer; I continued reading and pondering it until I submitted the thesis, and even beyond that point. I figured that the philosophy of science probably had an answer, but I could not really base my discussion of it in a thesis solely on introductory courses and textbooks. I needed to go read the originals.

The first original book on the philosophy of science I read during this period was Thomas Kuhn’s The Structure of Scientific Revolutions. I also borrowed from the library a copy of Karl Popper’s The Logic of Scientific Discovery, of which I was able to finish only the first chapter at this time. Kuhn was very interesting, and I finally realized how thoroughly I had misunderstood him from the secondary sources; his arguments made quite a bit of sense, but his insistence on there being at most one paradigm in each discipline was obviously false. Popper’s falsificationism is obviously true, but also severely inadequate.

Very early on during the licentiate thesis study, as I was doing preliminary literature research on evidence-based medicine (EBM), I came across the blog Science-Based Medicine, and particularly their post series critiquing EBM (start from Homeopathy and Evidence-Based Medicine: Back to the Future Part V). From this and other sources, I learned of Bayesian epistemology, which I started reading about over the next couple of years. As I have written previously on this blog, it is my current preferred theory of epistemology.

This spring, some months after the licentiate thesis was approved, I traveled to Essen, Germany, for a three-month research visit at the University of Duisburg-Essen. Two very significant things happened there: I wrote a substantial part of my doctoral dissertation (currently pending public defense) and I spent quite a bit of time discussing the philosophy and methodology of science with Dr. Stefan Hanenberg, who had been one of the examiners of the licentiate thesis. The topics of those discussions probably had something to do with the fact that the chapters I was writing there dealt with philosophy and epistemology.

During that time, I finally read Imre Lakatos’s work on the philosophy of science (The Methodology of Scientific Research Programmes) and on the philosophy of mathematics (Proofs and Refutations), both of which were eye-opening. Lakatos spends a lot of time in the former construing and critiquing Popper, and that discussion allowed me to understand Popper for the first time ever (though I recognize that what I understand is Lakatos’s version of Popper); I also finally read Popper’s Logic of Scientific Discovery properly at that point.

The discussions with Dr. Hanenberg frequently came back to Paul Feyerabend and his Against Method. I knew it well enough from secondary sources to know that I was not going to cite it in the dissertation, and so I did not read it at that point. The time to do that was once the dissertation was submitted.

My next post will discuss my actual reactions to the book, as I just finished it yesterday.

Benja on denotational semantics

My friend Benja Fallenstein got carried away in the comment section of my latest post on recursion and wrote a tutorial on how it all relates to denotational semantics. It’s a shame it is buried somewhere in the comments, so I am reposting the comment here with Benja’s permission.

The tagline is: “Denotational semantics tutorials. It’s the new monad tutorials!”

Benja Fallenstein writes:

Having read and understood [the] entire post, I can now summarize Antti-Juhani’s complaint in a way that programmers with no theoretical background will understand:

“The definition is an infinite recursion. Sure, it’s self-referential, but since it recurses forever, it doesn’t actually define what ‘recursion’ is!”


That said, I find it kind of a pity to introduce all this theory without talking about why it nicely models the real world. Antti-Juhani argues that this is pretty much the definition of “popularized science,” but I have to admit that, as definitions go, I like “recursion: see recursion” better. :)

So let me have a quick go. [UPDATE: Didn’t turn out to be quick at all. This is, in fact, as long as Antti-Juhani’s original post. Proceed at your own peril.]

All of this comes from denotational semantics, which is concerned with what expressions in a programming language “mean” — and which ones mean the same thing. For example, “7” ‘denotes’ the number seven, “7+2” denotes the number nine, and “5+4” also denotes the number nine.

Now, here’s the knotty problem. If we have

f(x) = if x = 0 then 1 else x*f(x-1)

then f(4) denotes the number twenty-four, but what does f(-2) denote?

So, denotational semantics introduces a new, special value, “recurses forever.” An expression of type ‘integer’ either denotes an integer, or, like f(-2), it denotes “recurses forever.”
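To make this concrete, here is a hedged Haskell transcription of Benja’s f (the transcription and the main action are mine, not part of the original comment):

f :: Integer -> Integer
f x = if x == 0 then 1 else x * f (x - 1)

main :: IO ()
main = print (f 4)   -- prints 24; evaluating f (-2) never reaches the base case, so it recurses forever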

Now, about these partial orders. (Reminder: A partial order says, for every pair (x,y) of elements of a certain set, that x is smaller than y, equal to y, larger than y, or “incomparable” to y.) Here is Benja’s guide to what the partial orders of denotational semantics mean:

Let’s say that you write a program where, in one place, you can plug in either x or y. If you can write such a program that prints “foo” when you plug in x and prints “bar” when you plug in y, then x and y are “incomparable.” If you can’t do that, but you can write a program that prints “foo” when you plug in y, and recurses forever when you plug in x, then x is smaller than y. If every program you can write will do the same thing whether you plug in x or y, then x and y are equal.

(We ignore details like out-of-memory errors and *how long* a program takes to do something.)


– 5 and 7 are incomparable. Proof: if ___ == 5 then print “foo” else print “bar”.

– f(-2) < f(2). Proof: if ___ == 2 then print “foo” else print “foo” — if you plug in f(-2), this will loop forever; if you plug in f(2), it’ll print “foo”.

– (5+2) = 7. Whatever program I write, it will do the same thing, no matter which of these expressions I plug in.

Now, with expressions of type ‘integer,’ the resulting partial order is: “Recurses forever” is smaller than everything else. All other values are incomparable.

Recall from Antti-Juhani’s post that the bottom element (⊥) is defined to be the smallest element of a mathematical universe (if such a beast exists). Since “recurses forever” is smaller than everything else, it’s the bottom element of this universe. That’s what denotational semantics calls it. (Haskell, too.)

Of course, this partial order is kind of boring (the technical term is ‘flat’). It gets more interesting when we consider functions.

In general, we can talk about functions from values of type S to values of type T, where we already know the partial orders of S and T. In particular, we can talk about functions from integers to integers.

Now, if you can find an x and a program that prints “foo” when you plug in f(x) and “bar” if you plug in g(x), then you can write a program that prints “foo” when you plug in f, and “bar” when you plug in g. (Exercise! 😉 )

Further, denotational semantics wants applying a function to a parameter and then doing something with the result to be the *only* way you can get information about a function. No f.get_function_name, please. (If your programming language has that, you can’t map the functions of your programming language 1:1 to the functions of denotational semantics.)

By my rule above, this implies that f <= g if and only if f(x) <= g(x) for all x. Since formal theories can’t have fuzzy stuff like my rule, denotational semantics takes this as the *definition* of the partial order on functions.

The bottom element of this mathematical universe, then, is the function that always returns bottom, no matter what it is given — or in other words, a function that recurses forever, no matter what parameter you pass to it.

(Side note. Denotational semantics allows only “monotone” functions — that is, if x <= y, then f(x) <= f(y) for all f, x, and y. That’s an example of how all these definitions allow us to make formal statements about the real world. In the real world, if you can find a function f and a program that prints “foo” when given f(x) and “bar” when given f(y), then you have a program that prints “foo” when given x and “bar” when given y — so if f(x) and f(y) are incomparable, then x and y must be incomparable; and the other way around, if x and y are comparable, then f(x) and f(y) must be comparable, too. (This reasoning doesn’t exclude the case that x <= y and f(x) >= f(y). If you really care, prove it yourself. :) ) This ends the side note.)
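A hedged Haskell sketch of the exercise above (the example functions and all names are my own invention, not Benja’s): a context that tells the results apart, applied to a witness argument, becomes a context that tells the functions apart.

fEx, gEx :: Integer -> Integer                          -- two example functions to distinguish
fEx x = x + 1
gEx x = x * 2

distinguishResult :: Integer -> IO ()                   -- tells fEx 2 (= 3) apart from gEx 2 (= 4)
distinguishResult r = putStrLn (if r == 3 then "foo" else "bar")

distinguishFunction :: (Integer -> Integer) -> IO ()    -- hence tells fEx apart from gEx
distinguishFunction h = distinguishResult (h 2)         -- 2 is the witness argument

main :: IO ()
main = distinguishFunction fEx >> distinguishFunction gEx   -- prints "foo", then "bar"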

Recall, now, Antti-Juhani’s definition of recursion: An equation of the form x = some expression (which may contain x), interpreted as a recursive definition, defines x to be the smallest value satisfying this equation. The models constructed in denotational semantics have the property that for any equation of this form, the set of values satisfying it does have a smallest element, so ‘x’ is well-defined no matter what the equation is.

It’s important to note that this does not mean that x has to be “bottom,” because there are equations to which “bottom” is not a solution. Let’s look at a slightly simpler example than the one Antti-Juhani gave.

f(x) = if x = 0 then 0 else f(x-1)

We have f(0) = 0, but bottom(0) = bottom, so f can’t be bottom.
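Here is a hedged Haskell sketch of this point (the names f and bottomFn are mine):

f :: Integer -> Integer
f x = if x == 0 then 0 else f (x - 1)

bottomFn :: Integer -> Integer   -- the bottom function: recurses forever on every input
bottomFn x = bottomFn x

main :: IO ()
main = print (f 0)               -- prints 0; print (bottomFn 0) would never finish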

The equation has more than one solution, though. It is satisfied by a function that returns 0 for x >= 0 and “returns bottom” (recurses forever) for x < 0; but it is also satisfied by a function that returns 0 for every input. And by an infinite number of functions that return 0 for some negative inputs and bottom for others.

We “want” the first of these solutions, of course, because that’s what we get if we interpret the equation as a computer program, and computer programs are what we’re trying to model. But we’re in luck: The first solution is smaller than all the other ones, because for all inputs, they either return the same value, or the solution we want returns ‘bottom’ and the solution we don’t want returns 0 (and bottom < 0).

The point of it all is, of course, that when we interpret an equation as a computer program, we always get the least value satisfying the equation (if we use a partial order that follows my rule, above).

And that, my friend, is denotational semantics.

On recursion

Note: This entry has been edited a couple of times (and might be edited in the future) to remove errors. Thanks to Benja Fallenstein for critique.

My previous post about recursion drew some comments. This is a topic that frequently confuses me, and there is a lot of confusion (some of it mine) in the blog post and its comments. On reflection, it seems prudent to examine what recursion really is.

Consider the following recursive definition: 0 is an expression; if T is an expression, then sT is an expression; nothing is an expression that doesn’t have one of these two forms. Now, it is clear that 0 is an expression. It is equally clear that sssssK is not an expression.

In some applications, it makes sense to consider infinite expressions like “…ssssss0”, that is, an infinite number of s’s before the final 0, and in others it doesn’t. Is that infinite utterance an expression? It does have one of the forms given in the definition, so it is not forbidden from being an expression, but neither is there anything in the definition requiring that it be one.
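As a hedged aside (the data type and the names Expr, Zero, S and omega are my own transcription), the definition maps directly onto a Haskell data type, and in Haskell’s lazy setting the infinite utterance corresponds to a value that can actually be constructed:

data Expr = Zero | S Expr   -- 0 is an expression; if e is an expression, so is se

finite :: Expr
finite = S (S Zero)         -- the expression ss0

omega :: Expr
omega = S omega             -- an infinite tower of s's: the questionable ...ssssss0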

It is helpful to do this a bit more formally. Let us translate the self-referential definition into a set equation: The set of expressions is the set E that satisfies the equation

E = { 0 } ∪ { se | e ∈ E }

But this definition is bogus! There is more than one set E that satisfies the equation, as remarked above, and thus there is no “the set E”!

We need to fix the definition in a way that makes the set E unique. The key question is whether we want to regard the infinite utterance discussed above as an expression. If we want it not to be an expression, it is simplest to just say “the set of expressions is the smallest set E that …”; if we want it to be an expression, we should say “the set of expressions is the largest set E that …”

Let us make the following definitions:

Consider the set equation S = … S …, in which the set S is alone on the left-hand side and occurs at least once on the right-hand side. This equation is a recursive definition of the smallest set S satisfying this equation, and a corecursive definition of the largest set S satisfying this equation.

Thus, a definition like

E = { 0 } ∪ { se | e ∈ E }

is not recursive merely by virtue of its form; its form only makes it self-referential. By itself, it is an ambiguous and thus bogus definition. We use the words “recursive” and “corecursive” to indicate the intended method of disambiguation.

A rather nice example of this can be seen in the treatment of algebraic data types in strict (ML-style) and nonstrict (Haskell-style) functional programming languages. Consider the declaration (using Haskell syntax)

data IntList = Nil | Cons Int IntList

This can be read as a set equation as follows:

IntList = { Nil } ∪ { Cons n x | n ∈ Int ∧ x ∈ IntList }

Now, as a recursive definition, it is restricted to finite lists as found in ML, but as a corecursive definition, it also allows infinite lists as found in Haskell. (It unfortunately fails to address the question of bottom-terminated lists, which also can be found in Haskell.)
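A hedged Haskell sketch of the two readings (the value names are mine): under the corecursive, lazy reading an infinite IntList is a legitimate value, and any finite prefix of it can be observed.

data IntList = Nil | Cons Int IntList

finiteList :: IntList
finiteList = Cons 1 (Cons 2 Nil)     -- fine under either reading

infiniteOnes :: IntList
infiniteOnes = Cons 1 infiniteOnes   -- only the corecursive (Haskell) reading admits this value

takeIL :: Int -> IntList -> [Int]    -- observe a finite prefix of a possibly infinite list
takeIL 0 _           = []
takeIL _ Nil         = []
takeIL n (Cons x xs) = x : takeIL (n - 1) xs

main :: IO ()
main = print (takeIL 5 infiniteOnes)   -- prints [1,1,1,1,1]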

As it happens, the technique is not limited to set definitions; but to generalise it, we need to discuss a bit what we mean by the terms “smallest” and “largest”.

In a set context, a set is smaller than another set if it is the other’s subset. Notice how this makes the ordering partial, that is, there are sets that are not comparable (for example { 1 } and { 2 }: neither is smaller than the other, but neither are they equal). A collection of sets does not necessarily have a “smallest” set (that is, one of the sets that is a subset of all the others). However, we can always find the least upper bound (“lub” or “join”) of a collection of sets as the union of the sets, and the greatest lower bound (“glb” or “meet”) as the intersection of the sets; and if the lub or glb is itself a member of the collection, it is the largest or smallest set of the collection, respectively.

Haskellists will find things instantly familiar when I say that the smallest element of a mathematical universe (if it exists) is called bottom and is denoted by ⊥; somewhat less familiar will be that the greatest element (if it exists) is called top and is denoted by ⊤.

A partial function is any function that need not be defined everywhere. For example, f(x) = 1/x is a partial function on the reals (it is not defined for x = 0). We can form a partial ordering of (partial) functions having the same domain and codomain by specifying that f is smaller than g if f(x) = g(x) for every x at which f(x) is defined. Sometimes it will make sense to specify instead that f is smaller than g if f(x) is smaller than g(x) for every x at which f(x) is defined. Now, it happens that any collection of functions that agree with each other wherever they are all defined has a lub; and in particular, the partial function that is defined nowhere is the smallest function (the bottom function).

Now, with this definition of “ordering” of functions we can apply the definitions of recursion and corecursion to functions. For example, consider f(x) = f(x) + 1. The bottom function (defined nowhere) qualifies, since undefined + 1 is undefined; thus it is the smallest solution to the equation and thus this equation is a recursive definition of bottom. However, consider f(x) = if x = 0 then 1 else x*f(x-1): any solution to this equation must have f(0) = 1 and thus the bottom function is disqualified. In fact, the smallest solution is the factorial function, and thus the equation is a recursive definition of factorial.
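A hedged Haskell sketch of the two equations just discussed (the names diverge and fact are mine), written with the least-fixed-point operator fix from Data.Function to emphasize the “smallest solution” reading:

import Data.Function (fix)

diverge :: Integer -> Integer
diverge = fix (\f x -> f x + 1)   -- least solution of f(x) = f(x) + 1: defined nowhere

fact :: Integer -> Integer
fact = fix (\f x -> if x == 0 then 1 else x * f (x - 1))   -- least solution: the factorial function

main :: IO ()
main = print (fact 5)             -- prints 120; evaluating diverge 3 would never finish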

In the comments to my previous entry, I was asked whether the Haskell definition “ones = 1 : ones” is recursive. Of course, the question in itself is nonsensical; whether it is a recursive definition or something else (such as corecursive) depends on what the programmer intends. I will interpret this question as whether the equation is a recursive definition of the intended result, the infinite list of all ones.

First, observe that we can order Haskell lists (including the special “undefined” list) partially as follows: l1 is smaller than l2 if l1 is undefined, if they both are empty lists or if a) the first element of l1 is smaller than the first element of l2 and b) the tail (the list with the first element removed) of l1 is smaller than the tail of l2.

The undefined list does not qualify as a solution for “ones = 1 : ones”. No finite list qualifies either. The infinite list of all ones does qualify, and it is the only solution; hence, the equation is a recursive definition of the infinite list of all ones; it is also a corecursive definition of the same list.
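A hedged Haskell sketch of that definition and one way to observe it (the take is mine; of course only a finite prefix can ever be printed):

ones :: [Integer]
ones = 1 : ones              -- the equation under discussion

main :: IO ()
main = print (take 3 ones)   -- prints [1,1,1]; the full infinite list is the unique solution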

Finally, let us consider the question that started it all: what should we make of the definition “recursion: see recursion”? First, note that an equation x = x admits every element of the universe as a solution; hence, the proper reading of “recursion: see recursion” as a definition is that it is ambiguous and hence bad. It is a recursive definition of the bottom element of the universe (whatever that might be for a dictionary), and a corecursive definition of the top element of the universe (whatever that might be). In any case, it does not define recursion (unless, of course, the concept of recursion is the smallest or the largest concept in the universe of concepts).