
Thursday, May 2, 2019

Rewriting and emergent physics

Tempora mutantur, nos et mutamur in illis.
(Times change, and we change with them.)
Latin Adage, 16th-century Germany.

I want to understand how a certain kind of mathematical system can act as a foundation for a certain kind of physical cosmos. The ultimate goal of course would be to find a physical cosmos that matches the one we're in; but as a first step I'd like to show it's possible to produce certain kinds of basic features that seem prerequisite to any cosmos similar to the one we're in. A demonstration of that much ought, hopefully, to provide a starting point to explore how features of the mathematical system shape features of the emergent cosmos.

The particular kind of system I've been incrementally designing, over a by-now-lengthy series of posts (most recently yonder), is a rewriting system —think λ-calculus— where a "term" (really more of a graph) is a state of the whole spacetime continuum, a vast structure which is rewritten according to some local rewrite rules until it reaches some sort of "stable" state. The primitive elements of this state have two kinds of connections between them, geometry and network; and by some tricky geometry/network interplay I've been struggling with, gravity and the other fundamental forces are supposed to arise, while the laws of quantum physics emerge as an approximation good for subsystems sufficiently tiny compared to the cosmos as a whole. That's what's supposed to happen for physics of the real world, anyway.

To demonstrate the basic viability of the approach, I really need to make two things emerge from the system. The obvious puzzle in all this has been, from the start, how to coax quantum mechanics out of a classically deterministic rewriting system; inability to extract quantum mechanics from classical determinism has been the great stumbling block in devising alternatives to quantum mechanics for about as long as quantum mechanics has been around (harking back to von Neumann's 1932 no-go theorem). I established in a relatively recent post (thar) that the quintessential mathematical feature of quantum mechanics, to be derived, is some sort of wave equation involving signed magnitudes that add (providing a framework in which waves can cancel, so producing interference and other quantum weirdness). The geometry/network decomposition is key for my efforts to do that; not something one would be trying to achieve, evidently, if not for the particular sort of rewriting-based alternative mathematical model I'm trying to apply to the problem; but, contemplating this alternative cosmic structure in the abstract, starting from a welter of interconnected elements, one first has to ask where the geometry — and the network — and the distinction between the two — come from.
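
As a reminder of what that signed-magnitude requirement buys mathematically, here is a minimal numeric sketch in Python (nothing specific to the rewriting system; just the arithmetic of interference):

    # A minimal numeric illustration (ordinary arithmetic, not part of the
    # rewriting model) of what signed magnitudes buy: two equal-and-opposite
    # contributions cancel, which non-negative probabilities can never do.
    a1, a2 = 0.6, -0.6
    p_classical = a1**2 + a2**2    # weights merely accumulate: 0.72
    p_quantum = (a1 + a2)**2       # amplitudes add first, then square: 0.0
    print(p_classical, p_quantum)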

Time after time in these posts I set forth, for a given topic, all the background that seems relevant at the moment, sift through it, glean some new ideas, and then set it all aside and move on to another topic, till the earlier topic, developing quietly while the spotlight is elsewhere, becomes fresh again and offers enough to warrant revisiting. It's not a strategy for the impatient, but there is progress, as I notice looking back at some of my posts from a few years ago. The feasibility of the approach hinges on recognizing that its value is not contingent on coming up with some earth-shattering new development (like, say, a fully operational Theory of Everything). One is, of course, always looking for some earth-shattering new development; looking for it is what gives the whole enterprise shape, and one also doesn't want to become one of those historical footnotes who after years of searching brushed past some precious insight and failed to recognize it, so that it had to wait for some other researcher to discover it later. But, as I noted early in this series, the simple act of pointing out alternatives to a prevailing paradigm in (say) physics is beneficial to the whole subject, like tilling soil to aerate it. Science works best with alternatives to choose between; and scientists work best when their thoughts and minds are well-limbered by stretching exercises. For these purposes, in fact, the more alternatives the merrier, so that as a given post is less successful in reaching a focused conclusion it's more likely to compensate in variety of alternatives.

In this series of physics posts, I keep hoping to get down to mathematical brass tacks; but very few posts in the series actually do so (with a recent exception in June of last year). Alas, though the current post does turn its attention more toward mathematical structure, it doesn't actually achieve concrete specifics. Getting to the brass tacks requires first working out where they ought to be put.

Contents
Dramatis personae
Connections
Termination
Duality
Scale
Dramatis personae

A rewriting calculus is defined by its syntax and rewriting rules; for a given computation, one also needs to know the start term. In this case, we'll put off for the moment worrying about the starting configuration for our system.

The syntax defines the shapes of the pieces each state (aka term, graph, configuration) is made of, and how the pieces can fit together. For a λ-like calculus, the pieces of a term would be syntax-tree nodes; the parent/child connections in the tree would be the geometry, and the variable binding/instance connections would be the network. My best guess, thus far, has been that the elementary pieces of the cosmos would be events in spacetime. Connections between events would, according to the general scheme I've been conjecturing, be separated into local connections, defining spacetime, and non-local connections, providing a source of seeming-randomness if the network connections are sufficiently widely distributed over a cosmos sufficiently vast compared to the subsystem under consideration.
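
For concreteness, here is a minimal sketch (in Python, with every name hypothetical) of the kind of state being described: each primitive element carries two distinct kinds of links.

    from dataclasses import dataclass, field

    @dataclass
    class Event:
        # Local connections to neighboring events: the spacetime geometry.
        geometry: list = field(default_factory=list)
        # Non-local connections scattered widely across the cosmos: the network,
        # the conjectured source of seeming-randomness.
        network: list = field(default_factory=list)

    # A state of the whole continuum is then just a (vast) collection of such
    # events; a rewrite rule would match a small local pattern of geometry links
    # and rearrange the events and links it touches.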

I'm guessing that, to really make this seeming-randomness trick work, the cosmos ought to be made up of some truly vast number of events; say, 10^60, or 10^80, or on up from there. If the network connections are really more-or-less-uniformly distributed over the whole cosmos, irrespective of the geometry, then there's no obvious reason not to count events that occur, say, within the event horizon of a black hole, and from anywhere/anywhen in spacetime, which could add up to much more than the currently estimated number of particles in the universe. Speculatively (which is the mode all of this is in, of course), if the galaxy-sized phenomena that motivate the dark-matter hypothesis are too big, relative to the cosmos as a whole, for the quantum approximation to work properly —so one would expect these phenomena to sit oddly with our lesser-scale physics— that would seem to suggest that the total size of the cosmos is finite (since in an infinite cosmos, the ratio of the size of a galaxy to the size of the universe would be exactly zero, no different than the ratio for an electron). Although, as an alternative, one might suppose such an effect could derive, in an infinite cosmos, from network connections that aren't distributed altogether uniformly across the cosmos (so that connections with the infinite bulk of things get damped out).

With the sort of size presumed necessary to the properties of interest, I won't be able to get away with the sort of size-based simplifying trick I've gotten away with before, as with a toy cosmos that has only four possible states. We can't expect to run a simulation with program states comparable in size to the cosmos; Moore's law won't stretch that far. For this sort of research I'd expect to have to learn, if not invent, some tools well outside my familiar haunts.

The form of cosmic rewrite rules seems very much up-for-grabs, and I've been modeling guesses on λ-like calculi while trying to stay open to pretty much any outre possibility that might suggest itself. In λ-like rewriting, each rewriting rule has a redex pattern, which is a local geometric shape that must be matched; it occurs, generally, only in the geometry, with no constraints on the network. The redex-pattern may call for the existence of a tangential network connection —the β-rule of λ-calculus does this, calling for a variable binding as part of the pattern— and the tangential connection may be rearranged when applying the rule, just as the local geometry specified by the redex-pattern may be rearranged. Classical λ-calculus, however, obeys hygiene and co-hygiene conditions: hygiene prohibits the rewrite rule from corrupting any part of the network that isn't tangent to the redex-pattern, while co-hygiene prohibits the rewrite rule from corrupting any part of the geometry that isn't within the redex-pattern. Impure variants of λ-calculus violate co-hygiene, but still obey hygiene. The guess I've been exploring is that the rewriting rules of physics are hygienic (and Church-Rosser), and gravity is co-hygienic while the other fundamental forces are non-co-hygienic.
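
For reference, here is what hygiene amounts to in the familiar λ-calculus setting: a small Python sketch of capture-avoiding substitution for the β-rule, (λx.body) arg → body[x:=arg]. This is standard λ-calculus machinery, not anything specific to the cosmic-rewriting conjecture.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Var:
        name: str

    @dataclass(frozen=True)
    class Lam:
        param: str
        body: object

    @dataclass(frozen=True)
    class App:
        fn: object
        arg: object

    def free_vars(t):
        if isinstance(t, Var):
            return {t.name}
        if isinstance(t, Lam):
            return free_vars(t.body) - {t.param}
        return free_vars(t.fn) | free_vars(t.arg)

    counter = 0
    def fresh(base):
        global counter
        counter += 1
        return f"{base}_{counter}"

    def subst(t, x, s):
        """Capture-avoiding substitution t[x := s]: the hygiene condition."""
        if isinstance(t, Var):
            return s if t.name == x else t
        if isinstance(t, App):
            return App(subst(t.fn, x, s), subst(t.arg, x, s))
        if t.param == x:                  # x is rebound here; leave the body alone
            return t
        if t.param in free_vars(s):       # would capture a free variable: rename
            new = fresh(t.param)
            return Lam(new, subst(subst(t.body, t.param, Var(new)), x, s))
        return Lam(t.param, subst(t.body, x, s))

    def beta(t):
        """Apply the β-rule at the root, if the redex pattern matches."""
        if isinstance(t, App) and isinstance(t.fn, Lam):
            return subst(t.fn.body, t.fn.param, t.arg)
        return t

    # (λx.λy.x) y  →  λy_1.y : the binder is renamed so the free y isn't captured.
    print(beta(App(Lam("x", Lam("y", Var("x"))), Var("y"))))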

I've lately had in mind that, to produce the right sort of probability distributions, the fluctuations of cosmic rewriting ought to, in essence, compare the different possible behaviors of the subsystem-under-consideration. Akin to numerical solution of a problem in the calculus of variations.

Since the shape of spacetime is going to have to emerge from all this, the question arises —again— of why some connections to an event should be "geometry" while others are "network". The geometry is relatively regular and, one supposes, stable, while the network should be irregular and highly volatile; in fact, the seeming-randomness depends on its being irregular and volatile. Conceivably, the redex-patterns are geometric (or mostly geometric) because the engagement of those connections within the redex-patterns causes those connections to be geometric in character (regular, stable), relative to the evolution of the cosmic state.

The overall character of the network is another emergent feature likely worth attention. Network connections in λ-calculus are grouped into variables, sub-nets defined by a binding and its bound instances, in terms of which hygiene is understood. Variables, as an example of network structure, seem built-in rather than emergent; the β-rule of λ-calculus is apparently too wholesale a rewriting to readily foster ubiquitous emergent network structure. Physics, though, seems likely to engage less wholesale rewriting, from which there should also be emergent structure, some sort of lumpiness —macrostructures— such that (at a guess) incremental scrambling of network connections would tend to circulate those connections only within a particular lump/macrostructure. The apparent alternative to such lumpiness would be a degree of uniform distribution that feels, to my intuition anyway, unnatural. One supposes the lumpiness would come into play in the nature of stable states that the system eventually settles into, and perhaps the size and character of the macrostructures would determine at what scale the quantum approximation ceases to hold.

Connections

Clearly, how the connections between nodes —the edges in the graph— are set up is the first thing we need to know, without which we can't imagine anything else concrete about the calculus. Peripheral to that is whether the nodes (or, for that matter, the edges) are decorated, that is, labeled with additional information.

In λ-calculus, the geometric connections are of just three forms, corresponding to the three syntactic forms in the calculus: a variable instance has one parent and no children; a combination node has one parent and two children, operator and operand; and a λ-expression has one parent and one child, the body of the function. For network connections, ordinary λ-calculus has one-to-many connections from each binding to its bound instances. These λ network structures —variables— are correlated with the geometry; the instances of a variable can be arbitrarily scattered through the term, but the binding of the variable, of which there is exactly one, is the sole asymmetry of the variable and gives it an effective singular location in the syntax tree, required to be an ancestor in the tree of all the locations of the instances. Interestingly, in the vau-calculus generalization of λ-calculus, the side-effectful bindings are somewhat less uniquely tied to a fixed location in the syntax tree, but are still one-per-variable and required to be located above all instances.

Physics doesn't obviously lend itself to a tree structure; there's no apparent way for a binding to be "above" its instances, nor apparent support for an asymmetric network structure. Symmetric structures would seem indicated. A conceivable alternative strategy might use time as the "vertical" dimension of a tree-like geometry, though this would seem rather contrary to the loss of absolute time in relativity.

A major spectrum of design choice is the arity of network structures, starting with whether network structures should have fixed arity, or unfixed as in λ-like calculi. Unfixed arity would raise the question of what size the structures would tend to have in a stable state. Macrostructures, "lumps" of structures, are a consideration even with fixed arity.

Termination

In exploring these realms of possible theory, I often look for ways to defer aspects of the theory till later, as a sort of Gordian-knot-cutting (reducing how many intractable questions I have to tackle all at once). I've routinely left unspecified, in such deferral, just what it should mean for the cosmic rewriting system to "settle into a stable state". However, at this point we really have no choice but to confront the question, because our explicit main concern is with mathematical properties of the probability distribution of stable states of the system, and so we can do nothing concrete without pinning down what we mean by stable state.

In physics, one tends to think of stability in terms of asymptotic behavior in a metric space; afaics, exponential stability for linear systems, Lyapunov stability for nonlinear. In rewriting calculi, on the other hand, one generally looks for an irreducible form, a final state from which no further rewriting is possible. One could also imagine some sort of cycle of states that repeat forever, though making that work would require answers to some logistical questions. Stability (cyclic or otherwise) might have to do with constancy of which macrostructure each of an element's network connections associates to.
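
As a minimal sketch of the rewriting-side notion (with a hypothetical rewrite_step, and states assumed hashable), stability could be checked roughly like this:

    def settle(state, rewrite_step, max_steps=10_000):
        """Iterate a (hypothetical) rewrite_step until the state is irreducible or
        repeats; rewrite_step returns the next state, or None if no rule applies."""
        seen = {state: 0}
        for step in range(1, max_steps + 1):
            nxt = rewrite_step(state)
            if nxt is None:
                return ("irreducible", state)            # final, irreducible form
            if nxt in seen:
                return ("cycle", nxt, step - seen[nxt])  # repeating state, with period
            seen[nxt] = step
            state = nxt
        return ("undecided", state)                      # gave up within the step budget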

If rewriting effectively explores the curvature of the action function (per the calculus of variations as mentioned earlier), it isn't immediately obvious how that would then lead to asymptotic stability. At any rate, different notions of stability lead to wildly different mathematical developments of the probability distribution, hence this is a major point to resolve. The choice of stability criterion may depend on recognizing what criterion can be used in some technique that arrives at the right sort of probability distribution.

There's an offbeat idea lately proposed by Tim Palmer called the invariant set postulate. Palmer, so I gather, is a mathematical physicist deeply involved in weather prediction, from which he's drawn some ideas to apply back to fundamental physics. A familiar pattern in nonlinear systems, apparently, is a fractal subset of state space which, under the dynamics of the system, the system tends to converge upon and, if the system state actually comes within the set, remains invariant within. In my rewriting approach these would be the stable states of the cosmos. The invariant set should be itself a metric space of lower dimension than the state space as a whole and (if I'm tracking him) uncomputable. Palmer proposes to postulate the existence of some such invariant subset of the quantum state space of the universe, to which the actual state of the universe is required to belong; and requiring the state of the universe to belong to this invariant set amounts to requiring non-independence between elements of the universe, providing an "out" to cope with no-go theorems such as Bell's theorem or the Kochen–Specker theorem. Palmer notes that while, in the sort of nonlinear systems this idea comes from, the invariant set arises as a consequence of the underlying dynamics of the system, for quantum mechanics he's postulating the invariant set with no underlying dynamics generating it. This seems to be where my approach differs fundamentally from his: I suppose an underlying dynamics, produced by my cosmic rewriting operation, from which one would expect to generate such an invariant set.

Re Bell and, especially, Kochen–Specker, those no-go theorems rule out certain kinds of mutual independence between separate observations under quantum mechanics; but the theorems can be satisfied —"coped with"— by imposing some quite subtle constraints. Such as Palmer's invariant set postulate. It seems possible that Church-Rosser-ness, which tampers with independence constraints between alternative rewrite sequences, may also suffice to satisfy the theorems.

Duality

What if we treated the lumpy macrostructures of the universe as if they were primitive elements; would it be possible to then describe the primitive elements of the universe as macrostructures? Some caution is due here for whether this micro/macro duality would belong to the fundamental structure of the cosmos or to an approximation. (Of course, this whole speculative side trip could be a wild goose chase; but, as usual, on one hand it might not be a wild goose chase, and on the other hand wild-goose-chasing can be good exercise.)

Perhaps one could have two coupled sets of elements, each serving as the macrostructures for the other. The coupling between them would be network (i.e., non-geometric), through which presumably each of the two systems would provide the other with quantum-like character. In general the two would have different sorts of primitive elements and different interacting forces (that is, different syntax and rewrite-rules). Though it seems likely the duals would be quite different in general, one might wonder whether in a special case they could sometimes have the same character, in which case one might even ask whether the two could settle into identity, a single system acting as its own macro-dual.

For such dualities to make sense at all, one would first have to work out how the geometry of each of the two systems affects the dynamics of the other system — presumably, manifesting through the network as some sort of probabilistic property. Constructing any simple system of this sort, showing that it can exhibit the sort of quantum-like properties we're looking for, could be a worthwhile proof-of-concept, providing a buoy marker for subsequent explorations.

On the face of it, a basic structural difficulty with this idea is that primitive elements of a cosmic system, if they resemble individual syntax nodes of a λ-calculus term, have a relatively small fixed upper bound on how many macrostructures they can be attached to, whereas a macrostructure may be attached to a vast number of such primitive elements. However, there may be a way around this.

Scale

I've discussed before the phenomenon of quasiparticles, group behaviors in a quantum-mechanical system that appear (up to a point) as if they were elementary units; such eldritch creatures as phonons and holes. Quantum mechanics is fairly tolerant of inventing such beasts; they are overtly approximations of vastly complicated underlying systems. Conventionally "elementary" particles can't readily be analyzed in the same way —as approximations of vastly complicated systems at an even smaller scale— because quantum mechanics is inclined to stop at Planck scale; but I suggested one might achieve a similar effect by importing the complexity through network connections from the very-large-scale cosmos, as if the scale of the universe were wrapping around from the very small to the very large.

We're now suggesting that network connections provide the quantum-like probability distributions, at whatever scale affords these distributions. Moreover, we have this puzzle of imbalance between, ostensibly, small bounded network arity of primitive elements (analogous to nodes in a syntax tree) and large, possibly unbounded, network arity of macrostructures. The prospect arises that perhaps the conventionally "elementary" particles —quarks and their ilk— could be already very large structures, assemblages of very many primitive elements. In the analogy to λ-calculus, a quark would correspond to a subterm, with a great deal of internal structure, rather than to a parse-tree-node with strictly bounded structure. The quark could then have a very large network arity, after all. Quantum behavior would presumably arise from a favorable interaction between the influence of network connections to macrostructures at a very large cosmic scale, and the influence of geometric connections to microstructures at a very small scale. The structural interactions involved ought to be fascinating. It seems likely, on the face of it, that the macrostructures, exhibiting altogether different patterns of network connections than the corresponding microstructures, would also have different sorts of probability distributions, not so much quantum as co-quantum — whatever, exactly, that would turn out to mean.

If quantum mechanics is, then, an approximation arising from an interaction of influences from geometric connections to the very small and network connections to the very large, we would expect the approximation to hold, not at the small end of the range of scales, but only at a subrange of intermediate scales — not too large and at the same time not too small. In studying the dynamics of model rewriting systems, our attention should then be directed to the way these two sorts of connections can interact to reach a balance from which the quantum approximation can emerge.

At a wild, rhyming guess, I'll suggest that the larger a quantum "particle" (i.e., the larger the number of primitive elements within it), the smaller each corresponding macrostructure. Thus, as the quanta get larger, the macrostructures get smaller, heading toward a meeting somewhere in the mid scale — notionally, around the square root of the number of primitive elements in the cosmos — with the quantum approximation breaking down somewhere along the way. Presumably, the approximation also requires that the macrostructures not be too large, hence that the quanta not be too small. Spinning out the speculation, on a logarithmic scale, one might imagine the quantum approximation working tolerably well for, say, about the middle third of the lower half of the scale, with the corresponding macrostructures occupying the middle third of the upper half of the scale. This would put the quantum realm at a scale from the number of cosmic elements raised to the 1/3 power, down to the number of cosmic elements raised to the 1/6 power. For example, if the number of cosmic elements were 10^120, quantum scale would be from 10^40 down to 10^20 elements. The takeaway lesson here is that, even if those guesses are off by quite a lot, the number of primitive elements in a minimal quantum could still be rather humongous.
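
Spelled out, as a back-of-the-envelope check of the guess and nothing more:

    # On a log scale, the lower half of a cosmos of N elements runs from N^0 to
    # N^(1/2); its middle third runs from N^(1/6) to N^(1/3).  With N = 10^120:
    N_exp = 120
    low, high = N_exp / 6, N_exp / 3
    print(f"quantum realm roughly 10^{low:.0f} to 10^{high:.0f} elements")
    # -> quantum realm roughly 10^20 to 10^40 elements, the figures quoted above.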

Study of the emergence of quasiparticles seems indicated.

Sunday, May 10, 2015

Computation and truth

a very great deal more truth can become known than can be proven.
Richard Feynman, "The Development of the Space-Time View of Quantum Electrodynamics", Nobel Lecture, 1965.

In this post I'm going to take a sharper look at the relationship between computation and truth. This relates to my previously blogged interests both in difficulties with the conventional notion of type (here) and in possible approaches to sidestepping Gödel's Theorem (here).

I was motivated to pursue these ideas further now by two recent developments: first, a discussion on LtU that — most gratifyingly — pushed me to re-review the more globally oriented aspects of Gödel's Theorem (here); and second, following on that, an abrupt insight into what the Curry–Howard correspondence implies about the practical dynamics of static typing.

My biggest achievement here is on something I've wanted to do for many years: to make clear exactly why Gödel's Theorem is true. By proving it. As an elementary result. If it's really as profound as it's made out to be, it should have a really simple proof; yet I've never seen the proof treated as an elementary result. At an elementary level one usually sees hand-waving about the proof, in a manner I find unsatisfying because the details omitted seem rather essential to understanding why the result is true. I don't expect to specify here every tiny detail of a proof, but any detail I omit should be obviously minor; it shouldn't feel as if anything problematic is being hidden.

I have comparatively far less to say about Curry–Howard. It's taken me years to come up with this insight into the connection between Curry–Howard and static typing, though; so if the difficulty of coming up with it isn't just me being thick, the fact that it doesn't take long to state is a good sign.

Contents
That which is most computable is most true
Church–Turing
Enumeration
Diagonalization
Uncomputability
Gödel
Attitude
Curry–Howard
What we can't know
Gödel's Second Theorem
Turing's dissertation
Where do we go from here?
That which is most computable is most true

We want to get at truth using mechanical reasoning. Why? Because mechanical reasoning is objective. We can see what all the rules are, and decide whether we agree with them, and we can check how the particular reasoning was done and know whether or not it follows the rules, and if it all checks out we can all agree on the conclusion. Everything is out in the open, unlike things like divine revelation where an unadorned "I don't believe it" is a credible counter-argument.

In mechanical reasoning, propositions are built up using symbols from a finite alphabet, and a finite set of rules of deduction allow propositions to be proven, either from first principles (axioms) or from a finite number of previously proven propositions (theorems). In essence, this is computation (aka mechanical computation); I'm not talking about a sophisticated correspondence here, just common-sense mechanics: manipulating propositions using finite rules of deduction is manipulating strings over an alphabet using a finite set of rules, which is computation. What you can and can't do this way is governed by what you can and can't do by computation.
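
To make the identification concrete, here is a toy, made-up formal system in Python: one axiom, two rules, and a breadth-first enumeration of its theorems. The system means nothing in itself; the point is only that "proving" is rule-governed string manipulation, i.e. computation.

    from collections import deque

    AXIOM = "a"

    def rules(s):
        yield s + "b"    # rule 1: append a "b"
        yield s + s      # rule 2: double the string

    def theorems(limit=10):
        """Enumerate theorems breadth-first from the axiom."""
        seen, queue, out = {AXIOM}, deque([AXIOM]), []
        while queue and len(out) < limit:
            t = queue.popleft()
            out.append(t)
            for u in rules(t):
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        return out

    print(theorems())    # ['a', 'ab', 'aa', 'abb', 'abab', 'aab', 'aaaa', ...]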

Some results that came out gradually over the first several decades of the twentieth century appear to say there is no one unique most-powerful notion of truth accessible by mechanical reasoning. However, it turns out there is a unique most-powerful notion of computation. By the time this second point was fully appreciated, mathematicians had already long since been induced to reconcile themselves, one way or another, to the first point. However, looking at it all from nearly a century downstream, it seems advisable to me to start my systematic study with the historically later result, the unique most-powerful notion of computation.

Church–Turing

In the 1930s, three different formal models of general computation were proven to be equi-powerful: general recursive functions (1931, Gödel and Herbrand), λ-calculus (1936, Church), and Turing machines (1936, Turing). That is, for any computation you can do with one of these models, there's an equivalent computation you can do with either of the other models. Of these three models, Turing machines are perhaps the least mathematically elegant, rather nuts-and-bolts sort of devices; but they're also the model for which it's most intuitively obvious that you can do something by mechanical computation iff (if and only if) you can do it with a Turing machine. When the other models were proven equi-powerful with Turing machines, it didn't add much to the credibility of Turing machines; rather, it added credibility to the other models.

The "thesis", also called a "conjecture", is that any model of mechanical computation, if it isn't underpowered, is also equi-powerful with Turing machines and all these other "Turing-powerful" models of computation. It's a "thesis", "conjecture", etc., because it's an inherently informal statement and therefore isn't subject to formal proof. But it's demonstrated its unassailability for about three quarters of a century, now. Mathematicians study what it takes to build a model more powerful than Turing machines (in fact, Turing himself touched on this in his dissertation); you have to bring in something that isn't altogether mechanical. The maximum amount of computational power you can get mechanically is a robust quantity, somehow built into the fabric of Platonic mathematics much as the speed of light is built into the fabric of physics, and this quantity of power is the same no matter which way you get there (what computational model you use). And, that amount of computational power is realizable; it isn't something you can only approach, but something you can achieve (mathematically speaking).

Enumeration

When exploring the limits of Turing-powerful computation, the basic technique is to frame computations in terms of enumerations.

Enumeration is where you just generate a list, perhaps a list that never ends; typically you don't want the whole list, you just watch it as it grows to see whether the information you actually want ever gets listed. As long as you're only asking whether a computation can be done — not how efficiently it can be done — all computations can be rearranged this way to use enumeration.

Suppose you've got any Turing machine. It computes outputs from inputs. Maybe it doesn't even always complete its computation: maybe sometimes it diverges, computing forever instead of finally producing an answer, perhaps by going into an infinite loop, or perhaps finding some more exotic way to non-terminate. But, given any such machine, you can always define a second machine that enumerates all the input/output pairs where the first machine given that input would halt with that output. How would you do that? It's straightforward, though tedious. Call the first machine T1. We can enumerate all the possible inputs, since they're all built up from a finite alphabet; just list them in order of increasing length, and within inputs of a given length, list them alphabetically. For each of these inputs, using the mechanical rules of T1, you can enumerate the states of T1: the initial state, where it has the input and is about to start computing, the second state that results from what it does next from that first state, and so on. Call the mth state of T1 on the nth input S(n,m). Imagine an expanding table, which we slowly fill out, where n is the row number and m is the column number. Row n is the entire computation by T1 for the nth input. We can't fill out the whole table by completing each row before moving on to the next, because some rows might extend to the right forever, as T1 doesn't halt on that input. But we can mechanically compute any particular cell in the table, S(n,m). So we just have to enumerate all the pairs n,m such that we'll eventually get to each of them — for example, we can do all the entries where the sum of n and m is two (that's just the leftmost topmost entry, S(1,1)), then all the ones where the sum is three (that's S(1,2) and S(2,1)), four (that's S(1,3), S(2,2), and S(3,1)), and so on — and whenever we find a cell where T1 halts, we output the corresponding T1 input/output pair. We now have a T2 that enumerates all and only the input/output pairs for which T1 halts.
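
Here is the dovetailing construction as a Python sketch, with T1's single-cell computation abstracted into two hypothetical functions: nth_input(n) gives the nth possible input, and halts_in_exactly(n, m) returns T1's output if T1 halts on the nth input in exactly m steps, and None otherwise (each such call is finite).

    from itertools import count

    def enumerate_halting_pairs(nth_input, halts_in_exactly):
        """Yield every (input, output) pair on which T1 halts, eventually."""
        for total in count(2):                 # sweep anti-diagonals: n + m = total
            for n in range(1, total):
                m = total - n
                out = halts_in_exactly(n, m)   # fill in table cell S(n, m)
                if out is not None:
                    yield (nth_input(n), out)  # T1 halts on the nth input: emit the pair

Every cell gets visited because each anti-diagonal is finite, so any halting computation is eventually reported, even though some rows go on forever.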

Diagonalization

Way back in the late nineteenth century, though, Georg Cantor noticed that if you can enumerate a series of enumerations — even a series of infinite enumerations — you can find an enumeration that wasn't in the series. He famously did this with the real numbers, to show they can't all be enumerated; the same trick, applied to an enumeration of the rationals, shows that not all real numbers are rational. This is going to sound very similar to what we just did for Turing machines.

Consider the numbers less than one and not less than zero. We can enumerate the rational numbers in this interval: each such number is a non-negative integer divided by a strictly larger integer, so just list them in order of increasing denominator, and for a given denominator, by increasing numerator (0/1, 0/2, 1/2, 0/3, 1/3, 2/3, et cetera ad infinitum). For each of these ratios, we can enumerate the digits in the decimal representation of that ratio, starting with the tenths digit. (If the denominator divides a power of ten, after some point the decimal representation will be all zeros.) Call the mth digit of the nth ratio S(n,m). Imagine a table where n is the row number and m is the column number. Row n is the entire decimal representation of the nth ratio. We can mechanically compute what should go in any particular entry of this table. And now comes the trick. We can construct the decimal representation of a real number, in the interval, that isn't equal to any of the ones we've enumerated. To do this, we read off the entries on the diagonal of our table from the upper left toward the lower right (S(1,1), S(2,2), etc.), and add one to each digit (modulo 10, so if we read a 9 we change that to 0): our nth digit is (S(n,n) + 1) mod 10. This is a perfectly good way to specify a real number — it's an infinite sum of the form Σ a_n 10^−n — and we know it isn't rational because every rational number in the interval has some decimal digit on which it differs from our real number.
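
The same construction in Python, carried far enough to see the first few digits (nth_ratio reproduces the enumeration above; the bound on the denominator is just to keep the sketch finite):

    from fractions import Fraction

    def nth_ratio(n):
        """The nth rational in [0,1): by increasing denominator, then numerator."""
        k = 0
        for denom in range(1, 10**6):
            for numer in range(denom):
                k += 1
                if k == n:
                    return Fraction(numer, denom)

    def digit(q, m):
        """The mth decimal digit of q (the tenths digit is m = 1)."""
        return (q.numerator * 10**m // q.denominator) % 10

    # The nth digit of the constructed real is (S(n,n) + 1) mod 10.
    print([(digit(nth_ratio(n), n) + 1) % 10 for n in range(1, 11)])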

This general technique is called diagonalization: you have an enumerable set of inputs (the labels on the rows), for each input you have an enumerated sequence (the row with that label), and you then produce an enumerated sequence that differs from every row by reading off the diagonal from upper left downward to the right. Since diagonalization is used to show something isn't enumerated, and enumeration is at the heart of computation, naturally diagonalization is useful for showing things can't be computed.

Uncomputability

Because a Turing machine, like any other algorithmic device such as a general recursive function or λ-calculus term, is fundamentally finite, it's a straightforward (if tedious) exercise to describe it as an input to another machine that can then, straightforwardly, interpret the description to simulate the behavior of the described machine. This is a universal Turing machine, which takes as input a machine description and an input to the described machine, and outputs the output of the described machine on that input — or doesn't halt, if the described machine wouldn't halt on that input.

However, although simulating a described machine is straightforward, it is not possible to determine in general, by computation, whether or not the described machine will halt on the given input. That is, we cannot possibly construct an always-halting Turing machine that determines whether an arbitrary described machine halts on a given input. We can show this by diagonalization.

We can enumerate all possible machine descriptions, readily enough, since they're just alphabetic strings that obey some simple (and therefore checkable) syntactic rules. We can also, of course, enumerate all possible inputs to the described machines. Imagine a table where the entry S(n,m) at row n and column m is "yes" if the nth machine halts on the mth input, or "no" if it doesn't halt. Suppose we can construct a machine that computes the entries S(n,m) of this table. Then by going down the diagonal of the table we can also construct a machine whose behavior differs from every row of the table: Let machine A on the mth input compute S(m,m), and if it's "no", halt and say "no", while if it's "yes", go into an infinite loop and thus never halt. If machine A is the nth machine in the table, then A halts on the nth input if and only if S(n,n) is "no". But A is a perfectly good Turing machine, so it must appear as some row n of the table, and then S(n,n) is wrong whichever answer it gives; the contradiction means no machine can correctly compute S.
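
In code form (necessarily not runnable to completion, since the point is that no correct always-halting S exists; the sketch just makes the construction of A concrete):

    def make_A(S):
        """Build the diagonal machine A from a supposed always-halting S, where
        S(n, m) answers "yes"/"no": does the nth machine halt on the mth input?"""
        def A(m):
            if S(m, m) == "no":   # the table says the mth machine doesn't halt on input m,
                return "done"     # ...so A halts;
            while True:           # ...otherwise A deliberately runs forever.
                pass
        return A

    # If A appears as row n of the table, A halts on the nth input exactly when
    # S(n, n) says it doesn't, so S(n, n) is wrong either way.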

Gödel

Gödel's Theorem (aka Gödel's First Theorem) says that any sufficiently powerful formal system is either incomplete or inconsistent — in essence, either it can't prove everything that's true, or it can prove things that aren't true.

To pin this down, we first need to work out what "sufficiently powerful" means. Gödel wanted a system powerful enough to reason about arithmetic: we can boil this down to, for an arithmetic function f and integer i, does f(i)=j or doesn't it? The functions of interest are, of course, general recursive functions, which are equi-powerful with Turing machines and with λ-calculus; so we can equivalently say we want to reason about whether a given Turing machine with input i will or will not produce output j. But a formal system is itself essentially a Turing machine; so in effect we're talking about a Turing machine L (the formal system; L for logic) that determines whether or not a Turing machine (the function f) on given input produces given output. The system would be consistent and complete if it confirms every true statement about whether or not f on given input produces given output, and doesn't confirm any false such statements.

Enumerate the machines f and make them the rows of a table. Enumerate the input/output pairs and make them the columns. In the entry for row n and column m, put a "yes" if L confirms that the nth machine has the mth input/output pair, a "no" if L confirms that it doesn't. Suppose L is consistent and complete.

It can't be both true and false that the nth machine has the mth input/output pair; so if L only confirms true propositions, there can't be both a "yes" and a "no" in any one table entry. What about blank table entries? For centuries it was generally agreed that a proposition must be either true or false; but this idea had fallen into some disrepute during the three decades leading up to Gödel's results. This is just as well, because, based on our supposition that L is consistent and complete, we can easily show that the table must have some blank entries. Suppose the table has no blank entries. Then for any machine f1, and any input i, we can determine whether f1 halts on i, thus: construct another machine f2 that runs f1 on i and then halts with output confirm. Because there are no blank entries in the table, we know L can determine whether or not f2(i)=confirm, and this also determines whether or not f1 halts on i. But we already know from the previous section that we cannot correctly determine by computation whether or not an arbitrary machine halts on an arbitrary input; therefore, there must be some blank entries in the table.

Is it possible for a proposition of this kind — that a given machine on a given input produces a given output — to be neither true nor false? If you think this isn't possible, then we have already proven to your satisfaction that L cannot be both consistent and complete. However, since we're collectively no longer so sure that propositions have to be either true or false, let's see if we can find a difficulty with the consistent complete system without insisting that every table entry must be filled in. Instead, we'll look for a particular entry that we know should be filled in, but isn't.

We're going to diagonalize. First, let's restrict our yes/no table by looking only at columns where the output is confirm (and, being really careful, suppress any duplicate column labels, so each column label occurs only once). So now our table has rows for machines f, columns for inputs i, and each entry (n,m) contains a "yes" if L confirms that the nth machine on the mth input produces output confirm, while the entry contains a "no" if L confirms it does not produce output confirm. The entry is blank if L doesn't confirm either proposition. Construct a machine A as follows. For given input, go through the column labels till you find the one that matches it (we were careful there'd be only one); call that column number m. Use L to confirm a "no" in table entry m,m, and once you've confirmed that, output confirm. If L never confirms that "no", then A never halts, and never outputs confirm. Since A is a Turing machine, it is the label on some row n of the table. What is the content of table entry n,n? Remember, the content of the table entry is what L actually confirms about the behavior of A on the nth input. By construction, if the entry contains "no", then A outputs confirm, and the "no" is incorrect. If the entry contains "yes", and the "yes" is correct, then A outputs confirm, and by construction it must have done so because the entry contains an incorrect "no" that caused A to behave this way. Therefore, if L doesn't confirm anything that's false, this table entry must be blank. But if we know the table entry is blank, then we know that, by failing to put a "no" there, L has failed to confirm something true, and is therefore incomplete.
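
The construction of A, sketched in the same style (L_confirms is a hypothetical procedure that returns only when, and if, L confirms the given proposition; column_of maps an input to its unique column number; the proposition is represented by an ad hoc tuple):

    def make_A(L_confirms, column_of):
        def A(i):
            m = column_of(i)
            # Wait for L to confirm the "no" in table entry (m, m): that the mth
            # machine on the mth input does NOT output confirm.
            L_confirms(("no", m, m))
            return "confirm"      # reached only if that "no" gets confirmed
        return A

    # If A is row n, then entry (n, n) can hold neither a correct "yes" nor a
    # correct "no"; a sound L must leave it blank, and the blank is itself a
    # truth that L fails to confirm.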

If we are sure the formal system proves everything that's true, then we cannot possibly be sure it doesn't prove anything that's false; if we are sure it doesn't prove anything that's false, we cannot possibly be sure it proves everything that's true. Heisenberg's uncertainty principle comes to mind.

Attitude

Gödel's results are commonly phrased in terms of what a formal system can prove about itself, and treated in terms of the rules of deduction in the formal system. There are both historical and practical reasons for this.

In the first half of the nineteenth century, the foundations of mathematics underwent a Kuhnian paradigm shift, settling on building things up formally from a set of axioms. In the 1890s people started to notice cracks in the axiomatic foundations, in the form of antinomies — pairs of contradictory statements that were both provable from the axioms. Mathematicians generally reacted by looking for some axiom they'd chosen that doesn't hold in general — as geometry had done in the early nineteenth century to explore non-Euclidean geometries that lack the parallel postulate. As a source of antinomies, attention fell primarily on the Law of the Excluded Middle, which says a proposition is either true or false; as an off-beat alternative, Alonzo Church considered weakening reductio ad absurdum, which says that if assuming a proposition leads to a contradiction, then the proposition is false. Thus, emphasis on choice of rules of deduction.

David Hilbert proposed to use a subset of a formal system to prove the consistency of the larger system; this would have the advantage that one might be more confident of the subset, so that using the subset to prove consistency would increase confidence in the larger system. Gödel's result was understood to mean that the consistency of the whole formal system (for a powerful system) can only be proved in an even more powerful system. Thus, emphasis on what a formal system can prove about itself.

Explorations of how to cope with the Theorem have continued to focus on the system's rules of deduction; my own earlier post tended this way. Alan Turing's dissertation at Princeton also followed this route. The emphasis on rules of deduction naturally suggests itself when looking for a way around Gödel's Theorem, because if you want to achieve a mechanical means for deriving truth, as a practical matter you can't achieve that without working out the specific mechanical means.

However, in this post I've been phrasing and treating Gödel's Theorem differently.

I phrased myself in terms of what we can know about the system — regardless of how we come to know — rather than what the system can prove about itself. (I'm not distinguishing, btw, between "what we can know" and "what can be true"; either would do, in principle, but we're no longer sure what "truth" is, and while it's awkward to talk about multiple notions of truth, it's easier to talk about multiple observers. When convenient I'll conjure a hypothetical omniscient being, to dispense with quibbles about "true but unknowable".)

My treatment of the Theorem conspicuously omits any internals of the formal system, supposing only that its conclusions are computable (and below I'll dispense with even that supposition). By depicting Gödel's Theorem without any reference to the rules of deduction, this approach seems to throw a wet blanket on attempts to cope with Gödel's Theorem by means of one's choice of rules of deduction — and frankly, I approve of discouraging attempts by that route. I'm not looking for a clever loophole in Gödel's result — invoking, say, uncountable infinities or second-order logic as a sort of Get Out of Jail Free card. In my experience, when somebody thinks they've found a loophole in something as fundamental as Gödel's Theorem, it's very likely they've outsmarted themselves and ended up with a bogus result. What I want is an obvious way of completely bypassing the Theorem; something poetically akin to cutting the Gordian Knot.

That is, I'm looking for a way around Gödel's Theorem with a high profundity index. This is an informal device I use to characterize the sort of solutions I favor. Imagine you could use numerical values to describe how difficult conceptual tasks are: each such value is a positive number, and the more difficult the task, the higher the number. Now, for a given idea, take the difficulty of coming up with the idea the first time, and divide by the difficulty of understanding the idea once it's been explained to you. That ratio is the profundity index of the idea. So an idea is profound if it was really difficult to come up with, but is really obvious once explained. If an idea that's incredibly hard to come up with in the first place turns out to be even harder to figure out how to explain clearly, the denominator you want is the difficulty of understanding it after somebody has figured out how to explain it clearly, and the numerator should include the difficulty of coming up with the explanation.

The metaphor of getting around something implies a desire to get to the other side; and it may be illuminating to ask why one wants to do so. We have here two notions, one practical and one philosophical. The notion of truth is as philosophical as you can get; it's the whole purpose of philosophy. The notion of mechanical computation is — despite quibbles about infinite resources and such — quintessentially practical, to do with getting results by an explicitly objective and reproducible procedure. Mathematicians in the second half of the nineteenth century sought to access truth through computation. The protracted collapse of that agenda across the first three decades of the twentieth century, culminating in Gödel's Theorem, has left us without a clear understanding of the proper role of computation in investigating truth; and with yet another in philosophy's long tradition of ways to not be sure what is true. So I suppose, in trying to get around Gödel's Theorem, my hopes are

  • to find a robust maximum of truth, as Turing power is a robust maximum of computational power.
  • to find a robust maximum way of obtaining truth through computation.
Gödel's Theorem tells us that Turing power itself, despite its robustness, does not provide a straightforward maximum of either truth or proof.

Curry–Howard

Though there are (of course) situations in which the Curry–Howard correspondence is exactly what one needs, in general I see it as badly overrated.

The basic correspondence is between rules of deduction in formal proof systems, and rules of static typing in a programming language (classically, the typed λ-calculus). The canonical example is that modus ponens corresponds to typed function application: modus ponens says that if proposition A is provable and proposition A→B is provable, then proposition B is provable; typed function application says that if a is an expression of type A and f is an expression of type A→B, then fa is an expression of type B. Moving outward from this insight, when you construct a correctly typed program you are also constructing a proof; thus proofs correspond to correctly typed programs. A theorem corresponds to a type, so that asking whether a theorem has a proof is asking whether the corresponding type has a correctly typed expression of that type — that is, provability of the theorem corresponds to realizability of the type. And so on.
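
The canonical example, rendered with Python type hints standing in for the logic (any statically typed language would do just as well):

    from typing import Callable, TypeVar

    A = TypeVar("A")
    B = TypeVar("B")

    def modus_ponens(a: A, f: Callable[[A], B]) -> B:
        # Read the types as propositions: from A and A→B, conclude B.
        # The typing rule for function application is exactly modus ponens.
        return f(a)

    # For instance, with A = int and B = str:
    conclusion: str = modus_ponens(42, lambda n: f"got {n}")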

The folk-wisdom version of the correspondence is that logic and computation are the same thing. The folk-corollary is that all reasoning should be done with types. This is the basis of modern type theory, and there are folks trying to recast programming language design, logic, and mathematics alike in the image of types. Curry–Howard has taken on a (one hopes, metaphorically) theological significance.

It strikes me, though, that the basic correspondence does not involve computation at all. If, in the realm of programming, a type system ever becomes Turing-powerful, that's a major mark against it because we want fast automatic type-checking and, even if we're willing to wait a little longer, we certainly want our type-checks to be guaranteed to halt. In any event, types are not the primary vehicle of computation, rather they're a means of reasoning about the form of programs — thus, not even reasoning directly about our computations, but rather about the way we specify our computations.

It's easy to get tangled up trying to make sense of this proof–program connection. For example, when we say that we want our automatic type-checking algorithm to always halt, that limits the computational power involved in checking an individual step of a proof, but puts no limit on the computational power of proof in general because the length of allowable proofs is unbounded, just as the size of program expressions is unbounded. There is no evident notion of what it means for a proof to "halt", and this corresponds, through Curry–Howard, to saying there is no such thing as a "largest" expression in λ-calculus that cannot be made a subexpression of a larger expression; it has nothing whatever to do with halting of λ-calculus computations. The reason one gets tangled up like this is that although proofs and programs are technically connected through Curry–Howard, they have different and often incompatible purposes.

The purpose of a proof, I submit, is to elucidate a chain of reasoning. The more lucid, the better. Ideally one wants what Paul Erdős used to call "one from The Book" — the idea being that God has a jealously guarded book of the most beautiful proofs of all theorems (Paul Erdős was an atheist; he said "You don't have to believe in God, but you should believe in The Book"). But the first duty of a program is to elucidate an algorithm. Seriously. This shouldn't be a controversial statement, and it's scary to realize that for some people it is. I'll say it again. The first duty of a program is to elucidate an algorithm. You should be able to tell at a glance what the program is doing. That is your first line of defense against getting the program wrong. Proofs of program correctness, with all the benefits and problems thereof, are a later line of defense, possibly useful but no substitute for being able to tell what the program does. (Yes, this is yet another of my dozen-or-so topics I'm trying to draft a blog post about, under the working title "How do we know it's right?".) And this is where the two sides of the Curry–Howard correspondence part company. If you relentlessly drive your program syntax toward more lucid expression of algorithms, you obfuscate the inference rules of your type system, which is to say, the deductive steps of the corresponding proof. If you drive to simplify the type system, you're no longer headed for maximally lucid algorithms. In practice, it seems, you try to get simultaneously the best of both worlds and end up sliding toward the worst of both. (My past blog post on types is yonder.)

What we can't know

Enough type-bashing. Back on the Gödel front we were hoping for robust maxima of truth and of computable truth. What does Gödel's Theorem actually tell us about these goals?

First of all we should recognize that Gödel's Theorem is not, primarily, a limit on what can be known by mechanical means. The uncomputability of the Halting problem (proven above) is a limit on what can be known by mechanical means. Gödel's Theorem is a limit on what can be known at all. That is, it bears more on truth than it does on truth through computation. I touched on this point above. To further clarify, we can use a notion that played a brief role in Alan Turing's doctoral dissertation at Princeton (supervised by Alonzo Church), called an oracle.

Suppose we attach a peripheral device, O, to a Turing machine. The Turing machine, chugging along mechanically, can at any time ask a question of O, and O will give an answer (based on the question) in one step of the machine. O is a black box; we don't say how it works, and there's no need for it to be mechanical inside. Maybe there's a djinn, or a deity, or some such, inside O that's producing the answer. We're only assuming that it will always immediately answer any question asked of it. That's what we mean by an oracle. We'll call a Turing machine equipped with this device an O-machine.

Let's revisit the Halting problem, and see what we can —and can't— do to change the situation by introducing an oracle. Our basic result, remember, is that there is no purely mechanical means to determine whether or not an arbitrary Turing machine will halt on an arbitrary input. Okay. What if we had a non-mechanical means? Precisely, let us suppose we have an oracle O that will always tell us immediately whether a given ordinary Turing machine will halt on a given input. This supposition doesn't appear to have, in itself, any unfortunate consequences. Under the supposition that we have such an oracle O, we can easily build an O-machine that determines whether a given ordinary Turing machine will halt on a given input. What we can't do, given this oracle O, is build an O-machine that determines whether a given O-machine halts on a given input. This is one of those results that's easier to show if we make it stronger. Without even knowing what an oracle O does, we can prove that there is no O-machine that always halts that determines whether any described O-machine will halt on a given input. This is because we can describe all possible O-machines without ever having to describe the internals of O. O is the same for all of them, after all; we only have to describe the parts of the O-machines that differ from each other, and that we can do finitely no matter what O is. So we can still enumerate our O-machines, and diagonalize just as we did in the earlier section to prove an ordinary Turing machine couldn't solve the ordinary Halting problem. No matter what O is, even if it's got an omniscient being hidden inside it, an O-machine can't solve the O-Halting problem.

Likewise, our diagonalization to prove Gödel's Theorem works just as well for O-machines as for ordinary Turing machines. For any oracle O, let L be an O-machine that confirms some propositions, and fails to confirm others, about whether or not given O-machines on given inputs produce given outputs. Reasoning just as before, if we know that L confirms all true claims, then we cannot know that L doesn't confirm any false claims; if we know it doesn't confirm any false claims, then we cannot know it confirms all true claims. So even if we're allowed to consult an oracle, we still can't achieve an understanding of truth, L, that confirms all and only true claims (about O-machines, lest we forget).

We are now placing limits on what we can know, or equivalently, on what can be true. To get such results we must have started with some constraints, and we've conspicuously placed no constraints at all on O, not even computability. It's worth asking what our constraints were, that led to the result.

For one thing, we have required truth to agree with the reasoning in the diagonalization argument of our proof of the Theorem. Interestingly, this makes clear that the oracle itself is not our notion of truth, for we didn't require the oracle to respect the reasoning in our diagonalization; rather, with no constraints on how the oracle derives its answers from the questions asked of it, we proved a limitation on what those answers could mean. The O-machine L, together with our reasoning about it, became our I Ching, the means by which we mapped the oracle's behavior into the realm of truth.

The reasoning in the diagonalization isn't all we introduced; we also introduced... what? The requirement that our djinn/deity/whatever-it-is feed its answers to a mechanical device. The requirement that it base its answers only on what question the mechanical device asked of it. Ultimately, the requirement that the truth be about discrete inputs and outputs and be expressed as discrete propositions.

The discreteness of the mechanical device makes it possible to enumerate, and therefore diagonalize. The discreteness of the question/answer interactions with the oracle is apparently a corollary to that. The requirement that the oracle base its answer only on the question asked... is thought-provoking, since modern physics leans toward the concept of entanglement. Heisenberg's uncertainty principle already came up once above. One might wonder if truth ought to be reimagined in some way that makes it inherently part of the system in which it exists, rather than something separate; but here, the discrete separations — between machine and oracle, between subject and truth — seem a rather natural complement to the discreteness of the machine.

The common theme of discreteness is reminiscent of the importance attached to "chunking" of information in my earlier post on sapience and language. Moreover, the relationship between discreteness and continuity seems to be cropping up in several of my current major avenues of investigation — Gödel, linguistics, sapience, fundamental physics — and I find myself suspecting we lack some crucial insight into this relationship — hopefully, an insight with a very high profundity index, because the time required to acquire the insight, for the numerator of the index, appears to be measured in millennia, so in order for us to grok it once found we should hope for the denominator to be a lot smaller. However, I've no immediate thoughts in this direction. In the current case, it's more than a little mind-bending to try to construct a notion of truth that's inherently not separable into discrete propositions; indeed, one might wonder if the consequences of the attempt could be rather Lovecraftian. So for the nonce I'll keep looking for profundity around Gödel's results without abandoning the concept of a proposition.

Gödel's Second Theorem

Gödel's Second Theorem says that if a sufficiently powerful formal system L can prove itself consistent, then L is not consistent.

Where Gödel's First Theorem was, as I interpret it, mainly about the relationship between L and truth, his Second Theorem is much more concerned with the specific behavior of L. I covered the "sufficiently powerful" clause in the First Theorem mostly by completeness, i.e., by requiring that L confirm everything we know is true. The Second Theorem isn't about completeness, though, so we need something else. Just as we have already required truth to agree with our diagonalization argument, we'll now separately require L to agree with the diagonalization argument too. And, we don't want to require any controversial behaviors from L, like the Law of the Excluded Middle or reductio ad absurdum, since they would compromise the generality of our result. Here's a set of relatively tame specific behaviors. Write [T:i⇒o] and [T:i⇏o] for the canonical representations, recognized by L, of the propositions that machine T on input i does or does not produce output o.

  • [a] If T(i)=o, then L confirms [T:i⇒o].

  • There is a canonical way, recognizable by L, to set up a machine (S∧T) that combines two other machines S and T, by running each of them on its input, requiring them both to produce the same output, and producing that output itself. We require L to do some reasoning about these machines:

    • [b] L confirms [(T∧T):i⇏o] iff it confirms [T:i⇏o].
    • [c] If L confirms [(S∧T):i⇏o], then it confirms [(T∧S):i⇏o].

  • There is a canonical way, recognizable by L, to set up a machine (S∘T) that combines two other machines S and T, by running T on its input, then running S on the output from T, and producing the output from S. We require L to do some reasoning about these machines:

    • [d] If T(i)=v, then L confirms [(S∘T):i⇏o] iff it confirms [S:v⇏o].
    • [e] If S1(i)=S2(i)=v, then L confirms [((R∘S1)∧T):i⇏o] iff it confirms [((R∘S2)∧T):i⇏o].
    • [f] L confirms [(((Q∘R)∘S)∧T):i⇏o] iff it confirms [((Q∘(R∘S))∧T):i⇏o].
    • [g] L confirms [((S1∘T)∧(S2∘T)):i⇏o] iff it confirms [((S1∧S2)∘T):i⇏o].

  • There is a canonical way, recognizable by L, to set up a machine (T⇒o) that combines a machine T with an output o, by constructing a proposition that T on the given input produces output o. That is, (T⇒o)(i)=[T:i⇒o]. We require L to do some reasoning about these machines:

    • [h] L confirms [(S∧T):i⇏confirm] iff it confirms [((L∘(S⇒confirm))∧T):i⇏confirm].

For us, several of these deductions would likely be special cases of more general rules, and maybe they are for L; we only require that L reach these conclusions by some means.
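
If it helps to have something concrete in mind, the three combining forms can be pictured as Lisp combinators. This is a rough sketch, treating machines as one-argument procedures and propositions as ordinary list structure; the names diverge, both, compose, and claims are all mine, introduced only for illustration.

($define! diverge ($lambda () (diverge)))   ; "produces no output": just never halt

($define! both                              ; (S∧T): both must produce the same output
  ($lambda (s t)
    ($lambda (i)
      ($let ((a (s i)) (b (t i)))
        ($if (equal? a b) a (diverge))))))

($define! compose                           ; (S∘T): run T, feed its output to S
  ($lambda (s t)
    ($lambda (i) (s (t i)))))

($define! claims                            ; (T⇒o): on input i, build the proposition [T:i⇒o]
  ($lambda (t o)
    ($lambda (i) (list t i o))))            ; a bare list stands in for the canonical representation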

In order for L to prove its own consistency, we need to define consistency as an internal, checkable feature of L's behavior — rather than by the external feature that L says nothing false according to the external notion of truth. Leading candidates are

  • L does not confirm any antinomy; that is, there is no proposition such that L confirms both it and its negation.
  • There is some proposition that L does not confirm.

The second of these was widely favored in the early twentieth century: inconsistency means confirming all propositions whatever. Given reductio ad absurdum and the Law of the Excluded Middle, the first implies the second, thus: Starting with a known proof of an antinomy, consider any proposition q. Supposing not-q, we can derive an antinomy (because we could have derived it even if we hadn't supposed not-q), therefore, by reductio ad absurdum, the supposition must be false; that is, not-q is false. But then, by the Law of the Excluded Middle, since q definitely isn't false, it must be true. So every proposition can be proven.

I'll use the first of these two internal notions of consistency, because it is the weaker of the two, so that any theorem we derive from it will be a stronger result. Construct machines Tyes and Tno that, given a machine and an input/output pair, output propositions asserting the machine does and does not have that input/output: Tyes(T,i,o)=[T:i⇒o], Tno(T,i,o)=[T:i⇏o]. As self-proof of consistency, require that L confirm, for every possible input i, proposition [((L∘Tyes)∧(L∘Tno)):i⇏confirm].

Start out just as in the diagonalization for the First Theorem. Imagine a table where the column labels are all possible inputs, the row labels are all possible machines, and each entry contains a "yes" if L confirms that machine on that input outputs confirm, a "no" if L confirms that machine on that input doesn't output confirm. Construct a machine A as follows. First, construct A0 that, on the mth input, outputs the machine/input/output needed to construct a proposition that the mth machine on the mth input does or does not produce output confirm. (Remember, A0 can do this by counting the column labels till it finds the input, then counting the row labels to find the corresponding machine.) Then just let A=(L∘(Tno∘A0)). Let n be the row number of the row labeled by A. We're interested in the behavior of A on the nth input; but this time, instead of just considering what is true about this behavior, we're interested in what L confirms about it.

Call the nth input i. We've got all our dominoes lined up; now watch them fall.

By construction of A0, A0(i)=(A,i,confirm).
By construction of Tyes, Tyes(A,i,confirm)=[A:i⇒confirm].
By construction of Tno, Tno(A,i,confirm)=[A:i⇏confirm].
By consistency self-proof, L confirms [((L∘Tyes)∧(L∘Tno)):(A,i,confirm)⇏confirm].
By [d], L confirms [(((L∘Tyes)∧(L∘Tno))∘A0):i⇏confirm].
By [g], L confirms [(((L∘Tyes)∘A0)∧((L∘Tno)∘A0)):i⇏confirm].
By [f] and [c], L confirms [((L∘(Tyes∘A0))∧(L∘(Tno∘A0))):i⇏confirm].
By definition of A, L confirms [((L∘(Tyes∘A0))∧A):i⇏confirm].
By [e], L confirms [((L∘(A⇒confirm))∧A):i⇏confirm].
By [h], L confirms [(A∧A):i⇏confirm].
By [b], L confirms [A:i⇏confirm].
By [a] and definition of A, L confirms [A:i⇒confirm].
For a bonus, by [a], L also proves itself inconsistent by confirming [((L∘Tyes)∧(L∘Tno)):(A,i,confirm)⇒confirm].

I don't think this result has much to do with the specific assumptions [b]–[h] about the behavior of L; going through the proof leaves me impressed by how relatively innocuous those assumptions were. Which, to my mind, is an insight well worth the exercise of going through the proof.

Turing's dissertation

I've mentioned Turing's American doctoral dissertation several times. (American because he already had sufficient academic credentials in Europe.)

Since Gödel had shown a formal system can't prove itself consistent, it was then of interest to ask how much more than a given formal system would be needed to prove it consistent. Gerhard Gentzen produced some interesting results of this sort, exploring the formal consequences of postulating restricted forms of mathematical induction (before he was arrested in Prague at the end of World War II and died of starvation in detention a few months later). Turing's dissertation explored another approach: when considering a formal system L, simply construct a new system L1 that adds to L a postulate saying L is consistent. Naturally you can't add that postulate to L itself, since it would (presuming sufficient power) cause L to become inconsistent if it wasn't already; but if L actually was consistent to start with, L1 should be consistent too, since L1 can't prove its own consistency, only that of L. To prove the consistency of L1, you can construct an L2 that adds to L1 a postulate saying L1 is consistent. And so on. In fact, Turing didn't even stop with Lk for every positive integer k; he supposed a family of formal systems Lα for any α representing an ordinal. Ordinals are (broadly) the order types of well-ordered sets; there are infinitely many countably infinite ordinals; there are uncountably infinite ordinals. The only restriction for Turing was that since all this was being done with formal systems, the ordinals α had to be finitely represented.

Turing made the — imho, eminently reasonable — suggestion that the value of this technique lies in explicitly identifying where additional assumptions are being introduced. The additional assumptions themselves, consistency of all the individual ordinal logics in the system, are justified only by fiat, so apparently nothing has been contributed toward reaching truth through computation; the axioms weren't reached through computation, merely acknowledged in computation. Interestingly, nothing has been contributed toward getting around Gödel's Theorems, either: in terms of the framework set up in this post, the entire system of logics can be handled by a single formal system that is itself still subject to the Theorems.

I set out to simplify the situation, pruning unnecessary complications in order to better understand the essence of Gödel's results. It seems an encouraging sign for the success of that agenda, that Turing's authentically interesting but elaborate dissertation has little to say about my treatment — since this means the complexities addressed by his approach have been pruned.

Where do we go from here?

Types. The above suggests correctness proof should be separated from detailed program syntax. As I've remarked elsewhere, it should be possible to maintain propositions and theorems as first-class objects, so that proof becomes the computational act of constructing a theorem object — if one can determine the appropriate rules of deduction, which makes this concern dependent on choosing the appropriate notion of proof.

Truth. Gödel's First Theorem, as treated here, says that truth about the behavior of a formal logic must conform to a certain constraint on the shape of that truth. The impression I've picked up, in constructing this post, is that this constraint should not be particularly troubling. A Euclidean triangle can't have more than one obtuse interior angle; and the truth about a formal logic can't include both completeness and consistency. I've been progressively mellowing about this since the moment I started thinking of a formal system as a filter on truth — an insight preserved above at the comparison to the I Ching.

Proof. Gödel's Second Theorem, as treated here, not only places a notably small constraint on proof (as the First Theorem does on truth), but does so in a way notably disengaged from controversial rules of deduction. The insight I take from this is that, in seeking a robust notion of proof, Gödel's Theorems aren't all that important. The great paradigm crisis of the early twentieth century (in the admittedly rather specialized area of mathematical foundations) was about antinomies implied by the axioms, not self-proofs of consistency. The self-proofs of consistency were just an envisioned possible countermeasure, and when Gödel showed the countermeasure couldn't work, the form of his Second Theorem, together with the lines of research people had already been pursuing, led them to the red herring of an infinite hierarchy of successively more powerful systems each able to show the consistency of those below it. Whatever interesting results may accompany the infinite hierarchy, I now suspect they have nothing much to say about a robust maximum of proof. Turing's ordinal logics — in essence an explicit manifestation of the infinite hierarchy — were, remember, about assuming consistency, not establishing it.

So it seems to me the problem of interest — even in my earlier post on bypassing no-go theorems, where Gödel's results served as the rallying point for my explorations — is how to fix the axioms. In that regard, I have a thought, which clearly calls for a separate post, but I'll offer it here to mull over.

I speculated in the no-go theorems post that where the proof of Russell's Paradox ends after a few steps, leaving a foundational problem, an analogous recursive predicate in Lisp simply fails to terminate, which might or might not be considered a bug in the program but doesn't have the broad foundational import of the Paradox. If a person reasoned about whether or not the set A of all sets that do not contain themselves contains itself, they might say, "If A does not contain itself, then by definition it does contain itself; but then, since it does contain itself, by definition it does not contain itself; but then, since it does not contain itself, by definition it does contain itself..." — and the person saying this quickly sees that the reasoning is going 'round in circles. This is much like

($define! A ($lambda (P) (not? (P P))))
(A A)
except that the Lisp interpreter running this code probably doesn't deduce, short of a memory fault or the like, that the computation won't halt, whereas the human quickly deduces the non-halting. The formal proof, however, does not go 'round in circles. It says, "Suppose A does not contain itself. By definition, since A does not contain itself, A does contain itself. That's a contradiction, therefore by reductio ad absurdum, A does not not contain itself. By the Law of the Excluded Middle, since A does not not contain itself, A does contain itself. By definition, since A does contain itself, A does not contain itself. That's an antinomy." Why did this not go 'round in circles? Because the initial premise, "Suppose A does not contain itself", was discharged when reductio ad absurdum was applied. The human reasoner, though, never forgot about the initial assumption.

This isn't quite the same as a conditional proof, in which you start with an assumption P, show that this assumption leads to a consequence Q, and then conclude from this that P⇒Q regardless of whether or not P holds; the assumption has been discharged, but it lingers embedded in the conclusion. It only appears to have discharged the assumption because we have a notion of implication that neatly absorbs the memory of our having assumed P in order to prove Q. Really, all proofs are conditional in the sense that they only hold if we accept the rules of deduction by which we reached them; but we can't absorb that into our propositions using logical implication. We could still preface all of our propositions with "suppose that such-and-such laws of deduction are valid"; but when writing it formally we'd need a different notation because it's not the same thing as logical implication. We might be tempted to use something akin to a turnstile, which is (in my experience) strictly a meta-level notation — but in the case of reductio ad absurdum, it seems we don't want a meta-level notation. We want to qualify our propositions by the memory of other propositions we've rejected because they led to contradictions.

I'd expect to explore what this does to paradoxes (not just Russell's); I'm skeptical that in itself it would eliminate all the classical paradoxes. I do rather like the idea that the classical paradoxes are all caused by a single flaw, but I suspect the single flaw isn't in a single axiom; rather, it may be a single systemic flaw. It seems that, in some sense, not discharging the assumption in reductio ad absurdum is a way to detect impredicativity, the same source of paradoxes type theory was meant to eliminate.

I look forward to exploring that. Along with my dozen-or-so other draft post topics.

Thursday, July 18, 2013

Bypassing no-go theorems

This is not at all what I had in mind.
— Albert Einstein, in response to David Bohm's hidden variable theory.

A no-go theorem is a formal proof that a certain kind of theory cannot work. (The term no-go theorem seems to be used mainly in physics; I find it useful in a more general context.)

A valid no-go theorem identifies a hopeless avenue of research; but in some cases, it also identifies a potentially valuable avenue for research. This is because in some cases, the no-go theorem is commonly understood more broadly than its actual technical result. Hence the no-go theorem is actually showing that some specific tactic doesn't work, but is interpreted to mean that some broad strategy doesn't work. So when you see a no-go theorem that's being given a very broad interpretation, you may do well to ask whether there is, after all, a way to get around the theorem, by achieving what the theorem is informally understood to preclude without doing what the theorem formally precludes.

In this post, I'm going to look at four no-go theorems with broad informal interpretations. Two of them are in physics; I touch on them briefly here, as examples of the pattern (having explored them in more detail in an earlier post). A third is in programming-language semantics, where I found myself with a result that bypassed a no-go theorem of Mitchell Wand. And the fourth is a no-go theorem in logic that I don't actually know quite how to bypass... or even whether it can be bypassed... yet... but I've got some ideas where to look, and it's good fun to have a go at it: Gödel's Theorem.

John von Neumann's no-go theorem

In 1932, John von Neumann proved that no hidden variable theory can make exactly the same predictions as quantum mechanics (QM): all hidden variable theories are experimentally distinguishable from QM. In 1952, David Bohm published a hidden variable theory experimentally indistinguishable from QM.

How did Bohm bypass von Neumann's no-go theorem? Simple, really. (If bypassing a no-go theorem is possible at all, it's likely to be very simple once you see how). The no-go theorem assumed that the hidden variable theory would be local; that is, that under the theory, the effect of an event in spacetime cannot propagate across spacetime faster than the speed of light. This was, indeed, a property Einstein wanted out of a hidden variable theory: no "spooky action at a distance". But Bohm's hidden variable theory involved a quantum potential field that obeys Schrödinger's Equation — trivially adopting the mathematical infrastructure of quantum mechanics, spooky-action-at-a-distance and all, yet doing it in a way that gave each particle its own unobservable precise position and momentum. Einstein remarked, "This is not at all what I had in mind."

John Stewart Bell's no-go theorem

In 1964, John Stewart Bell published a proof that all local hidden variable theories are experimentally distinguishable from QM. For, of course, suitable definition of "local hidden variable theory". Bell's result can be bypassed by formulating a hidden variable theory in which signals can propagate backwards in time — an approach advocated by the so-called transactional interpretation of QM, and which, as noted in my earlier post on metaclassical physics, admits the possibility of a theory that is still "local" with respect to a fifth dimension of meta-time.

Mitchell Wand's no-go theorem

In 1998, Mitchell Wand published a paper The Theory of Fexprs is Trivial.

The obvious interpretation of the title of the paper is that if you include fexprs in your programming language, the theory of the language will be trivial. When the paper first came out, I had recently hit on my key insight about how to handle fexprs, around which the Kernel programming language would grow, so naturally I scrutinized Wand's paper very closely to be sure it didn't represent a fundamental threat to what I was doing. It didn't. I might put Wand's central result this way: If a programming language has reflection that makes all computational states observable in the syntactic theory of the language, and if computational states are in one-to-one correspondence with syntactic forms, then the syntactic theory of the language is trivial. This isn't a problem for Kernel because neither of these conditions holds: not all computational states are observable, and computational states are not in one-to-one correspondence with syntactic forms. I could make a case, in fact, that in S-expression Lisp, input syntax represents only data: computational states cannot be represented using input syntax at all, which means both that the syntactic theory of the language is already trivial on conceptual grounds, and also that the theory of fexprs is not syntactic.

At the time I started writing my dissertation, the best explanation I'd devised for why my theory was nontrivial despite Wand was that Wand did not distinguish between Lisp evaluation and calculus term rewriting, whereas for me Lisp evaluation was only one of several kinds of term rewriting. Quotation, or fexprs, can prevent an operand from being evaluated; but trivialization results from a context preventing its subterm from being rewritten. It's quite possible to prevent operand evaluation without trivializing the theory, provided evaluation is a specific kind of rewriting (requiring, in technical parlance, a redex that includes some context surrounding the evaluated term).
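
To make the distinction tangible, here's a small example using Kernel's actual $vau (the name $operand and the commentary are mine): an operative can return its operand unevaluated, so the operand's source syntax is observable.

($define! $operand ($vau (x) #ignore x))   ; return the operand tree itself, unevaluated

($operand (+ 1 2))                         ; evaluates to the list (+ 1 2)
($operand 3)                               ; evaluates to 3

; So (+ 1 2) and 3, though they evaluate to the same value, are not
; interchangeable as operands.

That observable difference is what the informal reading of Wand's title worries about; the preceding paragraph is about why, in vau-calculus, it doesn't collapse the theory.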

Despite myself, though, I was heavily influenced by Wand's result and by the long tradition in which it followed. Fexprs had been rejected circa 1980 as a Lisp paradigm, in favor of macros. A rejected paradigm is usually ridiculed in order to rally followers more strongly behind the new paradigm (see here). My pursuit of $vau as a dissertation topic involved a years-long process of gradually ratcheting up expectations. At first, I didn't think it would be worth formulating a vau-calculus at all, because of course it wouldn't be well-enough behaved to justify the formulation. Then I thought, well, an operational semantics for an elementary subset of Kernel would be worth writing. Then I studied Plotkin's and Felleisen's work, and realized I could provide a semantics for Kernel that would meet Plotkin's well-behavedness criteria, rather than the slightly weakened criteria Felleisen had used for his side-effectful calculus. And then came the shock. When I realized that the vau-calculus I'd come up with, besides being essentially as well-behaved as Plotkin's call-by-value lambda-calculus, was actually (up to isomorphism) a conservative extension of call-by-value lambda-calculus. In other words, my theory of fexprs consisted of the entire theory of call-by-value lambda-calculus plus additional theorems.

I was boggled. And I was naively excited. I figured, I'd better get this result published, quick, before somebody else notices it and publishes it first — because it's so incredibly obvious, it can't be long before someone else does find it. Did I say "naively"? That's an understatement. There's some advice for prospective graduate students floating around, which for some reason I associate with Richard Feynman (though I could easily be wrong about that [note: a reader points out this]), to the effect that you shouldn't be afraid people will steal your ideas when you share them, because if your ideas are any good you'll have trouble getting anyone to listen to them at all. In studying this stuff for years on end I had gone so far down untrodden paths that I was seeing things from a drastically unconventional angle, and if even so I had only just come around a corner to where I could see this thing, others were nowhere close to any vantage from which they could see it.

[note: I've since written a post elaborating on this, Explicit evaluation.]

Kurt Gödel's no-go theorem

Likely the single most famous no-go theorem around is Gödel's Theorem. (Actually, it's Gödel's Theorems, plural, but the common informal understanding of the result doesn't require this distinction — and Gödel's result lends itself spectacularly to informal generalization.) This is what I'm going to spend most of this post on, because, well, it's jolly good fun (recalling the remark attributed to Abraham Lincoln: People who like this sort of thing will find it just the sort of thing they like).

The backstory to Gödel was that in the early-to-mid nineteenth century, mathematics had gotten itself a shiny new foundation in the form of a formal axiomatic approach. And through the second half of the nineteenth century mathematicians expanded on this idea. Until, as the nineteenth century gave way to the twentieth, they started to uncover paradoxes implied by their sets of axioms.

A perennial favorite (perhaps because it's easily explained) is Russell's Paradox. Let A be the set of all sets that do not contain themselves. Does A contain itself? Intuitively, one can see at once that if A contains itself, then by its definition it does not contain itself; and if it does not contain itself, then by its definition it does contain itself. The paradox mattered for mathematicians, though, for how it arose from their logical axioms, so we'll be a bit more precise here. The two key axioms involved are reductio ad absurdum and the Law of the Excluded Middle.

Reductio ad absurdum says that if you suppose a proposition P, and under this supposition you are able to derive a contradiction, then not-P. Supposing A contains itself, we derive a contradiction, therefore A does not contain itself. Supposing A does not contain itself, we derive a contradiction, therefore —careful!— A does not not contain itself. This is where the Law of the Excluded Middle comes in: A either does or does not contain itself, therefore since it does not not contain itself, it does contain itself. We have therefore an antinomy, that is, we've proved both a proposition P and its negation not-P (A both does and does not contain itself). And antinomies are really bad news, because according to these axioms we've already named, if there is some proposition P for which you can prove both P and not-P, then you can prove every proposition, no matter what it is. Like this: Take any proposition Q. Suppose not-Q; then P and not-P, which is a contradiction, therefore by reductio ad absurdum, not-not-Q, and by the Law of the Excluded Middle, Q.

When Russell's Paradox was published, the shiny new axiomatic foundations of mathematics were still less than a human lifetime old. Mathematicians started trying to figure out where things had gone wrong. The axioms of classical logic were evidently inconsistent, leading to antinomies, and the Law of the Excluded Middle was identified as a problem.

One approach to the problem, proposed by David Hilbert, was to back off to a smaller set of axioms that were manifestly consistent, then use that smaller set of axioms to prove that a somewhat larger set of axioms was consistent. Although clearly the whole of classical logic was inconsistent, Hilbert hoped to salvage as much of it as he could. This plan to use a smaller set of axioms to bootstrap consistency of a larger set of axioms was called Hilbert's program, and I'm remarking on it because we'll have occasion to come back to it later.

Unfortunately, in 1931 Kurt Gödel proved a no-go theorem for Hilbert's program: that for any reasonably powerful system of formal logic, if the logic is consistent, then it cannot prove the consistency of its own axioms, let alone its own axioms plus some more on the side. The proof ran something like this: For any sufficiently powerful formal logic M, one can construct a proposition A of M that amounts to "this proposition is unprovable". If A were provable, that would prove that A is false, an antinomy; if not-A were provable, that would prove that A is true, again an antinomy; so M can only be consistent if both A and not-A are unprovable. But if M were able to prove its own consistency, that would prove that A is unprovable (because A must be unprovable in order for M to be consistent), which would prove that A is true, producing an antinomy, and M would in fact be inconsistent. Run by that again: If M can prove its own consistency, then M is in fact inconsistent.

Typically, on completion of a scientific paradigm shift, the questions that caused the shift cease to be treated as viable questions by new researchers; research on those questions tapers off rapidly, pushed forward only by people who were already engaged by those questions at the time of the shift. So it was with Gödel's results. Later generations mostly treated them as the final word on the foundations of mathematics: don't even bother, we know it's impossible. That was pretty much the consensus view when I began studying this stuff in the 1980s, and it's still pretty much the consensus view today.

Going there

Having been trained to think of Gödel's Theorem as a force of nature, I nevertheless found myself studying it more seriously when writing the theoretical background material for my dissertation. I found myself discoursing at length on the relationship between mathematics, logic, and computation, and a curious discrepancy caught my eye. Consider the following Lisp predicate.

($define! A ($lambda (P) (not? (P P))))
Predicate A takes one argument, P, which is expected to be a predicate of one argument, and returns the negation of what P would return when passed to itself. This is a direct Lisp translation of Russell's Paradox. What happens when A is passed itself?

The answer is, when A is passed itself, (A A), nothing interesting happens — which is really very interesting. The predicate attempts to recurse forever, never terminating, and in theory it will eventually fill up all available memory with a stack of pending continuations, and terminate with an error. What it won't do is cause mathematicians to despair of finding a solid foundation for their subject. If asking whether set A contains itself is so troublesome, why is applying predicate A to itself just a practical limit on how predicate A should be used?

That question came from my dissertation. Meanwhile, another question came from the other major document I was developing, the R-1RK. I wanted to devise a uniquely Lisp-ish variant of the concept of eager type-checking. It seemed obvious to me that there should not be a fixed set of rules of type inference built into the language; that lacks generality, and is not extensible. So my idea was this: In keeping with the philosophy that everything should be first-class, let theorems about the program be an encapsulated type of first-class objects. And carefully design the constructors for this theorem type so that you can't construct the object unless it's provable. In effect, the constructors are the axioms of the logic. Modus ponens, say, is a constructor: given a theorem P and a theorem P-implies-Q, the modus-ponens constructor allows you to construct a theorem Q. As desired, there is no built-in inference engine: the programmer takes ultimate responsibility for figuring out how to prove things.
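
For flavor, here is a rough Lisp sketch of the sort of interface that suggests. This is emphatically not the actual R-1RK design, and all of the names are mine; propositions are just list structure, and the theorem wrapper is a plain tagged list, whereas a real design would use an encapsulated type so that only the constructors, which are the axioms, could ever produce a theorem object.

($define! $quote ($vau (x) #ignore x))              ; quotation, definable in Kernel this way

($define! make-theorem                              ; would be private to the constructors
  ($lambda (prop) (list ($quote theorem) prop)))
($define! theorem-statement
  ($lambda (thm) (car (cdr thm))))

($define! modus-ponens                              ; axiom as constructor:
  ($lambda (thm-p thm-pq)                           ; from theorems of P and (implies P Q),
    ($let ((p  (theorem-statement thm-p))           ; construct a theorem of Q
           (pq (theorem-statement thm-pq)))
      ($if ($and? (pair? pq)
                  (equal? (car pq) ($quote implies))
                  (equal? (car (cdr pq)) p))
           (make-theorem (car (cdr (cdr pq))))
           #inert))))                               ; a real design would signal an error here

There is no inference engine deciding when to apply modus ponens; the programmer applies the constructor, and the constructor refuses to build anything the axiom doesn't license.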

Of the many questions in how to design such a first-class theorem type, one of the early ones has to be, what system of axioms should we use? Clearly not classical logic, because we know that would give us an inconsistent system. This though was pretty discouraging, because it seemed I would find myself directly confronting in my design the very sort of problems that Gödel's Theorem says are ultimately unsolvable.

But then I had a whimsical thought; the sort of thing that seems at once not-impossible and yet such a long shot that one can just relax and enjoy exploring it without feeling under pressure to produce a result in any particular timeframe (and yet, I have moved my thinking forward on this over the years, which keeps it interesting). What if we could find a way to take advantage of the fact that our logic is embedded in a computational system, by somehow bleeding off the paradoxes into mere nontermination? So that they produce the anticlimax of functions that don't terminate instead of the existential angst of inconsistent mathematical foundations?

Fragments

At this point, my coherent vision dissolves into fragments of tentative insight, stray puzzle pieces I'm still pushing around hoping to fit together.

One fragment: Alfred Tarski —who fits the aforementioned profile of someone already engaged by the questions when Gödel's results came out— suggested post-Gödel that the notion of consistency should be derived from common sense. Hilbert's program had actually pursued a formal definition of consistency, as the property that not all propositions are provable; this does have a certain practicality to it, in that the practical difficulty with the classical antinomies was that they allowed all propositions Q to be proven, so that "Q can be proven" ceased to be an informative statement about Q. Tarski, though, remarked that when a non-mathematician is told that both P and not-P are true, they can see that something is wrong without having to first receive a lecture on the formal consequences of antinomies in interaction with reductio ad absurdum.

So, how about we resort to some common sense, here? A common-sensical description of Russell's Paradox might go like this:

A is the set of all sets that do not contain themselves. If A contains itself, then it does not contain itself. But if it does not contain itself, then it does contain itself. But if it does contain itself, then it does not contain itself. But if it does not contain itself...
And that is just what we want to happen: instead of deriving an antinomy, the reasoning just regresses infinitely. A human being can see very quickly that this is going nowhere, and doesn't bother to iterate beyond the first four sentences at most (and once they've learned the pattern, next time they'll probably stop after even fewer sentences), but they don't come out of the experience believing that A both does and does not belong to itself; they come out believing that there's no way of resolving the question.

So perhaps we should be asking how to make the conflict here do an infinite regress instead of producing a (common-sensically wrong) answer after a finite number of steps. This seems to be some sort of deep structural change to how logical reasoning would work, possibly not even a modification of the axioms at all but rather of how they are used. If it does involve tampering with an axiom, the axiom tampered with might well be reductio ad absurdum rather than the Law of the Excluded Middle.

This idea — tampering with reductio ad absurdum rather than the Law of the Excluded Middle — strikes a rather intriguing historical chord. Because, you see, one of the mathematicians notably pursuing Hilbert's program pre-Gödel did try to eliminate the classical antinomies by leaving intact the Law of the Excluded Middle and instead modifying reductio ad absurdum. His name was Alonzo Church —you may have heard of him— and the logic he produced had, in retrospect, a curiously computational flavor to it. While he was at it, he took the opportunity to simplify the treatment of variables in his logic, by having only a single structure that binds variables, which (for reasons lost in history) he chose to call λ. Universal and existential quantifiers in his logic were higher-order functions, which didn't themselves bind variables but instead operated on functions that did the binding for them. Quite a clever device, this λ.

Unfortunately, it didn't take many years after Church's publication to show that antinomies arose in his system after all. Following the natural reflex of Hilbert's program, Church tried to find a subset of his logical axioms that could be proven consistent — and succeeded. It turned out that if you left out all the operators except λ you could prove that each proposition P was equivalent to at most one irreducible form. This result was published in 1936 by Church and one of his students, J. Barkley Rosser, and today is known as the Church–Rosser Theorem (you may have heard of that, too). In the long run, Church's logic is an obscure historical footnote, while its λ-only subset turned out to be of great interest for computation, and is well-known under the name "λ-calculus".

So evidently this idea of tampering with reductio ad absurdum and bringing computation into the mix is not exactly unprecedented. Is it possible that there is something there that Alonzo Church didn't notice? I'd have to say, "yes". Alonzo Church is one of those people who (like Albert Einstein, you'll recall he came up in relation to the first no-go theorem I discussed) in retrospect appears to have set a standard of intelligence none of us can possibly aspire to — but all such people are limited by the time they live in. Einstein died years before Bell's Theorem was published. Heck, Aristotle was clearly a smart guy too, and just think of everything he missed through the inconvenience of being born about two millennia before the scientific revolution. And Alonzo Church couldn't, by the nature of the beast, have created his logic based on a modern perspective on computation and logic since it was in part the further development of his own work over many decades that has produced that modern perspective.

I've got one more puzzle piece I'm pushing around, that seems like it ought to fit in somewhere. Remember I said Church's logic was shown to have antinomies? Well, at the time the antinomy derivation was rather baroque. It involved a form of the Richard Paradox, which concerns the use of an expression in some class to designate an object that by definition cannot be designated by expressions of that class. (A version due to G.G. Berry concerns the twenty-one syllable English expression "the least natural number not nameable in fewer than twenty-two syllables".) The Richard Paradox is naturally facilitated by granting first-class status to functions, as λ-calculus and Lisp do. But, it turns out, there is a much simpler paradox contained in Church's logic, involving less logical machinery and therefore better suited for understanding what goes wrong when λ-calculus is embedded in a logic. This is Curry's Paradox.

I'll assume, for this last bit, that you're at least hazily familiar with λ-calculus, so it'll come back to you when you see it.

For Curry's Paradox, we need one logical operator, three logical axioms, and the machinery of λ-calculus itself. Our one logical operator is the binary implication operator, ⇒. The syntax of the augmented λ-calculus is then

T ::= x | c | (λx.T) | (TT) | (T⇒T)
That is, a term is either a variable, or a constant, or a lambda-expression, or an application, or an implication. We don't need a negation operator, because we're sticking with the generalized notion of inconsistency as the property that all propositions are provable. Our axioms assert that certain terms are provable:
  1. For all terms P and Q, if provably P and provably (P⇒Q), then provably Q. (modus ponens)
  2. For all terms P, provably P⇒P.
  3. For all terms P and Q, provably ((P⇒(P⇒Q))⇒(P⇒Q)).
The sole rewriting axiom of λ-calculus, lest we forget, is the β-rule:
(λx.F)G → F[x ← G]
That is, to apply function (λx.F) to argument G, substitute G for all free occurrences of x in F.

To prove inconsistency, first we need a simple result that comes entirely from λ-calculus itself, called the Fixpoint Theorem. This result says that for every term F, there exists a term G such that FG = G (that is, every term F has a fixpoint). The proof works like this:

Suppose F is any term, and let G = (λx.(F(xx)))(λx.(F(xx))), where x doesn't occur in F. Then G = (λx.(F(xx)))(λx.(F(xx))) → (F(xx))[x ← (λx.(F(xx)))] = F((λx.(F(xx)))(λx.(F(xx)))) = FG.
Notice that although the Fixpoint Theorem apparently says that every F has a fixpoint G, it does not actually require F to be a function at all: instead of providing a G to which F can be applied, it provides a G from which FG can be derived. And —moreover— for most F, derivation from G is a divergent computation (G → FG → F(FG) → F(F(FG)) → ...).
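
As an aside, transcribing the fixpoint construction into Lisp makes that divergence vivid. A throwaway sketch, with fixpoint-of as my own name for it:

($define! fixpoint-of
  ($lambda (f)
    (($lambda (x) (f (x x)))
     ($lambda (x) (f (x x))))))

; Under Lisp's applicative-order evaluation, (fixpoint-of F) never returns:
; the argument (x x) has to be evaluated before f is ever applied, so the
; call just recurses forever; the paradox-flavored construction bleeds off
; into plain nontermination.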

Now we're ready for our proof of inconsistency: that for every term P, provably P.

Suppose P is any term. Let Q = λx.(x⇒P). By the Fixpoint Theorem, let R be a term such that QR = R. By writing out the definition of Q and then applying the β-rule, we have QR = (λx.(x⇒P))R → (R⇒P), therefore R = (R⇒P).

By the second axiom, provably (R⇒R); but R = R⇒P, so, by replacing the right hand R in (R⇒R) with (R⇒P), provably (R⇒(R⇒P)).

By the third axiom, provably ((R⇒(R⇒P))⇒(R⇒P)); and we already have provably (R⇒(R⇒P)), so, by modus ponens, provably (R⇒P). But R = (R⇒P), so provably R.

Since provably R and provably (R⇒P), by modus ponens, provably P.

Note: I've had to fix errors in this proof twice since publication; there's some sort of lesson there about either formal proofs or paradoxes.
So, why did I go through all this in detail? Besides, of course, enjoying a good paradox. Well, mostly, this: The entire derivation turns on the essential premise that derivation in the calculus, as occurs (oddly backwards) in the proof of the Fixpoint Theorem, is a relation between logical terms — which is to say that all terms in the calculus have logical meaning.

And we've seen something like that before, in my early explanation of Mitchell Wand's no-go theorem: trivialization of theory resulted from assuming that all calculus derivation was evaluation. So, if we got around Wand's no-go theorem by recognizing that some derivation is not evaluation, what can we do by recognizing that some derivation is not deduction?
