
Friday, June 16, 2017

Co-hygiene and quantum gravity

[l'Universo] è scritto in lingua matematica
([The Universe] is written in the language of mathematics)
— Galileo Galilei, Il Saggiatore (The Assayer), 1623.

Here's another installment in my ongoing exploration of exotic ways to structure a theory of basic physics. In our last exciting episode, I backtraced a baffling structural similarity between term-rewriting calculi and basic physics to a term-rewriting property I dubbed co-hygiene. This time, I'll consider what this particular vein of theory would imply about the big-picture structure of a theory of physics. For starters, I'll suggest it would imply, if fruitful, that quantum gravity is likely to be ultimately unfruitful and, moreover, quantum mechanics ought to be less foundational than it has been taken to be. The post continues on from there much further than, candidly, I had expected it to; by the end of this installment my immediate focus will be distinctly shifting toward relativity.

To be perfectly clear: I am not suggesting anyone should stop pursuing quantum gravity, nor anything else for that matter. I want to expand the range of theories explored, not contract it. I broadly diagnose basic physics as having fallen into a fundamental rut of thinking, that is, assuming something deeply structural about the subject that ought not to be assumed; and since my indirect evidence for this diagnosis doesn't tell me what that deep structural assumption is, I want to devise a range of mind-bendingly different ways to structure theories of physics, to reduce the likelihood that any structural choice would be made through mere failure to imagine an alternative.

The structural similarity I've been pursuing analogizes between, on one side, the contrast of pure function-application with side-effect-ful operations in term-rewriting calculi; and on the other side, the contrast of gravity with the other fundamental forces in physics. Gravity corresponds to pure function-application, and the other fundamental forces correspond to side-effects. In the earlier co-hygiene post I considered what this analogy might imply about nondeterminism in physics, and I'd thought my next post in the series would be about whether or not it's even mathematically possible to derive the quantum variety of nondeterminism from the sort of physical structure indicated. Just lately, though, I've realized there may be more to draw from the analogy by considering first what it implies about non-locality, folding in nondeterminism later. The starting observation is that if quantum non-locality ("spooky action at a distance") is part of the analog to side-effects, then gravity should be outside the entanglement framework, implying both that quantum gravity would be a non-starter, and that quantum mechanics, which is routinely interpreted to act directly from the foundation of reality by shaping the spectrum of alternative versions of the entire universe, would have to be happening at a less fundamental level than the one where gravity differs from the other forces.

On my way to new material here, I'll start with material mostly revisited from the earlier post, where it was mixed in with a great deal of other material; here it will be more concentrated, with a different emphasis and perhaps some extra elements leading to additional inferences. As for the earlier material that isn't revisited here — I'm very glad it's there. This is, deliberately, paradigm-bending stuff, where different parts don't belong to the same conceptual framework and can't easily be held in the mind all at once; so if I hadn't written down all that intermediate thinking at the time, with its nuances and tangents, I don't think I could recapture it all later. I'll continue here my policy of capturing the journey, with its intermediate thoughts and their nuances and tangents.

Until I started describing λ-calculus here in earnest, it hadn't registered on me that it would be a major section of the post. Turns out, though, my perception of λ-calculus has been profoundly transformed by the infusion of perspective from physics; so I found myself going back to revisit basic principles that I would have skipped lightly over twenty years ago, and perhaps even two years ago. It remains to be seen whether developments later in this post will sufficiently alter my perspective to provoke yet another recasting of λ-calculus in some future post.

Contents
Side-effects
Variables
Side-effect-ful variables
Quantum scope
Geometry and network
Cosmic structure
Side-effects

There were three main notions of computability in the 1930s that were proven equivalent in power: general recursive functions, λ-calculus, and Turing machines (due respectively to Jacques Herbrand and Kurt Gödel, to Alonzo Church, and to Alan Turing); the Church-Turing thesis is the claim that these equivalent notions capture exactly what is effectively calculable. General recursive functions are broadly equational in style, λ-calculus is stylistically more applicative; both are purely functional. Turing machines, on the other hand, are explicitly imperative. Gödel apparently lacked confidence in the purely functional approaches as notions of mechanical calculability, though Church was more confident, until the purely functional approaches were proven equivalent to Turing machines; which to me makes sense as a matter of concreteness. (There's some discussion of the history in a paper by Solomon Feferman; pdf.)

This mismatch between abstract elegance and concrete straightforwardness was an early obstacle, in the 1960s, to applying λ-calculus to programming-language semantics. Gordon Plotkin found a schematic solution strategy for the mismatch in his 1975 paper "Call-by-name, call-by-value and the λ-calculus" (pdf); one sets up two formal systems, one a calculus with abstract elegance akin to λ-calculus, the other an operational semantics with concrete clarity akin to Turing machines, then proves well-behavedness theorems for the calculus and correspondence theorems between the calculus and operational semantics. The well-behavedness of the calculus allows us to reason conveniently about program behavior, while the concreteness of the operational semantics allows us to be certain we are really reasoning about what we intend to. For the whole arrangement to work, we need to find a calculus that is fully well-behaved while matching the behavior of the operational semantics we want so that the correspondence theorems can be established.

Plotkin's 1975 paper modified λ-calculus to match the behavior of eager argument evaluation; he devised a call-by-value λv-calculus, with all the requisite theorems. The behavior was, however, still purely functional, i.e., without side-effects. Traditional mathematics doesn't incorporate side-effects. There was (if you think about it) no need for traditional mathematics to explicitly incorporate side-effects, because the practice of traditional mathematics was already awash in side-effects. Mutable state: mathematicians wrote down what they were doing; and they changed their own mental state and each others'. Non-local control-flow (aka "goto"s): mathematicians made intuitive leaps, and the measure of proof was understandability by other sapient mathematicians rather than conformance to some purely hierarchical ordering. The formulae themselves didn't contain side-effects because they didn't have to. Computer programs, though, have to explicitly encompass all these contextual factors that the mathematician implicitly provided to traditional mathematics. Programs are usually side-effect-ful.

In the 1980s Matthias Felleisen devised λ-like calculi to capture side-effect-ful behavior. At the time, though, he didn't quite manage the entire suite of theorems that Plotkin's paradigm had called for. Somewhere, something had to be compromised. In the first published form of Felleisen's calculi, he slightly weakened the well-behavedness theorems for the calculus. In another published variant he achieved full elegance for the calculus but slightly weakened the correspondence theorems between the calculus and the operational semantics. In yet another published variant he slightly modified the behavior — in operational semantics as well as calculus — to something he was able to reconcile without compromising the strength of the various theorems. This, then, is where I came into the picture: given Felleisen's solution and a fresh perspective (each generation knows a little less about what can't be done than the generation before), I thought I saw a way to capture the unmodified side-effect-ful behavior without weakening any of the theorems. Eventually I seized an opportunity to explore the insight, when I was writing my dissertation on a nearby topic. To explain where my approach fits in, I need to go back and pick up another thread: the treatment of variables in λ-calculus.

Variables

Alonzo Church also apparently seized an opportunity to explore an insight when doing research on a nearby topic. The main line of his research was to see if one could banish the paradoxes of classical logic by developing a formal logic that weakens reductio ad absurdum — instead of eliminating the law of the excluded middle, which was a favored approach to the problem. But when he published the logic, in 1932, he mentioned reductio ad absurdum in the first paragraph and then spent the next several paragraphs ranting about the evils of unbound variables. One gathers he wanted everything to be perfectly clear, and unbound variables offended his sense of philosophical precision. His logic had just one possible semantics for a variable, namely, a parameter to be supplied to a function; he avoided the need for any alternative notions of universally or existentially quantified variables, by the (imho quite lovely) device of using higher-order functions for quantification. That is (since I've brought it up), existential quantifier Σ applied to function F would produce a proposition ΣF meaning that there is some true proposition FX, and universal quantifier Π applied to F, proposition ΠF meaning that every proposition FX is true. In essence, he showed that these quantifiers are orthogonal to variable-binding; leaving him with only a single variable-binding device, which, for some reason lost to history, he called "λ".
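As a toy illustration of quantification via higher-order functions (a finite-domain caricature in Python, emphatically not Church's 1932 logic), one can define Σ and Π as ordinary functions taking a predicate as their argument; the only binder in sight is the one introducing the predicate's parameter:

    # Toy finite-domain caricature: quantifiers as higher-order functions.
    # The only variable-binding construct used is ordinary function abstraction
    # (Python's lambda), mirroring Church's use of λ as the sole binder.
    domain = range(10)                            # a stand-in universe of discourse

    sigma = lambda F: any(F(x) for x in domain)   # ΣF: some FX is true
    pi    = lambda F: all(F(x) for x in domain)   # ΠF: every FX is true

    print(sigma(lambda x: x > 7))    # True: some x has x > 7
    print(pi(lambda x: x >= 0))      # True: every x satisfies x >= 0
    print(pi(lambda x: x > 7))       # False

The point being illustrated is only the orthogonality: once functions are first-class, the quantifiers need no binding machinery of their own.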

λ-calculus is formally a term-rewriting calculus; a set of terms together with a set of rules for rewriting a term to produce another term. The two basic well-behavedness properties that a term-rewriting calculus generally ought to have are compatibility and Church-Rosser-ness. Compatibility says that if a term can be rewritten when it's a standalone term, it can also be rewritten when it's a subterm of a larger term. Church-Rosser-ness says that if a term can be rewritten in two different ways, then the difference between the two results can always be eliminated by some further rewriting. Church-Rosser-ness is another way of saying that rewriting can be thought of as a directed process toward an answer, which is characteristic of calculi. Philosophically, one might be tempted to ask why the various paths of rewriting ought to reconverge later, but that puzzle only arises from thinking of the terms as the underlying reality. If the terms merely describe the reality, and the rewriting lets us reason about its development, then the term syntax is just a way for us to separately describe different parts of the reality, and compatibility and Church-Rosser-ness are just statements about our ability (via this system) to reason separately about different aspects of the development at different parts of the reality without distorting our eventual conclusion about where the whole development is going. From that perspective, Church-Rosser-ness is about separability, and convergence is just the form in which the separability appears in the calculus.
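A minimal worked example of the reconvergence (anticipating the β-rule defined below): the term ((λx.x)((λy.y)z)) contains two reducible subterms, and either can be rewritten first, but both routes arrive at the same place:

((λx.x)((λy.y)z)) → ((λy.y)z) → z  (outer step first)
((λx.x)((λy.y)z)) → ((λx.x)z) → z  (inner step first)

Convergence on z is just what Church-Rosser-ness promises.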

The syntax of λ-calculus — which particularly clearly illustrates these principles — is

T ::= x | (TT) | (λx.T)  .
That is, a term is either a variable; or a combination, specifying that a function is applied to an operand; or a λ-expression, defining a function of one parameter. The T in (λx.T) is the body of the function, x its parameter, and free occurrences of x in T are bound by this λ. An occurrence of x in T is free if it doesn't occur inside a context (λx.[ ]) within T. This connection between a λ and the variable instances it binds is structural. Here, for example, is a term involving variables x, y, and z:
((λx.((λy.((λx.(xz))(xy)))(xz)))(xy)) .
Consider the leftmost λx. It binds exactly two x instances: the x in the (xy) inside the λy body, and the x in the (xz) to which the λy-expression is applied. The x instance in the trailing (xy) is not bound by this λ since it is outside the binding expression. The x instance in the innermost (xz) is not bound by it since it is captured by another λ inside the body of the one we're considering. I suggest that these three elements — binder and two bound instances — should be thought of together as the syntactic representation of a deeper, distributed entity that connects distant elements of the term.
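For a concrete handle on the geometry/network distinction, here is a minimal Python sketch (my own illustration, not anything from the vau-calculus work itself) that represents terms as syntax trees and computes which variable names occur free; the binder-to-instance connections are exactly the information this function has to chase across the tree:

    # Minimal λ-term representation: the syntax tree is the "geometry";
    # the binder-to-instance relationships are the "network topology".
    from dataclasses import dataclass

    class Term: pass

    @dataclass
    class Var(Term):             # a variable instance
        name: str

    @dataclass
    class App(Term):             # a combination (TT)
        fun: Term
        arg: Term

    @dataclass
    class Lam(Term):             # a λ-expression (λx.T)
        param: str
        body: Term

    def free_vars(t):
        """Names occurring free in t, i.e., not connected to any binder within t."""
        if isinstance(t, Var):
            return {t.name}
        if isinstance(t, App):
            return free_vars(t.fun) | free_vars(t.arg)
        if isinstance(t, Lam):
            return free_vars(t.body) - {t.param}

    # The example term ((λx.((λy.((λx.(xz))(xy)))(xz)))(xy)):
    example = App(
        Lam('x', App(Lam('y', App(Lam('x', App(Var('x'), Var('z'))),
                                  App(Var('x'), Var('y')))),
                     App(Var('x'), Var('z')))),
        App(Var('x'), Var('y')))

    print(free_vars(example))   # {'x', 'y', 'z'} in some order; x and y stay free only via the trailing (xy)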

There is just one rewriting rule — one of the fascinations of this calculus, that just one rule suffices for all computation — called the β-rule:

((λx.T1)T2) → T1[x ← T2] .
The left-hand side of this rule is the redex pattern (redex short for reducible expression); it specifies a local pattern in the syntax tree of the term. Here the redex pattern is that some particular parent node in the syntax tree is a combination whose left-hand child is a λ-expression. Remember, this rewriting relation is compatible, so the parent node doesn't have to be the root of the entire tree. It's important that this local pattern in the syntax tree includes a variable binder λ, thus engaging not only a local region of the syntax tree, but also a specific distributed structure in the network of non-local connections across the tree. Following my earlier post, I'll call the syntax tree the "geometry" of the term, and the totality of the non-local connections its "network topology".

The right-hand side of the rule specifies replacement by substituting the operand T2 for the parameter x everywhere it occurs free in the body T1; but there's a catch. One might, naively, imagine that this would be recursively defined as

x[x ← T] = T
x1[x2 ← T] = x1 if x1 isn't x2

(T1 T2)[x ← T] = (T1[x ← T] T2[x ← T])

(λx.T1)[x ← T2] = (λx.T1)
(λx1.T1)[x2 ← T2] = (λx1.T1[x2 ← T2]) if x1 isn't x2.
This definition just descends the syntax tree substituting for the variable, and stops if it hits a λ that binds the same variable; very straightforward, and only a little tedious. Except that it doesn't work. Most of it does; but there's a subtle error in the rule for descending through a λ that binds a different variable,
(λx1.T1)[x2 ← T2] = (λx1.T1[x2 ← T2]) if x1 isn't x2.
The trouble is, what if T1 contains a free occurrence of x2 and, at the same time, T2 contains a free instance of x1? Then, before the substitution, that free instance of x1 was part of some larger distributed structure; that is, it was bound by some λ further up in the syntax tree; but after the substitution, following this naive definition of substitution, a copy of T2 is embedded within T1 with an instance of x1 that has been cut off from the larger distributed structure and instead bound by the inner λx1, essentially altering the sense of syntactic template T2. The inner λx1 is then said to capture the free x1 in T2, and the resulting loss of integrity of the meaning of T2 is called bad hygiene (or, a hygiene violation). For example,
((λy.(λx.y))x) ⇒β (λx.y)[y ← x]
but under the naive definition of substitution, this would be (λx.x), because of the coincidence that the x we're substituting for y happens to have the same name as the bound variable of this inner λ. If the inner variable had been named anything else (other than y) there would have been no problem. The "right" answer here is a term of the form (λz.x), where any variable name could be used instead of z as long as it isn't "x" or "y". The standard solution is to introduce a rule for renaming bound variables (called α-renaming), and restrict the substitution rule to require that hygiene be arranged beforehand. That is,
(λx1.T) → (λx2.T[x1 ← x2]) where x2 doesn't occur free in T

(λx1.T1)[x2 ← T2] = (λx1.T1[x2 ← T2]) if x1 isn't x2 and doesn't occur free in T2.
Here again, this may be puzzling if one thinks of the syntax as the underlying reality. If the distributed structures of the network topology are the reality, which the syntax merely describes, then α-renaming is merely an artifact of the means of description; indeed, the variable-names themselves are merely an artifact of the means of description.
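Continuing the Python sketch from above (again just an illustrative toy): a naive substitution reproduces the capture bug from the ((λy.(λx.y))x) example, while a hygienic version α-renames the offending binder to a fresh name first.

    # Uses Var, App, Lam, and free_vars from the previous sketch.

    def naive_subst(t, x, s):
        """T[x <- s] with no hygiene precaution: free variables of s can be captured."""
        if isinstance(t, Var):
            return s if t.name == x else t
        if isinstance(t, App):
            return App(naive_subst(t.fun, x, s), naive_subst(t.arg, x, s))
        if isinstance(t, Lam):
            if t.param == x:
                return t                                   # this λ rebinds x; no free x below
            return Lam(t.param, naive_subst(t.body, x, s))

    counter = 0
    def fresh(avoid):
        """A variable name not in the set `avoid`."""
        global counter
        while True:
            counter += 1
            name = 'v' + str(counter)
            if name not in avoid:
                return name

    def subst(t, x, s):
        """Capture-avoiding T[x <- s]: α-rename any binder that would capture a free variable of s."""
        if isinstance(t, Var):
            return s if t.name == x else t
        if isinstance(t, App):
            return App(subst(t.fun, x, s), subst(t.arg, x, s))
        if isinstance(t, Lam):
            if t.param == x:
                return t
            if t.param in free_vars(s) and x in free_vars(t.body):
                z = fresh(free_vars(s) | free_vars(t.body))
                body = subst(t.body, t.param, Var(z))      # α-rename first
                return Lam(z, subst(body, x, s))
            return Lam(t.param, subst(t.body, x, s))

    # (λx.y)[y <- x]:
    print(naive_subst(Lam('x', Var('y')), 'y', Var('x')))  # Lam(param='x', body=Var(name='x'))  -- capture!
    print(subst(Lam('x', Var('y')), 'y', Var('x')))        # Lam(param='v1', body=Var(name='x')) -- hygienic, i.e. (λz.x)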

Side-effect-ful variables

Suppose we want to capture classical side-effect-ful behavior, unmodified, without weakening any of the theorems of Plotkin's paradigm. Side-effects are by nature distributed across the term, and would therefore seem to belong naturally to its network topology. In Felleisen's basic calculus, retaining the classical behavior and requiring the full correspondence theorems, side-effect-ful operations create syntactic markers that then "bubble up" through the syntax tree till they reach the top of the term, from which the global consequence of the side-effect is enacted by a whole-term-rewriting rule — thus violating compatibility, since the culminating rule is by nature applied to the whole term rather than to a subterm. This strategy seems, in retrospect, to be somewhat limited by an (understandable) inclination to conform to the style of variable handling in λ-calculus, whose sole binding device is tied to function application at a specific location in the geometry. Alternatively (as I seized the opportunity to explore in my dissertation), one can avoid the non-compatible whole-term rules by making the syntactic marker, which bubbles up through the term, a variable-binder. These side-effect-ful bindings are no longer strongly tied to a particular location in the geometry; they float, potentially to the top of the term, or may linger further down in the tree if the side-effect happens to only affect a limited region of the geometry. But the full classical behavior (in the cases Felleisen addressed) is captured, and Plotkin's entire suite of theorems are supported.

The calculus in which I implemented this side-effect strategy (along with some other things, that were the actual point of the dissertation but don't apparently matter here) is called vau-calculus.

Recall that the β-rule of λ-calculus applies to a redex pattern at a specific location in the geometry, and requires a binder to occur there so that it can also tie in to a specific element of the network topology. The same is true of the side-effect-ful rules of the calculus I constructed: a redex pattern occurs at a specific location in the geometry with a local tie-in to the network topology. There may then be a substitutive operation on the right-hand side of the rule, which uses the associated element of the network topology to propagate side-effect-ful consequences back down the syntax tree to the entire encompassed subterm. There is a qualitative difference, though, between the traditional substitution of the β-rule and the substitutions of the side-effect-ful operations. A traditional substitution T1[x ← T2] may attach new T2 subtrees at certain leaves of the T1 syntax tree (free instances of x in T1), but does not disturb any of the pre-existing tree structure of T1. Consequently, the only effect of the β-rule on the pre-existing geometry is the rearrangement it does within the redex pattern. This is symmetric to the hygiene property, which assures (by active intervention if necessary, via α-renaming) that the only effect of the β-rule on the pre-existing network topology is what it does to the variable element whose binding is within the redex pattern. I've therefore called the geometry non-disturbance property co-hygiene. As long as β-substitution is the only variable substitution used, co-hygiene is an easily overlooked property of the β-rule since, unlike hygiene, it does not require any active intervention to maintain.
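A small worked example of co-hygiene in action: (x(yx))[x ← (λz.z)] = ((λz.z)(y(λz.z))). Copies of the operand are grafted onto the leaves where x occurred free, but the pre-existing applicative spine of (x(yx)) is untouched: the result is still a combination whose operand is a combination with y on the left. The non-co-hygienic substitutions of the side-effect-ful rules, by contrast, change parts of the tree that were already there.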

The substitutions used by the side-effect-ful rewriting operations go to the same α-renaming lengths as the β-rule to assure hygiene. However, the side-effect-ful substitutions are non-co-hygienic. This might, arguably, be used as a technical definition of side-effects, which cause distributed changes to the pre-existing geometry of the term.

Quantum scope

Because co-hygiene is about not perturbing pre-existing geometry, it seems reasonable that co-hygienic rewriting operations should be more in harmony with the geometry than non-co-hygienic rewriting operations. Thus, β-rewriting should be more in harmony with the geometry of the term than the side-effect-ful operations; which, subjectively, does appear to be the case. (The property that first drew my attention to all this was that α-renaming, which is geometrically neutral, is a special case of β-substitution, whereas the side-effect-ful substitutions are structurally disparate from α-renaming.)

And gravity is more in harmony with the geometry of spacetime than are the other fundamental forces; witness general relativity.

Hence my speculation, by analogy, that one might usefully structure a theory of basic physics such that gravity is co-hygienic while the other fundamental forces are non-co-hygienic.

One implication of this line of speculation (as I noted in the earlier post) would be fruitlessness of efforts to unify the other fundamental forces with gravity by integrating them into the geometry of spacetime. If the other forces are non-co-hygienic, their non-affinity with geometry is structural, and trying to treat them in a more gravity-like way would be like trying to treat side-effect-ful behavior as structurally akin to function-application in λ-calculus — which I have long reckoned was the structural miscue that prevented Felleisen's calculus from supporting the full set of well-behavedness theorems.

On further consideration, though, something more may be suggested; even as the other forces might not integrate into the geometry of spacetime, gravity might not integrate into the infrastructure of quantum mechanics. All this has to do with the network topology, a non-local infrastructure that exists even in pure λ-calculus, but which in the side-effect-ful vau-calculus achieves what one might be tempted to call "spooky action at a distance". Suppose that quantum entanglement is part of this non-co-hygienic aspect of the theory. (Perhaps quantum entanglement would be the whole of the non-co-hygienic aspect, or, as I discussed in the earlier post, perhaps there would be other, non-quantum non-locality with interesting consequences at cosmological scale; then again, one might wonder if quantum entanglement would itself have consequences at cosmological scale that we have failed to anticipate because the math is beyond us.) It would follow that gravity would not exhibit quantum entanglement. On one hand, this would imply that quantum gravity should not work well as a natural unification strategy. On the other hand, to make this approach work, something rather drastic must happen to the underpinnings of quantum mechanics, both philosophical and technical.

We understand quantum mechanics as describing the shape of a spectrum of different possible realities; from a technical perspective that is what quantum mechanics describes, even if one doesn't accept it as a philosophical interpretation (and many do accept that interpretation, if only on grounds of Occam's Razor that there's no reason to suppose philosophically some other foundation than is supported technically). But, shaped spectra of alternative versions of the entire universe seems reminiscent of whole-term rewriting in Felleisen's calculus — which was, notably, a consequence of a structural design choice in the calculus that actually weakened the internal symmetry of the system. The alternative strategy of vau-calculus both had a more uniform infrastructure and avoided the non-compatible whole-term rewriting rules. An analogous theory of basic physics ought to account for quantum entanglement without requiring wholesale branching of alternative universes. Put another way, if gravity isn't included in quantum entanglement, and therefore has to diverge from the other forces at a level more basic than the level where quantum entanglement arises, then the level at which quantum entanglement arises cannot be the most basic level.

Just because quantum structure would not be at the deepest level of physics, does not at all suggest that what lies beneath it must be remotely classical. Quantum mechanics is mathematically a sort of lens that distorts whatever classical system is passed through it; taking the Schrödinger equation as demonstrative,

iℏ ∂Ψ/∂t = ĤΨ ,
the classical system is contained in the Hamiltonian function Ĥ, which is plugged into the equation to produce a suitable spectrum of alternatives. Hence my description of the quantum equation itself as basic. But, following the vau-calculus analogy, it seems some sort of internal non-locality ought to be basic, as it follows from the existence of the network topology; looking at vau-calculus, even the β-rule fully engages the network topology, though co-hygienically.
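For instance (a standard textbook case, nothing specific to the present speculation): for a single non-relativistic particle in a potential V, the classical Hamiltonian H = p²/2m + V(x) becomes the operator Ĥ = −(ℏ²/2m) ∂²/∂x² + V(x), and that is the only place where the particular classical system enters; the surrounding iℏ ∂Ψ/∂t = ĤΨ shell is the same for every system.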

Geometry and network

The above insights on the physical theory itself are mostly negative, indicating what this sort of theory of physics would not be like, what characteristics of conventional quantum math it would not have. What sort of structure would it have?

I'm not looking for detailed math, just yet, but the overall shape into which the details would be cast. Some detailed math will be needed, before things go much further, to demonstrate that the proposed approach is capable of generating predictions sufficiently consistent with quantum mechanics, keeping in mind the well-known no-go result of Bell's Theorem. I'm aware of the need; the question, though, is not whether Bell's Theorem can be sidestepped — of course it can, like any other no-go theorem, by blatantly violating one of its premises — but whether it can be sidestepped by a certain kind of theory. So the structure of the theory is part of the possibility question, and needs to be settled before we can ask the question properly.

In fact, one of my concerns for this sort of theory is that it might have too many ways to get around Bell's Theorem. Occam's Razor would not look favorably on a theory with redundant Bell-avoidance devices.

Let's now set aside locality for a moment, and consider nondeterminism. Bell's Theorem calls (in combination with some experimental results that are, somewhat inevitably, argued over) for chronological nondeterminism, that is, nondeterminism relative to the time evolution of the physical system. One might, speculatively, be able to approximate that sort of nondeterminism arbitrarily well, in a fundamentally non-local theory, by exploiting the assumption that the physical system under consideration is trivially small relative to the whole cosmos. We might be able to draw on interactions with distant elements of the cosmos to provide a more-or-less "endless" supply of pseudo-randomness. I considered this possibility in the earlier post on co-hygiene, and it is an interesting theoretical question whether (or, at the very least, how) a theory of this sort could in fact generate the sort of quantum probability distribution that, according to Bell's Theorem, cannot be generated by a chronologically deterministic local theory. The sort of theory I'm describing, however, is merely a way to provide a local illusion of nondeterminism in a non-local theory with global determinism — and when we're talking chronology, it is difficult even to define global determinism (because, thanks to relativity, "time" is tricky to define even locally; trickier still now that we're contemplating a theory lacking the sort of continuity that relativity relies upon; and likely impossible to define globally, thanks to relativity's deep locality). It's also no longer clear why one should expect chronological determinism at all.

A more straightforward solution, seemingly therefore favored by Occam's Razor, is to give up on chronological determinism and instead acquire mathematical determinism, by the arguably "obvious" strategy of supposing that the whole of spacetime evolves deterministically along an orthogonal dimension, converting unknown initial conditions (initial in the orthogonal dimension) into chronological nondeterminism. I demonstrated the principle of this approach in an earlier post. It is a bit over-powered, though; a mathematically deterministic theory of this sort — moreover, a mathematically deterministic and mathematically local theory of this sort — can readily generate not only a quantum probability distribution of the sort considered by Bell's Theorem, but, on the face of it, any probability distribution you like. This sort of excessive power would seem rather disfavored by Occam's Razor.
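To make that over-powered-ness concrete, here is a deliberately crude Python toy of my own (not the proof-of-concept model from the earlier post): "spacetime" is a row of binary events, a hidden field of initial data lives along the orthogonal dimension (sampled at random here, standing in for unknown initial conditions), and a strictly local, deterministic rewrite settles each event against its hidden datum. An observer who sees only the settled events, and not the hidden field, finds them distributed according to whatever target distribution was encoded; deterministic, local, orthogonal-dimension rewriting really can manufacture any chronological statistics you like.

    # Toy illustration: deterministic, local rewriting along an orthogonal
    # dimension turning hidden initial data into what looks, chronologically,
    # like nondeterminism with an arbitrary distribution.
    import random

    N = 100_000
    target_p = 0.3                                  # any probability you like
    hidden = [random.random() for _ in range(N)]    # unknown initial data along the orthogonal dimension
    events = [None] * N                             # "spacetime": one binary event per site

    # One sweep of a strictly local, deterministic rewrite rule:
    # each site settles to 1 or 0 purely from its own hidden datum.
    for i in range(N):
        events[i] = 1 if hidden[i] < target_p else 0

    # An observer who sees only `events` (not `hidden`) finds the outcomes
    # distributed with probability ~target_p, though nothing nondeterministic
    # happened along the rewriting dimension.
    print(sum(events) / N)                          # ~0.3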

The approach does, however, seem well-suited to a co-hygiene-directed theory. Church-Rosser-ness implies that term rewriting should be treated as reasoning rather than directly as chronological evolution, which seemingly puts term rewriting on a dimension orthogonal to spacetime. The earlier co-hygiene post noted that calculi, which converge to an answer via Church-Rosser-ness, contrast with grammars, which are also term-rewriting systems but exist for the purpose of diverging and are thus naturally allied with mathematical nondeterminism whereas calculi naturally ally with mathematical determinism. So our desire to exploit the calculus/physics analogy, together with our desire for abstract separability of parts, seems to favor this use of a rewriting dimension orthogonal to spacetime.

A puzzle then arises about the notion of mathematical locality.  When the rewriting relation, through this orthogonal dimension (which I used to call "meta-time", though now that we're associating it with reasoning some other name is wanted), changes spacetime, there's no need for the change to be non-local. We can apparently generate any sort of physical laws, quantum or otherwise, without the need for more than strictly local rewrite rules; so, again by Occam's Razor, why would we need to suppose a whole elaborate non-local "network topology"? A strictly local rewriting rule sounds much simpler.

Consider, though, what we mean by locality. Both nondeterminism and locality must be understood relative to a dimension of change, thus "chronological nondeterminism"; but to be thorough in defining locality we also need a notion of what it means for two elements of a system state to be near each other. "Yes, yes," you may say, "but we have an obvious notion of nearness, provided by the geometry of spacetime." Perhaps; but then again, we're now deep enough in the infrastructure that we might expect the geometry of spacetime to emerge from something deeper. So, what is the essence of the geometry/network distinction in vau-calculus?

A λ-calculus term is a syntax tree — a graph, made up of nodes connected to each other by edges that, in this case, define the potential function-application relationships. That is, the whole purpose of the context-free syntax is to define where the interactions — the redex patterns for applying the β-rule — are. One might plausibly say much the same for the geometry of spacetime re gravity, i.e., location in spacetime defines the potential gravitational interactions. The spacetime geometry is not, evidently, hierarchical like that of λ-calculus terms; that hierarchy is apparently a part of the function-application concept. Without the hierarchy, there is no obvious opportunity for a direct physical analog to the property of compatibility in term-rewriting calculi.

The network topology, i.e., the variables, provides another set of connections between nodes of the graph. These groups of connections are less uniform, and the variations between them do not participate in the redex patterns, but are merely tangential to the redex patterns, cuing the engagement of a variable structure in a rewriting transformation. In vau-calculi the variable is always engaged in the redex through its binding, but this is done for compatibility; by guaranteeing that all the variable instances occur below the binding in the syntax tree, the rewriting transformation can be limited to that branch of the tree. Indeed, only the λ bindings really have a fixed place in the geometry, dictated by the role of the variable in the syntactically located function application; side-effect-ful bindings float rather freely, and their movement through the tree really makes no difference to the function-application structure as long as they stay far enough up in the tree to encompass all their matching variable instances. If not for the convenience of tying these bindings onto the tree, one might represent them as partly or entirely separate from the tree (depending on which kind of side-effect one is considering), tethered to the tree mostly by the connections to the bound variable instances. The redex pattern, embedded within the geometry, would presumably be at a variable instance. Arranging for Church-Rosser-ness would, one supposes, be rather more challenging without compatibility.

Interestingly, btw, of the two classes of side-effects considered by vau-calculus (and by Felleisen), this separation of bindings from the syntax tree is more complete for sequential-state side-effects than for sequential-control side-effects — and sequential control is much more simply handled in vau-calculus than is sequential state. I'm still wondering if there's some abstract principle here that could relate to the differences between various non-gravitational forces in physics, such as the simplicity of Maxwell's equations for electromagnetism.

This notion of a binding node for a variable hovering outside the geometry, tethered more-or-less-loosely to it by connections to variable instances, has a certain vague similarity to the aggressive non-locality of quantum wave functions. The form of the wave function would, perhaps, be determined by a mix of the nature of the connections to the geometry together with some sort of blurring effect resulting from a poor choice of representing structures; the hope would be that a better choice of representation would afford a more focused description.

I've now identified, for vau-calculus, three structural differences between the geometry and the network.

  • The geometry contains the redex patterns (with perhaps some exotic exceptions).
  • The geometric topology is much simpler and more uniform than the network topology.
  • The network topology is treated hygienically by all rewriting transformations, whereas the geometry is treated co-hygienically only by one class of rewriting transformations (β).
But which of these three do we expect to carry over to physics?

The three major classes of rewriting operations in vau-calculus — function application, sequential control, and sequential state — all involve some information in the term that directs the rewrite and therefore belongs in the redex pattern. All three classes of operations involve distributing information to all the instances of the engaged variable. But, the three classes differ in how closely this directing information is tied to the geometry.

For function application, the directing information is entirely contained in the geometry, the redex pattern of the β-rule, ((λx.T1)T2). The only information about the variable not contained within that purely geometric redex pattern is the locations of the bound instances.

For sequential control, the variable binder is a catch expression, and the bound variable instances are throw expressions that send a value up to the matching catch. (I examined this case in detail in an earlier post.) The directing information contained in the variable, beyond the locations of the bound instances, would seem to be the location of the catch; but in fact the catch can move, floating upward in the syntax tree, though moving the catch involves a non-co-hygienic substitutive transformation — in fact, the only non-co-hygienic transformation for sequential control. So the directing information is still partly tied to the syntactic structure (and this tie is somehow related to the non-co-hygiene). The catch-throw device is explicitly hierarchical, which would not carry over directly to physics; but this may be only a consequence of its relation to the function-application structure, which does carry over (in the broad sense of spacetime geometry). There may yet be more to make of a side analogy between vau-calculus catch-throw and Maxwell's Equations.

For sequential state, the directing information is a full-blown environment, a mapping from symbols to values, with arbitrarily extensive information content and very little relation to geometric location. The calculus rewrite makes limited use of the syntactic hierarchy to coordinate time ordering of assignments — not so much inherently hierarchical as inherently tied to the time sequencing of function applications, which itself happens to be hierarchical — but this geometric connection is even weaker than for catch-throw, and its linkage to time ordering is more apparent. In correspondence with the weaker geometric ties, the supporting rewrite rules are much more complicated, as they moderate passage of information into and out of the mapping repository.

"Time ordering" here really does refer to time in broadly the same sense that it would arise in physics, not to rewriting order as such. That is, it is the chronological ordering of events in the programming language described by the rewriting system, analogous to the chronological ordering of events described by a theory of physics. Order of rewriting is in part related to described chronology, although details of the relationship would likely be quite different for physics where it's to do with relativity. This distinction is confusing even in term-rewriting PL semantics, where PL time is strictly classical; one might argue that confusion between rewriting, which is essentially reasoning, and evaluation, which is the PL process reasoned about, resulted in the unfortunately misleading "theory of fexprs is trivial" result which I have discussed here previously.

It's an interesting insight that, while part of the use of syntactic hierarchy in sequential control/state — and even in function application, really — is about compatibility, which afaics does not at all carry over to physics, their remaining use of syntactic hierarchy is really about coordination of time sequencing, which does occur in physics in the form of relativity. Admittedly, in this sort of speculative exploration of possible theories for physics, I find the prospect of tinkering with the infrastructure of quantum mechanics not nearly as daunting as tinkering with the infrastructure of relativity.

At any rate, the fact that vau-calculus puts the redex pattern (almost always) entirely within a localized area of the syntax, would seem to be more a statement about the way the information is represented than about the geometry/network balance. That is, vau-calculus represents the entire state of the system by a syntactic term, so each item of information has to be given a specific location in the term, even if that location is chosen somewhat arbitrarily. It is then convenient, for time ordering, to require that all the information needed for a transformation should get together in a particular area of the term. Quantum mechanics may suffer from a similar problem, in a more advanced form, as some of the information in a wave function may be less tied to the geometry than the equations (e.g. the Schrödinger equation) depict it. What really makes things messy is devices that are related to the geometry but less tightly so than the primary, co-hygienic device. Perhaps that is the ultimate trade-off, with differently structured devices becoming more loosely coupled to the geometry and proportionately less co-hygienic.

All of which has followed from considering the first of three geometry/network asymmetries: that redex patterns are mostly contained in the geometry rather than the network. The other two asymmetries noted were (1) that the geometric structure is simple and uniform while the network structure is not, and (2) that the network is protected from perturbation while the geometry is not — i.e., the operations are all hygienic (protecting the network) but not all are co-hygienic (protecting the geometry). Non-co-hygiene complicates things only moderately, because the perturbations are to the simple, uniform part of the system configuration; all of the operations are hygienic, so they don't perturb the complicated, nonuniform part of the configuration. Which is fortunate for mathematical treatment; if the perturbations were to the messy stuff, it seems we mightn't be able to cope mathematically at all. So these two asymmetries go together. In my more cynical moments, this seems like wishful thinking; why should the physical world be so cooperative? However, perhaps they should be properly understood as two aspects of a single effect, itself a kind of separability, the same view I've recommended for Church-Rosser-ness; in fact, Church-Rosser-ness may be another aspect of the same whole. The essential point is that we are able to usefully consider individual parts of the cosmos even though they're all interconnected, because there are limits on how aggressively the interconnectedness is exercised. The "geometry" is the simple, uniform way of decomposing the whole into parts, and "hygiene" is an assertion that this decomposition suffices to keep things tractable. It's still fair to question why the cosmos should be separable in this way, and even to try to build a theory of physics in which the separation breaks down; but there may be some reassurance, re Occam's Razor, in the thought that these two asymmetries (simplicity/uniformity, and hygiene) are two aspects of a single serendipitous effect, rather than two independently serendipitous effects.

Cosmic structure

Most of these threads are pointing toward a rewriting relation along a dimension orthogonal to spacetime, though we're lacking a good name for it atm (I tend to want to name things early in the development process, though I'm open to change if a better name comes along).

One thread, mentioned above, that seems at least partly indifferent to the rewriting question, is that of changes in the character of quantum mechanics at cosmological scale. This relates to the notion of decoherence. It was recognized early in the conceptualization of quantum mechanics that a very small entangled quantum system would tend to interact with the rest of the universe and thereby lose its entanglement and, ultimately, become more classical. We can only handle the quantum math for very small physical systems; in fact, rather insanely small physical systems. Intuitively, what if this tendency of entanglement to evaporate when interacting with the rest of the universe ceases to be valid when the size of the physical system is sufficiently nontrivial compared to the size of the whole universe? In the traditional quantum mechanics, decoherence appears to be an all-or-nothing proposition, a strict dichotomy tied to the concept of observation. If something else is going on at large scales, either it is an unanticipated implication of the math-that-we-can't-do, or it is an aspect of the physics that our quantum math doesn't include because the phenomena that would cause us to confront this aspect are many orders of magnitude outside anything we could possibly apply the quantum math to. It's tantalizing that this conjures both the problem of observation, and the possibility that quantum mechanics may be (like Newtonian mechanics) only an approximation that's very good within its realm of application.

The persistently awkward interplay of the continuous and discrete is a theme I've visited before. Relativity appears to have too stiff a dose of continuity in it, creating a self-reference problem even in the non-quantum case (iirc Einstein had doubts on this point before convincing himself the math of general relativity could be made to work); and when non-local effects are introduced for the quantum case, continuity becomes overconstraining. Quantum gravity efforts suffer from a self-reference problem on steroids (non-renormalizable infinities). The Big Picture perspective here is that non-locality and discontinuity go together because a continuum — as simple and uniform as it is possible to be — is always going to be perceived as geometry.

The non-local network in vau-calculus appears to be inherently discrete, based on completely arbitrary point-to-point connections defined by location of variable instances, with no obvious way to set up any remotely similar continuous arrangement. Moreover, the means I've described for deriving nondeterminism from the network connections (on which I went into some detail in the earlier post) exploits the potential for chaotic scrambling of discrete point-to-point connections by following successions of links hopscotching from point to point. While the geometry might seem more amenable to continuity, a truly continuous geometry doesn't seem consistent with point-to-point network connections, either, as one would then have the prospect of an infinitely dense tangle of network connections to randomly unrelated remote points, a sort of probability-density field that seems likely to wash out the randomness advantages of the strategy and less likely to be mathematically useful; so the whole rewriting strategy appears discrete in both the geometry and network aspects of its configuration as well as in the discrete rewriting steps themselves.

The rewriting approach may suffer from too stiff a dose of discreteness, as it seems to force a concrete choice of basic structures. Quantum mechanics is foundationally flexible on the choice of elementary particles; the mathematical infrastructure (e.g. the Schrödinger equation) makes no commitment on the matter at all, leaving it to the Hamiltonian Ĥ. Particles are devised comparatively freely, as with such entities as phonons and holes. Possibly the rewriting structure one chooses will afford comparable flexibility, but it's not at all obvious that one could expect this level of versatile refactoring from a thoroughly discrete system. Keeping in mind this likely shortfall of flexibility, it's not immediately clear what the basic elements should be. Even if one adopts, say, the standard model, it's unclear how that choice of observable particles would correspond to concrete elements in a discrete spacetime-rewriting system (in one "metaclassical" scenario I've considered, spacetime events are particle-like entities tracing out one-dimensional curves as spacetime evolves across an orthogonal dimension); and it is by no means certain that the observable elements ought to follow the standard model, either. As I write this there is, part of the time, a cat sitting on the sofa next to me. It's perfectly clear to me that this is the correct way to view the situation, even though on even moderately closer examination the boundaries of the cat may be ambiguous, e.g. at what point an individual strand of fur ceases to be part of the cat. By the time we get down to the scale where quantum mechanics comes into play and refactoring of particles becomes feasible, though, is it even certain that those particles are "really" there? (Hilaire Belloc cast aspersions on the reality of a microbe merely because it couldn't be seen without the technological intervention of a microscope; how much more skepticism is recommended when we need a gigantic particle accelerator?)

Re the structural implications of quasiparticles (such as holes), note that such entities are approximations introduced to describe the behavior of vastly complicated systems underneath. A speculation that naturally springs to mind is, could the underlying "elementary" particles be themselves approximations resulting from complicated systems at a vastly smaller scale; which would seem problematic in conventional physics since quantum mechanics is apparently inclined to stop at Planck scale. However, the variety of non-locality I've been exploring in this thread may offer a solution: by maintaining network connections from an individual "elementary" particle to remote, and rather arbitrarily scrambled, elements of the cosmos, one could effectively make the entire cosmos (or at least significant parts of it) serve as the vastly complicated system underlying the particle.

It is, btw, also not certain what we should expect as the destination of a spacetime-rewriting relation. An obvious choice, sufficient for a proof-of-concept theory (previous post), is to require that spacetime reach a stable state, in which either no further rewriting is possible or further rewriting leaves the system state unchanged. Is that the only way to derive a final state of spacetime? No. Whatever other options might be devised, one that comes to mind is some form of cycle, repeating a closed set of states of spacetime, perhaps giving rise to a set of states that would manifest in more conventional quantum math as a standing wave. Speculatively, different particles might differ from each other by the sort of cyclic pattern they settle into, determining a finite — or perhaps infinite — set of possible "elementary particles". (Side speculation: How do we choose an initial state for spacetime? Perhaps quantum probability distributions are themselves stable in the sense that, while most initial probability distributions produce a different final distribution, a quantum distribution produces itself.)

Granting that the calculus/physics analogy naturally suggests some sort of physical theory based on a discrete rewriting system, I've had recurring doubts over whether the rewriting ought to be in the direction of time — an intuitively natural option — or, as discussed, in a direction orthogonal to spacetime. At this point, though, we've accumulated several reasons to prefer rewriting orthogonal to spacetime.

Church-Rosser-ness. CR-ness is about ability to reason separately about the implications of different parts of the system, without having to worry about which reasoning to do first. The formal property is that whatever order one takes these locally-driven inferences in ("locally-driven" being a sort of weak locality), it's always possible to make later inferences that reach a common conclusion by either path. This makes it implausible to think of these inference steps as if they were chronological evolution.

Bell's Theorem. The theorem says, essentially, the probability distributions of quantum mechanics can't be generated by a conventionally deterministic local theory. Could it be done by a non-local rewriting theory evolving deterministically forward in time? My guess would be, probably it could (at least for classical time); but I suspect it'd be rather artificial, whereas my sense of the orthogonal-dimension rewriting approach (from my aforementioned proof-of-concept) is that it ought to work out neatly.

Relativity. Uses an intensively continuous mathematical infrastructure to construct a relative notion of time. It would be rather awkward to set an intensively discrete rewriting relation on top of this relative notion of time; the intensively discrete rewriting really wants to be at a deeper level of reality than any continuous relativistic infrastructure, rather than built on top of it (just as we've placed it at a deeper level than quantum entanglement), with apparent continuity arising from statistical averaging over the discrete foundations. Once rewriting is below relativity, there is no clear definition of a "chronological" direction for rewriting; so rewriting orthogonal to spacetime is a natural device from which to derive relativistic structure. Relativity is however a quintessentially local theory, which ought to be naturally favored by a predominately local rewriting relation in the orthogonal dimension. Deriving relativistic structure from an orthogonal rewriting relation with a simple causal structure also defuses the self-reference problems that have lingered about gravity.

It's rather heartening to see this feature of the theory (rewriting orthogonal to spacetime) — or really any feature of a theory — drawing support from considerations in both quantum mechanics and relativity.

The next phase of exploring this branch of theory — working from these clues to the sort of structure such a theory ought to have — seems likely to study how the shape of a spacetime-orthogonal rewriting system determines the shape of spacetime. My sense atm is that one would probably want to pay particular attention to how the system might give rise to a relativity-like structure, with an eye toward what role, if any, a non-local network might play in the system. Keeping in mind that the β-rule's use of network topology, though co-hygienic, is at the core of what function application does and, at the same time, inspired my suggestion to simulate nondeterminism through repeatedly rescrambled network connections; and, likewise, keeping in mind evidence (variously touched on above) on the possible character of different kinds of generalized non-co-hygienic operations.

Saturday, June 11, 2016

The co-hygiene principle

The mathematics is not there till we put it there.
Sir Arthur Eddington, The Philosophy of Physical Science, 1938.

Investigating possible connections between seemingly unrelated branches of science and mathematics can be very cool. Independent of whether the connections actually pan out. It can be mind-bending either way — I'm a big fan of mind-bending, as a practical cure for rigid thinking — and you can get all sorts of off-beat insights into odd corners that get illuminated along the way. The more unlikely the connection, the more likely potential for mind bending; and also the more likely potential for pay-off if somehow it does pan out after all.

Two hazards you need to avoid, with this sort of thing: don't overplay the chances it'll pan out — and don't underplay the chances it'll pan out. Overplay and you'll sound like a crackpot and, worse, you might turn yourself into one. Relish the mind bending, take advantage of it to keep your thinking limber, and don't get upset when you're not finding something that might not be there. And at the same time, if you're after something really unlikely, say with only one chance in a million it'll pan out, and you don't leave yourself open to the possibility it will, you might just draw that one chance in a million and miss it, which would be just awful. So treat the universe as if it has a sense of humor, and be prepared to roll with the punchlines.

Okay, the particular connection I'm chasing is an observed analogy between variable substitution in rewriting calculi and fundamental forces in physics. If you know enough about those two subjects to say that makes no sense, that's what I thought too when I first noticed the analogy. It kept bothering me, though, because it hooks into something on the physics side that's already notoriously anomalous — gravity. The general thought here is that when two seemingly disparate systems share some observed common property, there may be some sort of mathematical structure that can be used to describe both of them and gives rise to the observed common property; and a mathematical modeling structure that explains why gravity is so peculiar in physics is an interesting prospect. So I set out to understand the analogy better by testing its limits, elaborating it until it broke down. Except, the analogy has yet to cooperate by breaking down, even though I've now featured it on this blog twice (1, 2).

So, building on the earlier explorations, in this post I tackle the problem from the other end, and try to devise a type of descriptive mathematical model that would give rise to the pattern observed in the analogy.

This sort of pursuit, as I go about it, is a game of endurance; again and again I'll lay out all the puzzle pieces I've got, look at them together, and try to accumulate a few more insights to add to the collection. Then gather up the pieces and save them away for a while, and come back to the problem later when I'm fresh on it again. Only this time I've kind-of succeeded in reaching my immediate goal. The resulting post, laying out the pieces and accumulating insights, is therefore both an explanation of where the result comes from and a record of the process by which I got there. There are lots of speculations within it shooting off in directions that aren't where I ended up. I pointedly left the stray speculations in place. Some of those tangents might turn out to be valuable after all; and taking them out would create a deceptive appearance of things flowing inevitably to a conclusion when, in the event, I couldn't tell whether I was going anywhere specific until I knew I'd arrived.

Naturally, for finding a particular answer — here, a mathematical structure that can give rise to the observed pattern — the reward is more questions.

Contents
Noether's Theorem
Calculi
Analogy
Metatime
Transformations
Determinism and rewriting
Nondeterminism and the cosmic footprint
Massive interconnection
Factorization
Side-effects
Co-hygiene
Epilog: hygiene
Noether's Theorem

Noether's theorem (pedantically, Noether's first theorem) says that each differentiable invariant in the action of a system gives rise to a conservation law. This is a particularly celebrated result in mathematical physics; it's explicitly about how properties of a system are implied by the mathematical structure of its description; and invariants — the current fad name for them in physics is "symmetries" — are close kin to both hygiene and geometry, which relate to each other through the analogy I'm pursuing; so Noether's theorem has a powerful claim on my attention.

The action of a system always used to seem very mysterious to me, until I figured out it's one of those deep concepts that, despite its depth, is also quite shallow. It comes from Lagrangian mechanics, a mathematical formulation of classical mechanics alternative to the Newtonian mechanics formulation. This sort of thing is ubiquitous in mathematics, alternative formulations that are provably equivalent to each other but make various problems much easier or harder to solve.

Newtonian mechanics seeks to describe the trajectory of a thing in terms of its position, velocity, mass, and the forces acting on it. This approach has some intuitive advantages but is sometimes beastly difficult to solve for practical problems. The Lagrangian formulation is sometimes much easier to solve. Broadly, the time evolution of the system follows a trajectory through abstract state-space, and a function called the Lagrangian of the system maps each state into a quantity that... er... well, its units are those of energy. For each possible trajectory of the system through state-space, the path integral of the Lagrangian is the action. The principle of least action says that starting from a given state, the system will evolve along the trajectory that minimizes the action. Solving for the behavior of the system is then a matter of finding the trajectory whose action is smallest. (How do you solve for the trajectory with least action? Well, think of the trajectories as abstract values subject to variation, and imagine taking the "derivative" of the action over these variations. The least action will be a local minimum, where this derivative is zero. There's a whole mathematical technology for solving problems of just that form, called the "calculus of variations".)
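
For reference (this is just the standard textbook form, nothing specific to the analogy here): the action of a trajectory q(t) is the integral of the Lagrangian along it, and "least action" means the variation of that integral vanishes, which the calculus of variations turns into the Euler–Lagrange equation:

```latex
S[q] \;=\; \int_{t_1}^{t_2} L\bigl(q(t),\,\dot q(t),\,t\bigr)\,dt ,
\qquad
\delta S = 0
\;\Longrightarrow\;
\frac{d}{dt}\,\frac{\partial L}{\partial \dot q} \;-\; \frac{\partial L}{\partial q} \;=\; 0 .
```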

The Lagrangian formulation tends to be good for systems with conserved quantities; one might prefer the Newtonian approach for, say, a block sliding on a surface with friction acting on it. And this Lagrangian affinity for conservative systems is where Noether's theorem comes in: if there's a differentiable symmetry of the action — no surprise it has to be differentiable, seeing how central integrals and derivatives are to all this — the symmetry manifests itself in the system behavior as a conservation law.

And what, you may ask, is this magical Lagrangian function, whose properties studied through the calculus of variations reveal the underlying conservation laws of nature? Some deeper layer of reality, the secret structure that underlies all? Not exactly. The Lagrangian function is whatever works: some function that causes the principle of least action to correctly predict the behavior of the system. In quantum field theory — so I've heard, having so far never actually grappled with QFT myself — the Lagrangian approach works for some fields but there is no Lagrangian for others. (Yes, Lagrangians are one of those mathematical devices from classical physics that treats systems in such an abstract, holistic way that it's applicable to quantum mechanics. As usual for such devices, its history involves Sir William Rowan Hamilton, who keeps turning up on this blog.)

This is an important point: the Lagrangian is whatever function makes the least-action principle work right. It's not "really there", except in exactly the sense that if you can devise a Lagrangian for a given system, you can then use it via the action integral and the calculus of variations to describe the behavior of the system. Once you have a Lagrangian function that does in fact produce the system behavior you want it to, you can learn things about that behavior from mathematical exploration of the Lagrangian. Such as Noether's theorem. When you find there is, or isn't, a certain differentiable symmetry in the action, that tells you something about what is or isn't conserved in the behavior of the system, and that result really may be of great interest; just don't lose sight of the fact that you started with the behavior of the system and constructed a suitable Lagrangian from which you are now deducing things about what the behavior does and doesn't conserve.

In 1543, Copernicus's heliocentric magnum opus De revolutionibus orbium coelestium was published with an unsigned preface by Lutheran theologian Andreas Osiander saying, more or less, that of course it'd be absurd to suggest the Earth actually goes around the Sun but it's a very handy fiction for the mathematics. Uhuh. It's unnecessary to ask whether our mathematical models are "true"; we don't need them to be true, just useful. When Francis Bacon remarked that what is most useful in practice is most correct in theory, he had a point — at least, for practical purposes.

Calculi

The rewriting-calculus side of the analogy has a structural backstory from at least the early 1960s (some of which I've described in an earlier post, though with a different emphasis). Christopher Strachey hired Peter Landin as an assistant, and encouraged him to do side work exploring formal foundations for programming languages. Landin focused on tying program semantics to λ-calculus; but this approach suffered from several mismatches between the behavioral properties of programming languages versus λ-calculus, and in 1975 Gordon Plotkin published a solution for one of these mismatches, in one of the all-time classic papers in computer science, "Call-by-name, call-by-value and the λ-calculus" (pdf). Plotkin defined a slight variant of λ-calculus, by altering the conditions for the β-rule so that the calculus became call-by-value (the way most programming languages behaved while ordinary λ-calculus did not), and proved that the resulting λv-calculus was fully Church-Rosser ("just as well-behaved" as ordinary λ-calculus). He further set up an operational semantics — a rewriting system that ignored mathematical well-behavedness in favor of obviously describing the correct behavior of the programming language — and proved a set of correspondence theorems between the operational semantics and λv-calculus.

[In the preceding paragraph I perhaps should have mentioned compatibility, the other crucial element of rewriting well-behavedness; which you might think I'd have thought to mention since it's a big deal in my own work, though less flashy and more taken-for-granted than Church-Rosser-ness.]

Then in the 1980s, Matthias Felleisen applied Plotkin's approach to some of the most notoriously "unmathematical" behaviors of programs: side-effects in both data (mutable variables) and control (goto and its ilk). Like Plotkin, he set up an operational semantics and a calculus, and proved correspondence theorems between them, and well-behavedness for the calculus. He introduced the major structural innovation of treating a side-effect as an explicit syntactic construct that could move upward within its term. This upward movement would be a fundamentally different kind of rewrite from the function-application — the β-rule — of λ-calculus; abstractly, a side-effect is represented by a context 𝓢, which moves upward past some particular context C and, in the process, modifies C to leave in its wake some other context C': C[𝓢[T]] → 𝓢[C'[T]] . A side-effect is thus viewed as something that starts in a subterm and expands outward to affect more and more of the term until, potentially, it affects the whole term — if it's allowed to expand that far. Of course, a side-effect might never expand that far if it's trapped inside a context that it can't escape from; notably, no side-effect can escape from context λ.[ ] , which is to say, no side-effect can escape from inside the body of a function that hasn't been called.
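
As a concrete toy instance of that schema C[𝓢[T]] → 𝓢[C'[T]] (my own deliberately simplified example, not Felleisen's actual continuation-capture rules): here a throw-like marker plays the role of 𝓢, and each application context it crosses is modified in the crudest possible way, by being discarded, the way an exception discards pending work.

```python
# Toy illustration of the schema C[S[T]] -> S[C'[T]] described above. This is my own
# simplified example, not Felleisen's continuation-capture rules: a "throw"-like
# side-effect marker bubbles upward, and each App context it crosses is modified
# in the crudest way, by being discarded (the way an exception discards pending work).
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Throw:               # the side-effect construct, S[T]
    payload: "Term"

Term = Var | App | Throw   # requires Python 3.10+

def bubble_step(t: Term) -> Term | None:
    """Perform one upward step of a Throw marker anywhere in the term, or return None."""
    match t:
        case App(Throw(p), _) | App(_, Throw(p)):
            return Throw(p)                      # S crosses the App context; C' discards it
        case App(f, a):
            if (f2 := bubble_step(f)) is not None:
                return App(f2, a)
            if (a2 := bubble_step(a)) is not None:
                return App(f, a2)
    return None

# ((x (throw y)) z) bubbles to (throw y) in two steps:
t: Term = App(App(Var("x"), Throw(Var("y"))), Var("z"))
while (t2 := bubble_step(t)) is not None:
    t = t2
print(t)   # Throw(payload=Var(name='y'))
```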

This is where I started tracking the game, and developing my own odd notions. There seemed to me to be two significant drawbacks to Felleisen's approach, in its original published form. For one thing, the transformation of context C to C', as 𝓢 moved across it, could be quite extensive; Felleisen himself aptly called these transformations "bubbling up"; as an illustration of how messy things could get, here are the rules for a continuation-capture construct C expanding out of the operator or operand of a function call:

(CT1)T2 → C(λx1.(T1(λx2.(A(x1(x2T2)))))) for unused xk.
V(CT) → C(λx1.(T(λx2.(A(x1(Vx2)))))) for unused xk.
The other drawback to the approach was that as published at the time, it didn't actually provide the full measure of well-behavedness from Plotkin's treatment of call-by-value. One way or another, a constraint had to be relaxed somewhere. What does the side-effect construct 𝓢 do once it's finished moving upward? The earliest published solution was to wait till 𝓢 reaches the top of the term, and then get rid of it by a whole-term rewriting rule; that works, but the whole-term rewriting rule is explicitly not well-behaved: calculus well-behavedness requires that any rewriting on a whole term can also be done to a subterm, and here we've deliberately introduced a rewriting rule that can't be applied to subterms. So we've weakened the calculus well-behavedness. Another solution is to let 𝓢 reach the top of the term, then let it settle into some sort of normal form, and relax the semantics–calculus correspondence theorems to allow for equivalent normal forms. So the correspondence is weaker or, at least, more complicated. A third solution is to introduce an explicit context-marker — in both the calculus and the operational semantics — delimiting the possible extent of the side-effect. So you've got full well-behavedness but for a different language than you started out with. (Felleisen's exploration of this alternative is part of the prehistory of delimited continuations, but that's another story.)

[In a galling flub, I'd written in the preceding paragraph Church-Rosser-ness instead of well-behavedness; fixed now.]

It occurred to me that a single further innovation should be able to eliminate both of these drawbacks. If each side-effect were delimited by a context-marker that can move upward in the term, just as the side-effect itself can, then the delimiter would restore full Church-Rosser-ness without altering the language behavior; but, in general, the meanings of the delimited side-effects depend on the placement of the delimiter, so to preserve the meaning of the term, moving the delimiter may require some systematic alteration to the matching side-effect markers. To support this, let the delimiter be a variable-binding construct, with free occurrences of the variable in the side-effect markers. The act of moving the delimiter would then involve a sort of substitution function that propagates needed information to matching side-effect markers. What with one thing and another, my academic pursuits dragged me away from this line of thought for years, but then in the 2000s I found myself developing an operational semantics and calculus as part of my dissertation, in order to demonstrate that fexprs really are well-behaved (though I should have anticipated that some people, having been taught otherwise, would refuse to believe it even with proof). So I seized the opportunity to also elaborate my binding-delimiters approach to things that — unlike fexprs — really are side-effects.

This second innovation rather flew in the face of a tradition going back about seven or eight decades, to the invention of λ-calculus. Alonzo Church was evidently quite concerned about what variables mean; he maintained that a proposition with free variables in it doesn't have a clear meaning, and he wanted to have just one variable-binding construct, λ, whose β-rule defines the practical meanings of all variables. This tradition of having just one kind of variable, one binding construct, and one kind of variable-substitution (β-substitution) has had a powerful grip on researchers' imaginations for generations, to the point where even when other binding constructs are introduced they likely still have most of the look-and-feel of λ. My side-effect-ful variable binders are distinctly un-λ-like, with rewriting rules, and substitution functions, bearing no strong resemblance to the β-rule. Freedom from the β mold had the gratifying effect of allowing much simpler rewriting rules for moving upward through a term, without the major perturbations suggested by the term "bubbling up"; but, unsurprisingly, the logistics of a wild profusion of new classes of variables were not easy to work out. Much elegant mathematics surrounding λ-calculus rests squarely on the known simple properties of its particular take on variable substitution. The chapter of my dissertation that grapples with the generalized notion of substitution (Chapter 13, "Substitutive Reduction Systems", for anyone keeping score) has imho appallingly complicated foundations, although the high-level theorems at least are satisfyingly powerful. One thing that did work out neatly was enforcement of variable hygiene, which in ordinary λ-calculus is handled by α-renaming. In order to apply any nontrivial term-rewriting rule without disaster, you have to first make sure there aren't some two variables using the same name whose distinction from each other would be lost during the rewrite. It doesn't matter, really, what sort of variables are directly involved in the rewrite rule: an unhygienic rewrite could mess up variables that aren't even mentioned by the rule. Fortunately, it's possible to define a master α-renaming function that recurses through the term renaming variables to maintain hygiene, and whenever you add a new sort of variable to the system, just extend the master function with particular cases for that new sort of variable. Each rewriting rule can then invoke the master function, and everything works smoothly.

I ended up with four classes of variables. "Ordinary" variables, of the sort supported by λ-calculus, I found were actually wanted only for a specific (and not even technically necessary) purpose: to support partial evaluation. You could build the whole calculus without them and everything would work right, but the equational theory would be very weak. (I blogged on this point in detail here.) A second class of variable supported continuations; in effect, the side-effect marker was a "throw" and the binding construct was a "catch". Mutable state was more complicated, involving two classes of variables, one for assignments and one for lookups. The variables for assignment were actually environment identities; each assignment side-effect would then specify a value, a symbol, and a free variable identifying the environment. The variables for lookup stood for individual environment-symbol queries; looking up a symbol in an environment would generate queries for that environment and each of its ancestor environments. The putative result of the lookup would be a leaf subterm with free variable occurrences for all the queries involved, waiting to assimilate the query results, while the queries themselves would rise through the term in search of matching assignments. Whenever a query found a matching assignment, it would self-annihilate while using substitution to report the result to all waiting free variable occurrences.

Does all this detail matter to the analogy with physics? Well, that's the question, isn't it. There's a lot there, a great deal of fodder to chew on when considering how an analogy with something else might have a structural basis.

Analogy

Amongst the four classes of variables, partial-evaluation variables have a peculiar sort of... symmetry. If you constructed a vau-calculus with, say, only continuation variables, you'd still have two different substitution functions — one to announce that a delimiting "catch" has been moved upward, and one for α-renaming. If you constructed a vau-calculus with only mutable-state variables, you'd have, well, a bunch of substitution functions, but in particular all the substitutions used to enact rewriting operations would be separate from α-renaming. β-substitution, though, is commensurate with α-renaming; once you've got β-substitution of partial-evaluation variables, you can use it to α-rename them as well, which is why ordinary λ-calculus has, apparently at least, only one substitution function.

Qualitatively, partial-evaluation variables seem more integrated into the fabric of the calculus, in contrast to the other classes of variables.

All of which put me powerfully in mind of physics because it's a familiar observation that gravity seems qualitatively more integrated into the fabric of spacetime, in contrast to the other fundamental forces (xkcd). General relativity portrays gravity as the shape of spacetime, whereas the other forces merely propagate through spacetime, and a popular strategy for aspiring TOEs (Theories Of Everything) is to integrate the other fundamental forces into the geometry as well — although, looking at the analogy, perhaps that popular strategy isn't such a good idea after all. Consider: The analogy isn't just between partial-evaluation variables and gravity. It's between the contrast of partial-evaluation variables against the other classes of variables, and the contrast of gravity against the other fundamental forces. All the classes of variables, and all the fundamental forces, are to some extent involved. I've already suggested that Felleisen's treatment of side-effects was both weakened and complicated by its too-close structural imitation of λ, whereas a less λ-like treatment of side-effects can be both stronger and simpler; so, depending on how much structure carries through the analogy, perhaps trying to treat the other fundamental forces too much like gravity should be expected to weaken and complicate a TOE.

Projecting through the analogy suggests alternative ways to structure theories of physics, which imho is worthwhile independent of whether the analogy is deep or shallow; as I've remarked before, I actively look for disparate ways of thinking as a broad base for basic research. The machinery of calculus variable hygiene, with which partial-evaluation variables have a special affinity, is only one facet of term structure; and projecting this through to fundamental physics, where gravity has a special affinity with geometry, this suggests that geometry itself might usefully be thought of, not as the venue where physics takes place, but merely as part of the rules by which the game is played. Likewise, the different kinds of variables differ from each other by the kinds of structural transformations they involve; and projecting that through the analogy, one might try to think of the fundamental forces as differing from each other not (primarily) by some arbitrary rules of combination and propagation, but by being different kinds of structural manipulations of reality. Then, if there is some depth to the analogy, one might wonder if some of the particular technical contrasts between different classes of variables might be related to particular technical contrasts between different fundamental forces — which, frankly, I can't imagine deciphering until and unless one first sets the analogy on a solid technical basis.

I've speculated several times on this blog on the role of non-locality in physics. Bell's Theorem says that the statistical distribution of quantum predictions cannot be explained by any local, deterministic theory of physics if, by 'local and deterministic', you mean 'evolving forward in time in a local and deterministic way'; but it's quite possible to generate this same statistical distribution of spacetime predictions using a theory that evolves locally and deterministically in a fifth dimension orthogonal to spacetime. Which strikes a familiar chord through the analogy with calculus variables, because non-locality is, qualitatively at least, the defining characteristic of what we mean by "side-effects", and the machinery of α-renaming maintains hygiene for these operations exactly by going off and doing some term rearranging on the side (as if in a separate dimension of rewriting that we usually don't bother to track). Indeed, thought of this way, a "variable" seems to be an inherently distributed entity, spread over a region of the term — called its scope — rather than located at a specific point. A variable instance might appear to have a specific location, but only because we look at a concrete syntactic term; naturally we have to have a particular concrete term in order to write it down, but somehow this doesn't seem to do justice to the reality of the hygiene machinery. One could think of an equivalence class of terms under α-renaming, but imho even that is a bit passive. The reality of a variable, I've lately come to think, is a dynamic distributed entity weaving through the term, made up of the binding construct (such as a λ), all the free instances within its scope, and the living connections that tie all those parts together; I imagine if you put your hand on any part of that structure you could feel it humming with vitality.

Metatime

To give a slightly less hand-wavy description of my earlier post on Bell's Theorem — since it is the most concrete example we have to inform our view of the analogy on the physics side:

Bell looked at a refinement of the experiment from the EPR paradox. A device emits two particles with entangled spin, which shoot off in opposite directions, and their spins are measured by oriented detectors at some distance from the emitter. The original objection of Einstein, Podolsky, and Rosen was that the two measurements are correlated with each other, but because of the distance between the two detectors, there's no way for information about either measurement to get to where the other measurement takes place without "spooky action at a distance". Bell refined this objection by noting that the correlation of spin measurements depends on the angle θ between the detectors. If you suppose that the orientations of the detectors at measurement are not known at the time and place where the particles are emitted, and that the outcomes of the measurements are determined by some sort of information — "hidden variable" — propagating from the emission event at no more than the speed of light, then there are limits (called Bell's Inequality) on how the correlation can be distributed as a function of θ, no matter what the probability distribution of the hidden variable. The distribution predicted by quantum mechanics violates Bell's Inequality; so if the actual probability distribution of outcomes from the experiment matches the quantum mechanical prediction, we're living in a world that can't be explained by a local hidden-variable theory.
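
For concreteness, here are the standard textbook formulas behind that argument (not anything specific to this post): for the spin-singlet state, quantum mechanics predicts the correlation of the two ±1-valued measurements as a function of the angle between the detectors, and Bell's original inequality (which assumes perfect anticorrelation at equal settings) bounds what any local hidden-variable account can produce:

```latex
E_{\mathrm{QM}}(\theta) \;=\; -\cos\theta ,
\qquad
\bigl|\,E(a,b) - E(a,c)\,\bigr| \;\le\; 1 + E(b,c) \quad\text{(Bell)} .
```

With detector settings a, b, c at 0°, 60°, 120°, the quantum prediction gives |−½ − ½| = 1 on the left and 1 − ½ = ½ on the right, so the bound is violated.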

My point was that this whole line of reasoning supposes the state of the world evolves forward in time. If it doesn't, then we have to rethink what we even mean by "locality", and I did so. Suppose our entire four-dimensional reality is generated by evolving over a fifth dimension, which we might as well call "metatime". "Locality" in this model means that information about the state of one part of spacetime takes a certain interval of metatime to propagate a certain distance to another part of spacetime. Instead of trying to arrange the probability distribution of a hidden variable at the emission event so that it will propagate through time to produce the desired probability distribution of measurements — which doesn't work unless quantum mechanics is seriously wrong about this simple system — we can start with some simple, uniform probability distribution of possible versions of the entire history of the experiment, and by suitably arranging the rules by which spacetime evolves, we can arrange that eventually spacetime will settle into a stable state where the probability distribution is just what quantum mechanics predicts. In essence, it works like this: let the history of the experiment be random (we don't need nondeterminism here; this is just a statement of uniformly unknown initial conditions), and suppose that the apparent spacetime "causation" between the emission and the measurements causes the two measurements to be compared to each other. Based on θ, let some hidden variable decide whether this version of history is stable; and if it isn't stable, just scramble up a new one (we can always do that by pulling it out of the uniform distribution of the hidden variable, without having to posit fundamental nondeterminism). By choosing the rule for how the hidden variable interacts with θ, you can cause the eventual stable history of the experiment to exhibit any probability distribution you choose.
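
Read as an algorithm, the proof-of-concept amounts to rejection sampling. Here's a minimal sketch, with my own placeholder names and hidden-variable rule; the target it is tuned to is the standard singlet distribution P(A,B | θ) = (1 − AB cos θ)/4, just to show the "scramble until stable" loop landing on a chosen distribution.

```python
# A minimal sketch of the "scramble until stable" story above, read as rejection
# sampling. The rule and names are my own toy choices; the target it is tuned to is
# the standard singlet prediction P(A,B | theta) = (1 - A*B*cos(theta)) / 4.
import math, random

def stable_history(theta: float) -> tuple[int, int]:
    """Draw candidate outcomes (A, B) uniformly until the hidden-variable rule keeps one."""
    while True:
        A = random.choice((+1, -1))                      # a uniformly random candidate history
        B = random.choice((+1, -1))
        keep_prob = (1 - A * B * math.cos(theta)) / 2    # proportional to the target distribution
        if random.random() < keep_prob:                  # "is this version of history stable?"
            return A, B

def correlation(theta: float, trials: int = 200_000) -> float:
    return sum(a * b for a, b in (stable_history(theta) for _ in range(trials))) / trials

theta = math.pi / 3
print(correlation(theta), -math.cos(theta))              # both close to -0.5
```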

That immense power is something to keep a cautious eye on: not only can this technique produce the probability distribution predicted by quantum mechanics, it can produce any other probability distribution as well. So, if the general structure of this mathematical theory determines something about the structure of the physical reality it depicts, what it determines is apparently not, in any very straightforward fashion, that probability distribution.

Transformations

The side of the analogy we have prior detailed structural knowledge about is the vau-calculus side. Whatever useful insights we may hope to extract from the metatime approach to Bell's Theorem, it's very sketchy compared to vau-calculus. So if we want to work out a structural pattern that applies to both sides of the analogy, it's plausible to start building from the side we know about, questioning and generalizing as we go along. To start with,

  • Suppose we have a complex system, made up of interconnected parts, evolving by some sort of transformative steps according to some simple rules.
Okay, freeze frame. Why should the system be made up of parts? Well, in physics it's (almost) always the parts we're interested in. We ourselves are, apparently, parts of reality, and we interact with parts of reality. Could we treat the whole as a unit and then somehow temporarily pull parts out of it when we need to talk about them? Maybe, but the form with parts is still the one we're primarily interested in. And what about "transformative steps"; do we want discrete steps rather than continuous equations? Actually, yes, that is my reading of the situation; not only does fundamental physics appear to be shot through with discreteness (I expanded on this point a while back), but the particular treatment I used for my metatime proof-of-concept (above) used an open-ended sequence of discrete trials to generate the requisite probability distribution. If a more thoroughly continuous treatment is really wanted, one might try to recover continuity by taking a limit a la calculus.
  • Suppose we separate the transformation rules into two groups, which we call bookkeeping rules and operational rules; and suppose we have a set of exclusive criteria on system configurations, call these hygiene conditions, which must be satisfied before any operational rule can be applied.

Freeze again. At first glance, this looks pretty good. From any unhygienic configuration, we can't move forward operationally until we've done bookkeeping to ensure hygiene. Both calculus rewriting and the metatime proof-of-concept seemingly conform to this pattern; but the two cases differ profoundly in how their underlying hygiene (supposing that's what it is, in the physics case) affects the form of the modeled system, and we'll need to consider the difference carefully if we mean to build our speculations on a sound footing.

Determinism and rewriting

Hygiene in rewriting is all about preserving properties of a term (to wit, variable instance–binding correspondences), whereas our proof-of-concept metatime transformations don't appear to be about perfectly preserving something but rather about shaping probability distributions. One might ask whether it's possible to set up the internals of our metatime model so that the probability distribution is a consequence, or symptom, of conserving something behind the scenes. Is the seemingly nondeterministic outcome of our quantum observation in a supposedly small quantum system here actually dictated by the need to maintain some cosmic balance that can't be directly observed because it's distributed over a ridiculously large number of entities (such as the number of electrons in the universe)? That could lead to some bracing questions about how to usefully incorporate such a notion into a mathematical theory.

As an alternative, one might decide that the probability distribution in the metatime model should not be a consequence of absolutely preserving a condition. There are two philosophically disparate sorts of models involving probabilities: either the probability comes from our lack of knowledge (the hidden-variable hypothesis), and in the underlying model the universe is computing an inevitable outcome; or the probability is in the foundations (God playing dice, in the Einsteinian phrase), and in the underlying model the universe is exploring the range of possible outcomes. I discussed this same distinction, in another form, in an earlier post, where it emerged as the defining philosophical distinction between a calculus and a grammar (here). In those terms, if our physics model is fundamentally deterministic then it's a calculus and by implication has that structural affinity with the vau-calculi on the other side of the analogy; but if our physics model is fundamentally nondeterministic then it's a grammar, and our analogy has to try to bridge that philosophical gap. Based on past experience, though, I'm highly skeptical of bridging the gap; if the analogy can be set on a concrete technical basis, the TOE on the physics side seems to me likely to be foundationally deterministic.

The foundationally deterministic approach to probability is to start with a probabilistic distribution of deterministic initial states, and evolve them all forward to produce a probabilistic distribution of deterministic final states. Does the traditional vau-calculus side of our analogy, where we have so much detail to start with, have anything to say about this? In the most prosaic sense, one suspects not; probability distributions don't traditionally figure into deterministic computation semantics, where this approach would mean considering fuzzy sets of terms. There may be some insight lurking, though, in the origins of calculus hygiene.

When Alonzo Church's 1932/3 formal logic turned out to be inconsistent, he tried to back off and find some subset of it that was provably consistent. Here consistent meant that not all propositions are equivalent to each other, and the subset of the logic that he and his student J. Barkley Rosser proved consistent in this sense was what we now call λ-calculus. The way they did it was to show that if any term T1 can be reduced in the calculus in two different ways, as T2 and T3, then there must be some T4 that both of them can be reduced to. Since logical equivalence of terms is defined as the smallest congruence generated by the rewriting relation of the calculus, from the Church-Rosser property it follows that if two terms are equivalent, there must be some term that they both can be reduced to; and therefore, two different irreducible terms cannot possibly be logically equivalent to each other.
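
In the usual notation (with ↠ for zero or more rewriting steps), the property Church and Rosser proved is confluence; this is just the standard statement of it:

```latex
T_1 \twoheadrightarrow T_2 \;\wedge\; T_1 \twoheadrightarrow T_3
\;\Longrightarrow\;
\exists\, T_4 .\;\; T_2 \twoheadrightarrow T_4 \;\wedge\; T_3 \twoheadrightarrow T_4 .
```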

Proving the Church-Rosser theorem for λ-calculus was not, originally, a simple matter. It took three decades before a simple proof began to circulate, and the theorem for variant calculi continues to be a challenge. And this is (in one view of the matter, at least) where hygiene comes into the picture. Church had three major rewriting rules in his system, later called the α, β, and γ rules. The α rule was the "bookkeeping" rule; it allowed renaming a bound variable as long as you don't lose its distinction from other variables in the process. The β rule is now understood as the single operational rule of λ-calculus, how to apply a function to an argument. The γ rule is mostly forgotten now; it was simply doing a β-step backward, and was later dropped in favor of starting with just α and β and then taking the congruent closure (reflexive, symmetric, transitive, and compatible). Ultimately the Church-Rosser theorem allows terms to be sorted into β-equivalence classes; but the terms in each class aren't generally thought of as "the same term", just "equivalent terms". α-equivalent terms, though, are much closer to each other, and for many purposes would actually be thought of as "the same term, just written differently". Recall my earlier description of a variable as a distributed entity, weaving through the term, made up of binding construct, instances, and living connections between them. If you have a big term, shot through with lots of those dynamic distributed entities, the interweaving could really be vastly complex. So factoring out the α-renaming is itself a vast simplification, which for a large term may dwarf what's left after factoring to complete the Church-Rosser proof. To see by just how much the bookkeeping might dwarf the remaining operational complexity, imagine scaling the term up to the sort of cosmological scope mentioned earlier — like the number of electrons in the universe.

It seems worth considering, that hygiene may be a natural consequence of a certain kind of factoring of a vastly interconnected system: you sequester almost all of the complexity into bookkeeping with terrifically simple rules applied on an inhumanly staggering scale, and comparatively nontrivial operational rules that never have to deal directly with the sheer scale of the system because that part of the complexity was factored into the bookkeeping. In that case, at some point we'll need to ask when and why that sort of factoring is possible. Maybe it isn't really possible for the cosmos, and a flaw in our physics is that we've been trying so hard to factor things this way; when we really dive into that question we'll be in deep waters.

It's now no longer clear, btw, that geometry corresponds quite directly to α-renaming. There was already some hint of that in the view of vau-calculus side-effects as "non-local", which tends to associate geometry with vau-calculus term structure rather than α-renaming as such. Seemingly, hygiene is then a sort of adjunct to the geometry, something that allows the geometry to coexist with the massive interconnection of the system.

But now, with massive interconnection resonating between the two sides of the analogy, it's definitely time to ask some of those bracing questions about incorporating cosmic connectivity into a mathematical theory of physics.

Nondeterminism and the cosmic footprint

We want to interpret our probability distribution as a footprint, and reconstruct from it the massively connected cosmic order that walked there. Moreover, we're conjecturing that the whole system is factorable into bookkeeping/hygiene on one hand(?), and operations that amount to what we'd ordinarily call "laws of physics" on the other; and we'd really like to deduce, from the way quantum mechanics works, something about the nature of the bookkeeping and the factorization.

Classically, if we have a small system that's acted on by a lot of stuff we don't know about specifically, we let all those influences sum to a potential field. One might think of this classical approach as a particular kind of cosmological factorization in which the vast cosmic interconnectedness is reduced to a field, so one can then otherwise ignore almost everything to model the behavior of the small system of interest using a small operational set of physical laws. We know the sort of cosmic order that reduces that way, it's the sort with classical locality (relative to time evolution); and the vaster part of the factorization — the rest of the cosmos, that reduced to a potential field — is of essentially the same kind as the small system. The question we're asking at this point is, what sort of cosmic order reduces such that its operational part is quantum mechanics, and what does its bookkeeping part look like? Looking at vau-calculus, with its α-equivalence and Church-Rosser β-equivalence, it seems fairly clear that hygiene is an asymmetric factorization: if the cosmos factors this way, the bookkeeping part wouldn't have to look at all like quantum mechanics. A further complication is that quantum mechanics may be an approximation only good when the system you're looking at is vastly smaller than the universe as a whole; indeed, this conjecture seems rather encouraged by what happens when we try to apply our modern physical theories to the cosmos as a whole: notably, dark matter. (The broad notion of asymmetric factorizations surrounding quantum mechanics brings to mind both QM's notorious asymmetry between observer and observed, and Einstein's suggestion that QM is missing some essential piece of reality.)

For this factorization to work out — for the cosmic system as a whole to be broadly "metaclassical" while factoring through the bookkeeping to either quantum mechanics or a very good approximation thereof — the factorization has to have some rather interesting properties. In a generic classical situation where one small thing is acted on by a truly vast population of other things, we tend to expect all those other things to average out (as typically happens with a classical potential field), so that the vastness of the population makes their combined influence more stable rather than less; and also, as our subsystem interacts with the outside influence and we thereby learn more about that influence, we become more able to allow for it and still further reduce any residual unpredictability of the system.

Considered more closely, though, the expectation that summing over a large population would tend to average out is based broadly on the paradigm of signed magnitudes on an unbounded scale that attenuate over distance. If you don't have attenuation, and your magnitudes are on a closed loop, such as angles of rotation, increasing the population just makes things more unpredictable. Interestingly, I'd already suggested in one of my earlier explorations of the hygiene analogy that the physics hygiene condition might be some sort of rotational constraint, for the — seemingly — unrelated reason that the primary geometry of physics has 3+1 dimension structure, which is apparently the structure of quaternions, and my current sense of quaternions is that they're the essence of rotation. Though when things converge like this, it can be very hard to distinguish between an accidental convergence and one that simply reassembles fragments of a deeper whole.
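
A quick Monte Carlo sketch of that contrast (the particular distributions below are arbitrary illustrative choices of mine): the mean of many bounded signed magnitudes concentrates, while the sum of many angles, taken mod 2π, spreads out toward the uniform distribution on the circle.

```python
# A small Monte Carlo sketch of the contrast claimed above. The particular
# distributions are arbitrary illustrative choices of mine: averaging many bounded
# signed magnitudes concentrates (more parts, more predictable), while summing many
# angles on a closed loop (mod 2*pi) spreads toward uniform (more parts, less predictable).
import cmath, math, random

def spread_of_angle_sum(n: int, trials: int = 20_000) -> float:
    """1 - |mean resultant| over trials: near 0 when peaked, near 1 when uniform on the circle."""
    resultant = sum(cmath.exp(1j * sum(random.gauss(0.3, 0.4) for _ in range(n)))
                    for _ in range(trials)) / trials
    return 1 - abs(resultant)

def spread_of_bounded_mean(n: int, trials: int = 20_000) -> float:
    """Standard deviation, over trials, of the mean of n values drawn uniformly from [-1, 1]."""
    means = [sum(random.uniform(-1, 1) for _ in range(n)) / n for _ in range(trials)]
    mu = sum(means) / trials
    return math.sqrt(sum((m - mu) ** 2 for m in means) / trials)

for n in (1, 10, 100):
    print(n, round(spread_of_angle_sum(n), 3), round(spread_of_bounded_mean(n), 3))
# the angular spread climbs toward 1 as n grows; the bounded-mean spread falls toward 0
```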

I'll have a thought on the other classical problem — losing unpredictability as we learn more about outside influences over time — after collecting some further insights on the structural dynamics of bookkeeping.

Massive interconnection

Given a small piece of the universe, which other parts of the universe does it interact with, and how?

In the classical decomposition, all interactions with other parts of the universe are based on relative position in the geometry — that is, locality. Interestingly, conventional quantum mechanics retains this manner of selecting interactions, embedding it deeply into the equational structure of its mathematics. Recall the Schrödinger equation,

iℏ ∂Ψ/∂t = Ĥ Ψ .
The element that shapes the time evolution of the system — the Hamiltonian function Ĥ — is essentially an embodiment of the classical expectation of the system behavior; which is to say, interaction according to the classical rules of locality. (I discussed the structure of the Schrödinger equation at length in an earlier post, yonder.) Viewing conventional QM this way, as starting with classical interactions and then tacking on quantum machinery to "fix" it, I'm put in mind of Ptolemaic epicycles, tacked on to the perfect-circles model of celestial motion to make it work. (I don't mean the comparison mockingly, just critically; Ptolemy's system worked pretty well, and Copernicus used epicycles too. Turns out there's a better way, though.)

How does interaction-between-parts play out in vau-calculus, our detailed example of hygiene at work?

The whole syntax of a calculus term is defined by two aspects: the variables — by which I mean those "distributed entities" woven through the term, each made of a binding, its bound instances, and the connections that hygiene maintains between them — and, well, everything else. Once you ignore the specific identities of all the variable instances, you've just got a syntax tree with each node labeled by a context-free syntax production; and the context-free syntax doesn't have very many rules. In λ-calculus there are only four syntax rules: a term is either a combination, or a λ-abstraction, or a variable, or a constant. Some treatments simplify this by using variables for the "constants", and it's down to only three syntax rules. The lone operational rule β,

((λx.T1)T2) → T1[x ← T2] ,
gives a purely local pattern in the syntax tree for determining when the operation can be applied: any time you have a parent node that's a combination whose left child is a λ-abstraction. Operational rules stay nearly this simple even in vau-calculi; the left-hand side of each operational rule specifies when it can be applied by a small pattern of adjacent nodes in the syntax tree, and mostly ignores variables (thanks to hygienic bookkeeping). The right-hand side is where operational substitution may be introduced. So evidently vau-calculus — like QM — is giving local proximity a central voice in determining when and how operations apply, with the distributed aspect (variables) coming into play in the operation's consequences (right-hand side).
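
Here is a minimal sketch of that local redex pattern and of the bookkeeping it leans on, in plain λ-calculus rather than vau-calculus; the representation and names below are my own choices. The β rule fires wherever a combination's left child is a λ-abstraction, and the substitution it triggers does on-demand α-renaming so no free variable of the argument gets captured.

```python
# A minimal sketch of beta reduction with hygienic (capture-avoiding) substitution,
# for plain lambda-calculus rather than vau-calculus; names and representation are
# my own choices, just to make the "local redex pattern" point concrete.
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

Term = Var | Lam | App      # requires Python 3.10+

_fresh = count()
def fresh(base: str) -> str:
    return f"{base}_{next(_fresh)}"

def free_vars(t: Term) -> set[str]:
    match t:
        case Var(x):     return {x}
        case Lam(x, b):  return free_vars(b) - {x}
        case App(f, a):  return free_vars(f) | free_vars(a)

def subst(t: Term, x: str, s: Term) -> Term:
    """t[x <- s], alpha-renaming bound variables on demand (the bookkeeping)."""
    match t:
        case Var(y):
            return s if y == x else t
        case App(f, a):
            return App(subst(f, x, s), subst(a, x, s))
        case Lam(y, b):
            if y == x:
                return t                     # x is shadowed here; nothing to substitute
            if y in free_vars(s):            # hygiene: rename y so it can't capture
                z = fresh(y)
                b, y = subst(b, y, Var(z)), z
            return Lam(y, subst(b, x, s))

def beta_step(t: Term) -> Term | None:
    """One beta step, located purely by the local pattern ((lambda x. T1) T2)."""
    match t:
        case App(Lam(x, body), arg):
            return subst(body, x, arg)
        case App(f, a):
            if (f2 := beta_step(f)) is not None: return App(f2, a)
            if (a2 := beta_step(a)) is not None: return App(f, a2)
        case Lam(x, b):
            if (b2 := beta_step(b)) is not None: return Lam(x, b2)
    return None

# ((lambda x. lambda y. x) y)  steps to  (lambda y_0. y), not (lambda y. y):
print(beta_step(App(Lam("x", Lam("y", Var("x"))), Var("y"))))
```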

Turning it around, though, if you look at a small subterm — analogous, presumably, to a small system studied in physics — what rules govern its non-local connections to other parts of the larger term? Let's suppose the term is larger than our subterm by a cosmically vast amount. The free variables in the subterm are the entry points by which non-local influences from the (vast) context can affect the subterm (via substitution functions). And there is no upper limit to how fraught those sorts of interconnections can get (which is, after all, what spurs advocates of side-effect-less programming). That complexity belongs not to the "laws of physics" (neither the operational nor even the bookkeeping rules), but to the configuration of the system. From classical physics, we're used to having very simple laws, with all the complexity of our problems coming from the particular configuration of the small system we're studying; and now we've had that turned on its head. From the perspective of the rules of the calculus, yes, we still have very simple rules of manipulation, and all the complexity is in the arrangement of the particular term we're working with; but from the perspective of the subterm of interest, the interconnections imposed by free variables look a lot like "laws of physics" themselves. If we hold our one subterm fixed and allow the larger term around it to vary probabilistically then, in general, we don't know what the rules are and have no upper bound on how complicated those rules might be. All we have are subtle limits on the shape of the possible influences by those rules, imposed roundaboutly by the nature of the bookkeeping-and-operational transformations.

One thing about the shape of these nonlocal influences: they don't work like the local part of operations. The substitutive part of an operation typically involves some mixture of erasing bound variable instances and copying fragments from elsewhere. The upshot is that it rearranges the nonlocal topology of the term, that is, rearranges the way the variables interweave — which is, of course, why the bookkeeping rules are needed, to maintain the integrity of the interweaving as it winds and twists. And this is why a cosmic system of this sort doesn't suffer from a gradual loss of unpredictability as the subsystem interacts with its neighbors in the nonlocal network and "learns" about them: each nonlocal interaction that affects it changes its nonlocal-network neighbors. As long as the overall system is cosmically vast compared to the subsystem we're studying, in practice we won't run out of new network neighbors no matter how many nonlocal actions our subsystem undergoes.

This also gives us a specific reason to suspect that quantum mechanics, by relying on this assumption of an endless supply of fresh network neighbors, would break down when studying subsystems that aren't sufficiently small compared to the cosmos as a whole. Making QM (as I've speculated before), like Newtonian physics, an approximation that works very well in certain cases.

Factorization

Here's what the reconstructed general mathematical model looks like so far:

  • The system as a whole is made up of interconnected parts, evolving by transformative steps according to simple rules.

  • The interconnections form two subsystems: local geometry, and nonlocal network topology.

  • The transformation rules are of two kinds, bookkeeping and operational.

  • Operational rules can only be applied to a system configuration satisfying certain hygiene conditions on its nonlocal network topology; bookkeeping, which only acts on nonlocal network topology, makes it possible to achieve the hygiene conditions.

  • Operational rules are activated based on local criteria (given hygiene). When applied, operations can have both local and nonlocal effects, while the integrity of the nonlocal network topology is maintained across both kinds of effect via hygiene, hence bookkeeping.
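
To keep those bullets concrete, here's a deliberately toy rendering of the shape being described; every name and rule in it is a placeholder of mine, not a proposed physical model. A state couples a local geometry with a nonlocal network, bookkeeping touches only the network, and an operational step fires on a local criterion but only once the hygiene condition holds.

```python
# Purely illustrative toy rendering of the bullets above; every name and rule here is
# a placeholder of mine, not a proposed physical model. A state couples local geometry
# with a nonlocal network; bookkeeping acts only on the network; an operational step
# fires on a local criterion, but only once the hygiene condition holds, and then has
# both a local effect and a nonlocal (network-rewiring) effect.
from dataclasses import dataclass, field

@dataclass
class State:
    geometry: list[str]                                           # local structure: a row of sites
    network: dict[int, set[int]] = field(default_factory=dict)    # nonlocal links between sites

def hygienic(s: State) -> bool:
    """Toy hygiene condition on the nonlocal topology: no site is linked to itself."""
    return all(i not in peers for i, peers in s.network.items())

def bookkeeping_step(s: State) -> None:
    """Acts only on the nonlocal network: discard self-links, restoring hygiene."""
    for site in s.network:
        s.network[site].discard(site)

def operational_step(s: State, i: int) -> None:
    """Fires on a local criterion at site i, gated on hygiene."""
    while not hygienic(s):
        bookkeeping_step(s)                              # bookkeeping before operating
    if s.geometry[i] == "excited":                       # local activation criterion
        s.geometry[i] = "ground"                         # local effect
        for j in s.network.get(i, set()):                # nonlocal effect: rewire neighbors
            s.network.setdefault(j, set()).add((i + 1) % len(s.geometry))

s = State(geometry=["ground", "excited", "ground"], network={1: {0, 1}})
operational_step(s, 1)
print(s.geometry, s.network)   # ['ground', 'ground', 'ground'] {1: {0}, 0: {2}}
```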

To complete this picture, it seems, we want to consider what a small local system consists of, and how it relates to the whole. This is all the more critical since we're entertaining the possibility that quantum mechanics might be an approximation that only works for a small local system, so that understanding how a local system relates to the whole may be crucial to understanding how quantum mechanics can arise locally from a globally non-quantum TOE.

A local system consists of some "local state", stuff that isn't interconnection of either kind; and some interconnections of (potentially) both kinds that are entirely encompassed within the local system. For example, in vau-calculi — our only prior source for complete examples — we might have a subterm (λx.(y‍x)). Variables are nonlocal network topology, of course, but in this case variable x is entirely contained within the local system. The choice of variable name "x" is arbitrary, as long as it remains different from "y" (hygiene). But what about the choice of "y"? In calculus reasoning we would usually say that because y is free in this subterm, we can't touch it; but that's only true if we're interested in comparing it to specific contexts, or to other specific subterms. If we're only interested in how this subterm relates to the rest of the universe, and we have no idea what the rest of the universe is, then the free variable y really is just one end of a connection whose other end is completely unknown to us; and a different choice of free variable would tell us exactly as much, and as little, as this one. We would be just as well off with (λx.(z‍x)), or (λz.(w‍z)), or even (λy.(x‍y)) — as long as we maintain the hygienic distinction between the two variables. The local geometry that can occur just outside this subterm, in its surrounding context, is limited to certain specific forms (by the context-free grammar of the calculus); the nonlocal network topology is vastly less constrained.

The implication here is that all those terms just named are effectively equivalent to (λx.(y‍x)). One might be tempted to think of this as simply different ways of "writing" the same local system, as in physics we might describe the same local system using different choices of coordinate axes; but the choice of coordinate axes is about local geometry, not nonlocal network topology. Here we're starting with simple descriptions of local systems, and then taking the quotient under the equivalence induced by the bookkeeping operations. It's tempting to think of the pre-quotient simple descriptions as "classical" and analogize the quotient to "quantum", but in fact there is a second quotient to be taken. In the metatime proof-of-concept, the rewriting kept reshuffling the entire history of the experiment until it reached a steady state — the obvious analogy is to a calculus irreducible term, the final result of the operational rewrite relation of the calculus. And this, at last, is where Church-Rosser-ness comes in. Church-Rosser is what guarantees that the same irreducible form (if any) is reached no matter in what order operational rules are applied. It's the enabler for each individual state of the system to evolve deterministically. To emphasize this point: Church-Rosser-ness applies to an individual possible system state, thus belongs to the deterministic-foundations approach to probabilities. The probability distribution itself is made up of individual possibilities each one of which is subject to Church-Rosser-ness. (Church-Rosser-ness is also, btw, a property of the mathematical model: one doesn't ask "Why should these different paths of system state evolution all come back together to the same normal form?", because that's simply the kind of mathematical model one has chosen to explore.)

The question we're trying to get a handle on is why the nonlocal effects of some operational rules would appear to be especially compatible with the bookkeeping quotient of the local geometry.

Side-effects

In vau-calculi, the nonlocal operational effects (i.e., operational substitution functions) that do not integrate smoothly with bookkeeping (i.e., with α-renaming) are the ones that support side-effects; and the one nonlocal operational effect that does integrate smoothly with bookkeeping — β-substitution — supports partial evaluation and turns out to be optional, in the sense that the operational semantics of the system could be described without that kind of nonlocal effect and the mathematics would still be correct with merely a weaker equational theory.

This suggests that in physics, gravity could be understood without bringing nonlocal effects into it at all, though there may be some sort of internal mathematical advantage to bringing them in anyway; while the other forces may be thought of as, in some abstract sense, side-effects.

So, what exactly makes a side-effect-ful substitution side-effect-ful? Conversely, β-substitution is also a form of substitution; it engages the nonlocal network topology, reshuffling it by distributing copies of subterms, the sort of thing I speculated above may be needed to maintain the unpredictability aspect of quantum mechanics. So what makes β-substitution not side-effect-ful in character, beyond the very specific technicalities of β- and α-substitution; and just how much, or how little, should we be abstracting away from those technicalities? I'm supposing we have to abstract away at least a bit, on the principle that physics isn't likely to be technically close to vau-calculi in its mathematical details.

Here's a stab at a sufficient condition:

  • A nonlocal operational effect is side-effect-ful just if it perturbs the pre-existing local geometry.

The inverse property, called "purity" in a programming context (as in "pure function"), is that the nonlocal operational effect doesn't perturb the pre-existing local geometry. β-substitution is pure in this sense, as it replaces a free variable-instance with a subterm but doesn't affect anything local other than the variable-instance itself. Contrast this with the operational substitution for control variables; the key clauses (that is, nontrivial base cases) of the two substitutions are

x[x ← T] → T .

(τx.T)[x ← C] → (τx.C[T[x ← C]]) .

The control substitution alters the local-geometric distance between pre-existing structure T and whatever pre-existing immediate context surrounds the subterm acted on. Both substitutions have the — conjecturally important — property that they substantially rearrange the nonlocal network topology by injecting arbitrary new network connections (that is, new free variables). The introduction of new free variables is a major reason why vau-calculi need bookkeeping to maintain hygiene; although, interestingly, it's taken all this careful reasoning about bookkeeping to conclude that bookkeeping isn't actually necessary to the notion of purity/impurity (or side-effect-ful/non-side-effect-ful); apparently, bookkeeping is about perturbations of the nonlocal network topology, whereas purity/impurity is about perturbations of the local geometry. To emphasize the point, one might call this non-perturbation of local geometry co-hygiene — all the nonlocal operational effects must be hygienic, which might or might not require bookkeeping depending on internals of the mathematics, but only the β- and gravity nonlocal effects are co-hygienic.

Co-hygiene

Abstracting away from how we got to it, here's what we have:

  • A complex system of parts, evolving through a Church-Rosser transformation step relation.

  • Interconnections within a system state, partitioned into local (geometry) and nonlocal (network).

  • Each transformation step is selected locally.

  • The nonlocal effects of each transformation step rearrange — scramble — nonlocal connections at the locus where applied.

  • Certain transformative operations have nonlocal effects that do not disrupt pre-existing local structure — that are co-hygienic — and thereby afford particularly elegant description.

What sort of elegance is involved in the description of a co-hygienic operation depends on the technical character of the mathematical model; for β-reduction, what we've observed is functional compatibility between β- and α-substitution, while for gravity we've observed the general-relativity integration between gravity and the geometry of spacetime.

So my proposed answer to the conundrum I've been pondering is that the affinity between gravity and geometry suggests a modeling strategy with a nonlocal network pseudo-randomly scrambled by locally selected operational transformations evolving toward a stable state of spacetime, in which the gravity operations are co-hygienic. A natural follow-on question is just what sort of mathematical machinery, if any, would cause this network-scrambling to approximate quantum mechanics.

On the side, I've got intimations here that quantum mechanics may be an approximation induced by the pseudo-random network scrambling when the system under study is practically infinitesimal compared to the cosmos as a whole, and perhaps that the network topology has a rotational aspect.

Meanwhile, an additional line of possible inquiry has opened up. All along I've been trying to figure out what the analogy says about physics; but now it seems one might study the semantics of a possibly-side-effect-ful program fragment by some method structurally akin to quantum mechanics. The sheer mathematical perversity of quantum mechanics makes me skeptical that this could be a practical approach to programming semantics; on the other hand, it might provide useful insights for the TOE mathematics.

Epilog: hygiene

So, what happened to hygiene? It was a major focus of attention through nearly the whole investigation, and then dropped out of the plot near the end.

At its height of prestige, when directly analogized to spacetime geometry (before that fell through), hygiene motivated the suggestion that geometry might be thought of not as the venue where physics happens but merely as part of its rules. That suggestion is still somewhat alive since the proposed solution treats geometry as abstractly just one of the two forms of interconnection in system state, though there's a likely asymmetry of representation between the two forms. There was also some speculation that understanding hygiene on the physics side could be central to making sense of the model; that, I'd now ease off on, but do note that in seeking a possible model for physics one ought to keep an eye out for a possible bookkeeping mechanism, and certainly resolving the nonlocal topology of the model would seem inseparable from resolving its bookkeeping. So hygiene isn't out of the picture, and may even play an important role; just not with top billing.

Would it be possible for the physics model to be unhygienic? In the abstract sense I've ended up with, lack of hygiene would apparently mean an operation whose local effect causes nonlocal perturbation. Whether or not dynamic scope qualifies would depend on whether dynamically scoped variables are considered nonlocal; but since we expect some nonlocal connectivity, and those variables couldn't be perturbed by local operations without losing most/all of their nonlocality, probably the physics model would have to be hygienic. My guess is that if a TOE actually tracked the nonlocal network (as opposed to, conjecturally, introducing a quantum "blurring" as an approximation for the cumulative effect of the network), the tracking would need something enough like calculus variables that some sort of bookkeeping would be called for.
