
Tuesday, October 28, 2025

A Method for Implementing First-Class Continuations on the JVM and CLR (AI assisted)

For this complex topic I needed some help. I explained the process to an AI and had it help me write this blog post. Questions and comments are welcome.

Managed runtimes like the Java Virtual Machine (JVM) and the Common Language Runtime (CLR) provide robust, high-performance environments for software execution. A key feature of these platforms is a rigidly structured call stack, which manages function calls and returns in a strict last-in, first-out (LIFO) order. While this model is efficient and simplifies memory management, it precludes certain powerful control flow constructs, most notably first-class continuations.

A first-class continuation is the reification of the current point of execution—essentially, "the rest of the program"—as an object that can be stored, passed around, and invoked. Invoking a continuation effectively discards the current execution stack and replaces it with the captured one. This document details a methodology for implementing such a mechanism within an interpreter running on a managed runtime, circumventing the limitations of the native call stack.

The technique described herein circumvents this limitation by creating a custom, manually managed execution model based on a trampoline and a universal "step" contract, enabling the capture, storage, and invocation of the program's execution state.

1. The Core Execution Architecture

The foundation of this system is an interpreter where every evaluatable entity—from primitive operations to user-defined functions—adheres to a single, uniform execution contract. This approach abstracts execution away from the host's native call stack.

1.1. The `Step` Method

All computable objects implement a `Step` method. This method performs one atomic unit of computation. Its precise signature is critical to the entire mechanism:

bool Step(out object ans, ref IControl ctl, ref IEnvironment env)

1.2. The Interpreter Registers

The parameters of the Step method function as the registers of our virtual machine. Their specific modifiers are essential:

  • out object ans: The Answer Register. This is an output parameter used to return the final value of a computation.
  • ref IControl ctl: The Control Register. This reference parameter holds a pointer to the next computational object (`IControl`) to be executed.
  • ref IEnvironment env: The Environment Register. This reference parameter holds the context necessary for the execution of the control object, such as lexical variable bindings.

The use of reference (ref) and output (out) parameters is the key that allows a callee function to directly modify the state of its caller's execution loop, which is fundamental to achieving tail calls and other advanced control transfers.

1.3. The Four Modes of Control Transfer

A Step method executes its atomic portion of work and then relinquishes control in one of four distinct ways:

  1. Deeper Call: To obtain a required value, it can directly invoke the Step method of a callee function, initiating a deeper, nested computation.
  2. Value Return: It can conclude its computation by setting the ans parameter to its result value and returning false. The false return value signals to the caller that a value has been produced and normal execution can proceed.
  3. Tail Call: It can perform a tail call by setting the ctl parameter to the callee and the env parameter to the callee's required environment, and then returning true. The true return value signals to the caller's execution loop that it should not proceed, but instead immediately re-execute with the new ctl and env values.
  4. Unwind Participation: It can participate in a stack unwind event, a special protocol for capturing the continuation, which will be discussed in detail below.
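The contract above can be sketched in Java. This is a minimal illustration, not the article's implementation: Java has no `ref`/`out` parameters, so a mutable `Registers` object stands in for the three interpreter registers, and the names (`Registers`, `Control`, `Constant`) are invented for this sketch.

```java
// A mutable holder emulating the C# out/ref parameters: the three
// registers of the virtual machine.
final class Registers {
    Object answer;       // the Answer register (out object ans)
    Control control;     // the Control register (ref IControl ctl)
    Object environment;  // the Environment register (ref IEnvironment env)
}

interface Control {
    // Returns true to request a tail call (the caller's trampoline
    // re-dispatches on regs.control), or false when regs.answer holds
    // a final value.
    boolean step(Registers regs);
}

// The simplest possible control: a constant that returns immediately.
final class Constant implements Control {
    private final Object value;
    Constant(Object value) { this.value = value; }
    public boolean step(Registers regs) {
        regs.answer = value;  // Value Return: set the answer register...
        return false;         // ...and signal that a value was produced.
    }
}
```

Driving such a control with `while (r.control.step(r)) {}` is exactly the trampoline described in the next section.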

2. The Trampoline: Enabling Tail Recursion

To avoid consuming the native call stack and prevent stack overflow exceptions during deep recursion, we employ a trampoline. This is a controlling loop that manages the execution of Step methods.

// Variables to hold the current state
IControl control = ...;
IEnvironment environment = ...;
object answer;
// The trampoline loop
while (control.Step(out answer, ref control, ref environment)) {}
// Execution continues here after a normal return (false)

The operation is as follows: When a callee wishes to tail call, it mutates the control and environment variables through the ref parameters and returns true. The while loop's condition evaluates to true, its (empty) body executes, and the loop condition is evaluated again, this time invoking the Step method on the newly specified control object. When a callee returns a value, it mutates the answer variable via the out parameter and returns false. This terminates the loop, and the ultimate value of the call is available in the answer variable.
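To make the constant-stack behavior concrete, here is a hedged Java sketch (the names `Registers` and `Countdown` are invented; a mutable holder object emulates the C# `ref`/`out` parameters). A countdown that tail-calls itself a million times runs without deepening the native stack, because each step returns to the trampoline rather than recursing:

```java
// Mutable holder standing in for the caller's out/ref registers.
final class Registers {
    Object answer;    // set on Value Return
    Control control;  // overwritten on Tail Call
}

interface Control {
    boolean step(Registers regs);
}

// Counts down by tail-calling itself. A directly recursive version
// would overflow the native stack at this depth.
final class Countdown implements Control {
    private final long n;
    Countdown(long n) { this.n = n; }
    public boolean step(Registers regs) {
        if (n == 0) {
            regs.answer = "done";             // Value Return: stop the loop.
            return false;
        }
        regs.control = new Countdown(n - 1);  // Tail Call: mutate the register...
        return true;                          // ...and ask the trampoline to spin again.
    }
}
```

The driving loop has the same shape as the C# one: set `r.control = new Countdown(1_000_000);` and run `while (r.control.step(r)) {}`; the native stack never grows past one frame.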

3. The Unwind Protocol: Capturing the Continuation

The continuation is captured by hijacking the established return mechanism. This is a cooperative process that propagates upward from the point of capture.

3.1. Unwind Initiation

A special function (e.g., the primitive for `call/cc`) initiates the capture. It sets the answer register to a magic constant (e.g., `UNWIND`) and mutates the environment register to hold a new `UnwinderState` object, which will accumulate the stack frames. It then returns false, causing its immediate caller's trampoline to exit.

3.2. Unwind Participation and Propagation

Crucially, every call site must check for the unwind signal immediately after its trampoline loop terminates.

while (control.Step(out answer, ref control, ref environment)) { };
if (answer == MagicValues.UNWIND) {
    // An unwind is in progress; we must participate.
    // 1. Create a Frame object containing all the local state needed
    //    to resume this function from this point.
    Frame resumeFrame = new Frame(this.localState1, this.localState2, ...);
    // 2. Add the created frame to the list being accumulated.
    ((UnwinderState)environment).AddFrame(resumeFrame);
    // 3. Propagate the unwind to our own caller. This code runs inside
    //    our own Step method, so our out/ref parameters ('ans' and 'env')
    //    are the caller's registers: set the caller's answer to UNWIND,
    //    its environment to the UnwinderState, and return false to drop
    //    its trampoline.
    ans = MagicValues.UNWIND;
    env = environment;
    return false;
}

This process creates a chain reaction. Each function up the conceptual call stack catches the unwind signal, preserves its own state in a Frame object, adds it to the list, and then triggers its own caller to unwind. This continues until the top-level dispatch loop is reached.
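The chain reaction can be demonstrated end to end in a hedged Java sketch (all names — `Unwind`, `Capture`, `Caller`, `UnwinderState` — are invented for illustration; a mutable `Registers` object emulates the C# `ref`/`out` parameters). A `Capture` control starts the unwind; each `Caller` up the chain detects the sentinel after its local trampoline exits, records its frame, and propagates:

```java
import java.util.ArrayList;
import java.util.List;

final class Unwind {
    static final Object UNWIND = new Object();   // the magic sentinel

    static final class Registers {
        Object answer;
        Control control;
        Object environment;  // holds the UnwinderState during an unwind
    }

    interface Control { boolean step(Registers regs); }

    // Accumulates the saved frames (here just names, for illustration).
    static final class UnwinderState {
        final List<String> frames = new ArrayList<>();
    }

    // The capturing primitive: initiates the unwind.
    static final class Capture implements Control {
        public boolean step(Registers regs) {
            regs.environment = new UnwinderState();
            regs.answer = UNWIND;    // hijack the return path...
            return false;            // ...and drop the caller's trampoline.
        }
    }

    // An intermediate call site: calls deeper, then participates in unwinds.
    static final class Caller implements Control {
        final String name;
        final Control callee;
        Caller(String name, Control callee) { this.name = name; this.callee = callee; }
        public boolean step(Registers regs) {
            Registers inner = new Registers();
            inner.control = callee;
            inner.environment = regs.environment;
            while (inner.control.step(inner)) {}         // local trampoline
            if (inner.answer == UNWIND) {                // participate:
                ((UnwinderState) inner.environment).frames.add(name);  // save our frame
                regs.environment = inner.environment;                  // pass state up
                regs.answer = UNWIND;                                  // propagate
                return false;                                          // drop caller's loop
            }
            regs.answer = inner.answer;                  // normal value return
            return false;
        }
    }
}
```

Running `Caller("outer", Caller("inner", Capture))` leaves the sentinel in the answer register and the frames accumulated innermost-first, ready for the top-level dispatch loop described next.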

4. The Top-Level Dispatch Loop

The main entry point of the interpreter requires a master loop that can handle the three possible outcomes of an unwind event.

while (true) {
    answer = null;
    while (control.Step(out answer, ref control, ref environment)) { };
    if (answer == MagicValues.UNWIND) {
        UnwinderState unwindState = (UnwinderState)environment;
        // Outcome 3: the unwind was an instruction to exit the interpreter.
        if (unwindState.IsExit) {
            answer = unwindState.ExitValue;
            break;
        }
        else {
            // Outcomes 1 & 2: a continuation was captured (call/cc) or is
            // being invoked. In either case, we must restore a control point.
            ControlPoint stateToRestore = unwindState.ToControlPoint();
            IControl receiver = unwindState.Receiver;
            // The RewindState holds the list of frames to be reloaded.
            environment = new RewindState(stateToRestore, receiver);
            control = ((RewindState)environment).PopFrame();
        }
    } else {
        // Normal termination of the entire program.
        break;
    }
}
// Interpreter has exited.
return answer;

This top-level handler serves as the central arbiter. It runs the normal trampoline, but if an unwind reaches it, it inspects the UnwinderState to determine whether to exit the program entirely or to begin a rewind process to install a new (or previously captured) execution stack.

5. The Rewind Protocol: Restoring the Continuation

Invoking a continuation involves rebuilding the captured stack. This is managed by the `RewindState` environment and the `Step` methods of the captured `Frame` objects.

5.1. The `Frame` `Step` Method: A Dual Responsibility

The `Step` method for a `Frame` object being restored is complex. Its primary responsibility is to first restore the part of the stack that was deeper than itself. It does this by calling `PopFrame` on the `RewindState` to get the next frame and then running a local trampoline on it. The code that represents its own original pending computation is encapsulated in a separate `Continue` method.

// Simplified Step method for a Frame during rewind.
public override bool Step(out object answer, ref IControl control, ref IEnvironment environment)
{
    // First, set up and run a trampoline for the deeper part of the stack.
    object resultFromDeeperCall;
    IControl deeperFrame = ((RewindState)environment).PopFrame();
    IEnvironment rewindEnv = environment;
    while (deeperFrame.Step(out resultFromDeeperCall, ref deeperFrame, ref rewindEnv)) { };
    // Check whether a NEW unwind occurred during the rewind of the deeper frame.
    if (resultFromDeeperCall == MagicValues.UNWIND) {
        // If so, we must participate again. Append our remaining frames to
        // the new UnwinderState and propagate the new unwind upwards.
        ((UnwinderState)rewindEnv).AppendContinuationFrames(this.myRemainingFrames);
        environment = rewindEnv;
        answer = MagicValues.UNWIND;
        return false;
    }
    // The deeper call completed normally; now execute our own pending work.
    control = this.originalExpression;
    environment = this.originalEnvironment;
    return Continue(out answer, ref control, ref environment, resultFromDeeperCall);
}

This structure ensures that the stack is rebuilt in the correct order and that the system can gracefully handle a new continuation capture that occurs while a previous one is still being restored.

5.2. Terminating the Rewind: The `CWCCFrame`

The rewind chain must end. The innermost frame of a captured continuation corresponds to the `call/cc` primitive itself. Its `Step` method does not reload any deeper frames. Its sole purpose is to invoke the continuation receiver—the lambda function that was passed to `call/cc`—and provide it with the fully reified continuation object.

public override bool Step(out object answer, ref IControl control, ref IEnvironment environment)
{
    // The rewind is complete. Deliver the continuation to the waiting function.
    ControlPoint continuation = ((RewindState)environment).ControlPoint;
    return this.receiver.Call(out answer, ref control, ref environment, continuation);
}

With this final call, the stack is fully restored, the RewindState is discarded, and normal execution resumes within the receiver function, which now holds a reference to "the rest of the program" as a callable object.

Tuesday, March 4, 2025

Collate / index-list

I was talking to Arthur Gleckler last night and he mentioned that he had been making good use of a function he called index-list. This function takes two selector functions and a list of objects. The first selector extracts a key from each object, and the second selector extracts a value. A table is returned that maps the keys to a list of all the values that were associated with that key.

I had to laugh. I had written the same function a few months back. I called it collate.

Here is Arthur’s version in Scheme:

(define (index-list elements table choose-data choose-key)
  "Given a hash table 'table', walk a list of 'elements' E, using
'choose-key' to extract the key K from each E and 'choose-data' to
extract a list of data D from each E. Store each K in 'table' along
with a list of all the elements of all the D for that K."
  (do-list (e elements)
    (hash-table-update!/default
     table
     (choose-key e)
     (lambda (previous) (append (choose-data e) previous))
     '()))
  table)

And here is my version in Common Lisp:

(defun collate (list &key (key #'car) (test #'eql)
                     (merger (merge-adjoin :test #'eql)) (default nil))
  (let ((table (make-hash-table :test test)))
    (dolist (element list table)
      (let ((key (funcall key element)))
        (setf (gethash key table)
              (funcall merger (gethash key table default) element))))))

So how do they differ?

  • Arthur’s version takes the hash table as a parameter. This allows the caller to control the hash table’s properties. My version creates a hash table using the test parameter, which defaults to eql.
  • Arthur’s version uses choose-key to extract the key from each element. My version uses key, a keyword parameter defaulting to car. My choice was driven by the convention of Common Lisp sequence functions of taking a :key parameter.
  • Arthur’s version uses a default value of '() for the entries in the hash table. My version uses the :default keyword argument, which defaults to nil.
  • Arthur’s version uses choose-data to extract the datum in each element. My version uses the :merger keyword argument to specify how to merge the entire element into the table. If you only want a subfield of the element, you can compose a selector function with a merger function.
  • Arthur’s version uses append to collect the data associated with each element. My version uses a merger function to merge the element into the entry in the hash table. The default merger is merge-adjoin, which uses adjoin to add the element to the list of elements associated with the key. merge-adjoin is parameterized by a test function that defaults to eql. If the test finds the element already present, the new data is not merged, so the result of (merge-adjoin :test #'eql) is a list with no duplicates.
  • If you instead specify a default of 0 and a merger of (lambda (existing new) (+ existing 1)), you get a histogram.
  • Another merger I make use of is merge-unique, which ensures that all copies of the data being merged are the same, raising a warning if they are not.
  • Finally, I occasionally make use of a higher-order merger called merge-list that takes a list of mergers and applies them elementwise to two lists to be merged. This lets you create a singleton aggregate merged element whose subfields are merged using different strategies.

Like Arthur, I found this to be a very useful function. I was auditing a data set obtained from GitHub. It came in as a flat list of records of users. Each record was a list of GitHub org, GitHub ID, and SAML/SSO login. Many of our users inadvertently have multiple GitHub IDs associated with their accounts. I used my collate function to create a table that mapped SAML/SSO login to a list of all the GitHub IDs associated with that login, and the list of orgs where that mapping applies.
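The same collation shape exists in other languages. Here is a hedged Java sketch of the GitHub audit example using stream collectors; `UserRecord`, `indexList`, and `histogram` are invented names for this illustration, not from either implementation above:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// The record shape mirrors the audit example: GitHub org, GitHub ID, SAML/SSO login.
record UserRecord(String org, String githubId, String samlLogin) {}

class Collate {
    // Analogue of index-list/collate: choose-key = samlLogin, choose-data = githubId.
    static Map<String, List<String>> indexList(List<UserRecord> records) {
        return records.stream()
            .collect(Collectors.groupingBy(
                UserRecord::samlLogin,
                Collectors.mapping(UserRecord::githubId, Collectors.toList())));
    }

    // The histogram merger (default 0, merger that increments) corresponds
    // to grouping-and-counting.
    static Map<String, Long> histogram(List<UserRecord> records) {
        return records.stream()
            .collect(Collectors.groupingBy(UserRecord::samlLogin,
                                           Collectors.counting()));
    }
}
```

`Collectors.groupingBy` plays the role of the hash table, and the downstream collector plays the role of the merger.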

Thursday, January 2, 2025

Scheme Interpreter: Conclusions

This experiment with writing an MIT-Scheme S-code interpreter in C# was successful in these ways:

  • It showed that the S-code interpreter is an independent component of the Scheme system. The interpreter substrate can be replaced with a new implementation, written in a different language, using a different evaluation strategy, without replacing the Scheme runtime system written in Scheme.
  • It showed that the S-code interpreter can, on small segments of code, perform as fast as compiled code. However, growing the size of these small segments causes an exponential increase in the number of interpreter specializations. The obvious solution of automatically generating interpreter specializations on demand is the equivalent of JIT compilation.
  • It validated the idea that the lexical environment can be represented as a flattened vector of values. Mutable variables can be implemented by cell conversion. Variable values are copied from outer scopes to inner scopes when closures are created. The semantics of such an implementation is equivalent to the semantics of a nested series of frames as used in MIT-CScheme.
  • It showed that you can implement tail recursion via trampolines at each call site, and that you can implement first-class continuations by testing for a magic return value after the return of each trampoline. We don’t use the C# exception handling mechanism to unwind the stack when implementing first-class continuations, just a conditional branch and a normal return. This is far less complicated and expensive.

It was a failure in these ways:

  • Although it showed one way in which we could take incremental steps to increase the speed of the interpreter until it approached the speed of compiled code, each step resulted in an exponential increase in the number of specializations in the interpreter and had diminishing returns.
  • The ultimate outcome of this process would be an interpreter with thousands of specializations. Small Scheme programs could be completely represented by a single specialization, and they would be interpreted as fast as compiled code. But this is because the specialization is essentially a compiled version of the Scheme program. In other words, we ultimately will have an interpreter that “optimizes” by looking up a program in a huge table that maps small programs to their precomputed compiled bodies. This is just an unusual and inefficient way to implement a compiler.
  • Because C# offers no way to dump the heap in binary format, we must cold load the system each time we start it.
  • One of the tasks in the cold load is to initialize the unicode tables. These are big tables that take a long time to initialize.
  • It took an annoyingly long time to get to Scheme’s initial top-level prompt.
  • Debugging crashes in the Scheme system was annoying and slow because we have to cold load the Scheme system to reproduce bugs.
  • I have understated a large component of the work: providing a new C# implementation for each of the hundreds of primitives in the Scheme runtime. I only bothered to implement those primitives called as part of the cold load boot sequence, but still there were a lot of them. For many of these primitives, the C# implementation just achieved the same effect “in spirit” as the MIT-CScheme implementation. These were easy to implement. But some were more persnickety, where it was vital that the C# implementation produced exactly the same bits as the MIT-CScheme implementation. For instance, the code used to hash the types for generic method dispatch had to produce the exact same hash values in both implementations. This is because there is code that depends on the hashed multimethod ending up at a precomputed location in a method cache.
  • The C# interpreter was complete enough to boot a Scheme cold load and run it to the top-level prompt. It could run the top-level REPL. But much was missing. It could not host the SF program, which generates the S-code for the Scheme runtime. You’d have to run an original instance of MIT-CScheme to generate the S-code that you would then run in the C# interpreter.

I think the next Lisp system I will try should be based around a simple, portable JIT compiler.

Wednesday, January 1, 2025

Calling Conventions in the Interpreter

C# is not tail recursive. It could be. The IL that it compiles to supports tail recursion, but the C# compiler doesn’t generate the tail call instruction. It would be a simple thing to add: when the compiler emits a call instruction, it could check if the next instruction is a return, and if so, emit a tail call instruction. This could be controlled by a compiler flag so only us weirdos who want this feature would enable it.

But until the C# compiler adds this feature, we have to resort to other methods. I chose to use a trampoline at each call site. This is a small segment of code that awaits the result of the function call. If the callee wants to tail call, it returns the tail call target to the caller, which performs the call on the callee’s behalf. This requires a modification to the calling conventions.

EvalStep is the virtual method that all S-code objects implement to perform an evaluation. Its signature is this:

abstract class Control : SchemeObject
{
    public abstract TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment);
}

The result of the evaluation is returned in the answer parameter. This is an out parameter, so the answer is allocated in the caller and a pointer to it is passed to the callee. The callee returns the answer by modifying it in the caller’s stack frame.

The expression and environment parameters are the expected parameters for a call to Eval. They, too, are allocated in the caller’s frame and references to them are passed to the callee. The callee is allowed to modify the caller’s values of these variables.

The returned value is a TailRecursionFlag. This is either ReturnValue, indicating that a value has been returned in the answer, or TailCall, indicating that the caller should perform another EvalStep. To return a value, the callee modifies the answer. To perform a tail call, the callee modifies the expression and environment references and returns TailCall.

Any caller must call EvalStep as follows: The caller allocates an answer variable to receive the answer of the call. It also allocates an expression and an environment variable to pass to the callee. It then calls EvalStep in a loop until the callee returns TailRecursionFlag.ReturnValue, indicating that the answer has been set to the return value.

In the EvalStep for an S-code Conditional we see an example of the calling convention:

 object ev;
 Control unev = predicate;
 Environment env = environment;
 while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };

We are making a recursive call to evaluate the predicate. We set up ev to receive the result of the evaluation. We set up unev and env to hold the expression and environment to pass to EvalStep. unev.EvalStep does the eval dispatch via virtual function dispatch.

If the predicate returns a TailRecursionFlag of ReturnValue, the loop will exit. The predicate is assumed to have put the return value in the ev variable.

If the predicate wants to tail call, it will modify the values of unev and env to the new expression and new environment, and return a TailRecursionFlag of TailCall. The loop will iterate, using the new value of unev and env to again dispatch a call to EvalStep.

When the while loop exits, the ev variable will contain the return value of the predicate. Control may be returned to the while loop several times before the loop exits. This is the trampoline in action.

Conditional expressions don’t return a value. They either tail call the consequent or the alternative. The EvalStep for a conditional ends like this:

 answer = null;
 expression = (ev is bool evb && evb == false) ? alternative : consequent;
 return TailRecursionFlag.TailCall;
}

The answer variable in the caller is set to null. out parameters must always be assigned before the function exits, so this just keeps the compiler happy. If the return value of calling EvalStep on the predicate is the boolean false, we set the expression in the caller to the alternative, otherwise the consequent. This is the target of our tail call to EvalStep. For the S-code of a conditional we leave the environment alone; the tail call uses the same environment unchanged. We finally return TailRecursionFlag.TailCall so that the caller’s trampoline makes another iteration around its while loop. It will call EvalStep on the alternative or consequent that we stuffed in the caller’s expression.

This little song and dance is performed at every recursive call to EvalStep making EvalStep behave as a tail-recursive function. This calling convention is about half the speed of a normal C# method call. It is the cost of using a trampoline for tail recursion.

First Class Continuations

There is one more obscure reason that the control might return to us when evaluating the predicate. If some function further down the call chain invokes call-with-current-continuation, we need to copy the stack. The callee indicates this by returning a magic return value of Special.UnwindStack. The callee sets the caller’s environment to an UnwinderState that will accumulate the stack frames as we unwind the stack. So our calling convention says we need to check the return value of EvalStep, and if it is Special.UnwindStack, we allocate a ConditionalFrame on the heap that will contain the state of the current stack frame. We AddFrame to the UnwinderState. We propagate this up the stack by putting it in the caller’s environment, setting the caller’s value of answer to Special.UnwindStack and returning TailRecursionFlag.ReturnValue to stop the caller’s trampoline loop.

The full code of EvalStep for an S-code if expression is this:

public override TailRecursionFlag EvalStep (out object answer,
                                            ref Control expression,
                                            ref Environment environment)
{
    object ev;
    Control unev = predicate;
    Environment env = environment;
    // Tail recursion trampoline.
    while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };
    // Support for first class continuations.
    if (ev == Special.UnwindStack)
    {
        ((UnwinderState) env).AddFrame (new ConditionalFrame (this, environment));
        environment = env;
        answer = Special.UnwindStack;
        return TailRecursionFlag.ReturnValue;
    }
    // Tail call EvalStep on the consequent or alternative.
    answer = null;
    expression = (ev is bool evb && evb == false) ? alternative : consequent;
    return TailRecursionFlag.TailCall;
}

First class continuations allow you to unload and reload the pending call chain. We see that at each call site, we must check the return value and, if it is Special.UnwindStack, create a new Frame on the heap and add it to the unwinder state before we propagate the Special.UnwindStack up the call chain.

At the very top of the call chain, we have the outermost call to EvalStep. If the Special.UnwindStack value is returned to this call, the stack has been unwound and the UnwinderState is sitting in the environment variable. We need to rewind the stack and put the stack frames back on it. We create a RewindState from the UnwinderState. Each time we PopFrame from the RewindState, we get a deeper frame. We reload the stack by getting the outermost frame from the RewindState and calling EvalStep on it. The EvalStep for a Frame sets up the trampoline loop, calls PopFrame to get the next frame, and calls EvalStep on it. When we run out of stack frames to reload, the stack is reloaded and we return control to the innermost frame so it can continue where it left off. This is the rewind loop.

The EvalStep for a Frame, after making the recursive call to EvalStep on the next frame, continues with code that is a duplicate of the code in the original frame before the continuation was captured. A specific example will make this clear. If an if expression is on the stack when it is unwound, a ConditionalFrame is created. A ConditionalFrame is a subclass of SubproblemFrame, which has this EvalStep method:

public override TailRecursionFlag EvalStep (out object answer,
                                            ref Control expression,
                                            ref Environment environment)
{
    object temp;
    Control expr = ((RewindState) environment).PopFrame ();
    Environment env = environment;
    while (expr.EvalStep (out temp, ref expr, ref env) == TailRecursionFlag.TailCall) { };
    if (temp == Special.UnwindStack)
    {
        ((UnwinderState) env).AppendContinuationFrames (continuation);
        environment = env;
        answer = Special.UnwindStack;
        return TailRecursionFlag.ReturnValue;
    }
    expression = this.expression;
    environment = this.environment;
    return Continue (out answer, ref expression, ref environment, temp);
}

public abstract TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value);

That is, the EvalStep of the SubproblemFrame establishes a trampoline, pops the next frame from the RewindState, and invokes its EvalStep method. When an answer is returned, the SubproblemFrame calls its Continue method.

The Continue method is a virtual method that is implemented by each subclass of SubproblemFrame. It finishes the work of the frame. In the case of a ConditionalFrame, the Continue method is this:

public override TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value)
{
    answer = null;
    expression = value is bool bvalue && bvalue == false
        ? SCode.EnsureSCode (this.expression.Alternative)
        : SCode.EnsureSCode (this.expression.Consequent);
    return TailRecursionFlag.TailCall;
}

Compare this to the code in the original Conditional:

 // Tail call EvalStep on the consequent or alternative.
 answer = null;
 expression = (ev is bool evb && evb == false) ? alternative : consequent;
 return TailRecursionFlag.TailCall;

There are only superficial differences: the Continue method gets the value returned by the predicate in an argument rather than in a local variable. It type checks the alternative and consequent components of the if expression by calling SCode.EnsureSCode. Otherwise, the code does the same thing.

It is not possible to actually rewind the stack with the original set of pending methods. What we do instead is rewind the stack with methods that do the same thing as the original pending methods. It is close enough. The same values will be computed.

There is one place you can see the difference. If you look at the stack trace in the debugger before you capture a continuation, you will see the pending recursive calls to the S-code EvalStep methods. If you look at the stack trace in the debugger after you capture a continuation, you will instead see pending calls to the EvalStep methods of a set of frames. The pending frames are in the same order and have names similar to the original pending methods. They compute the same values, too. But the debugger can notice that these are not the originals.

More Inlining

Calls to (null? x) usually appear as the predicate to a conditional. We can specialize the conditional. Instead of

[if
 [primitive-null?-argument0]
 [quote 69]
 [quote 420]]

We create a new S-code construct, if-null?-argument0, and construct the conditional as

[if-null?-argument0 
 [quote 69]
 [quote 420]]

We avoid a recursive call, and instead of generating a 'T or 'NIL value and testing it, we just test for null and jump to the appropriate branch, just like the compiled code would have done.

Multiple arguments

We can further specialize the conditional based on the types of the consequent and alternative. In this case, they are both quoted values, so we can specialize the conditional to [if-null?-argument0-q-q 69 420]. (Where the name of the S-code type is derived from the type of the consequent and alternative.)

if-null?-argument0-q-q is an esoteric S-code type that codes a test of the first argument for null, and if it is null, returns the first quoted value, otherwise the second quoted value. This S-code type runs just as fast as compiled code. Indeed the machine instructions for evaluating this S-code are the same as what the compiler would have generated for the original Lisp form.
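As a sketch of why this runs at compiled speed, here is the fully specialized node in Java rather than the C# of the actual interpreter; the class and method names are invented for illustration. The entire evaluation collapses to one test and one branch, with no recursive dispatch:

```java
// Illustrative model of [if-null?-argument0-q-q 69 420]: a single node
// holding two pre-quoted constants, specialized on "test argument 0 for null".
final class IfNullArgument0QQ {
    private final Object whenNull, whenNotNull;
    IfNullArgument0QQ(Object whenNull, Object whenNotNull) {
        this.whenNull = whenNull;
        this.whenNotNull = whenNotNull;
    }
    // No dispatch on children, no intermediate boolean value: the machine
    // work is the same test-and-branch a compiler would emit.
    Object eval(Object[] arguments) {
        return arguments[0] == null ? whenNull : whenNotNull;
    }
}
```

A generic interpreter would instead dispatch to the predicate node, box a truth value, test it, and dispatch again to the chosen branch; the specialization fuses all of that into one method.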

But there is a problem

Why not continue in this vein, specializing up the S-code tree? Wouldn’t our interpreter be as fast as compiled code? Well, it would, but there is a problem. Every time we add a new S-code type, we add new opportunities for specialization to the containing nodes. The number of ways to specialize a node is the product of the number of ways to specialize its children, so the number of ways to specialize the S-code tree grows exponentially with the number of S-code types. The few specializations I’ve just mentioned end up producing hundreds of specialized S-code types. Many of these specialized S-code types are quite esoteric and apply at most to only one or two nodes in the S-code tree for the entire program and runtime system. Performing another round of inlining and specialization would produce thousands of specialized S-code types — too many to author by hand, and most of them too esoteric to ever be used.

The solution, of course, is to automate the specialization process. We only generate a specialized S-code type when it is actually used by a program. The number of specialized S-code types will be limited by the number of ways programs are written, which is linear in the size of the program.

But specializing the code when we first encounter it is just JIT compiling the code. We’ve just reinvented the compiler. We might as well skip the multi-level specialization of the S-code tree and write a simple JIT compiler.

Inlining Primitive Function Calls and Argument Evaluation

Inlining some primitives

Reconsider our model of a Lisp program as a “black box” that issues a series of primitive function calls. We can eliminate some of the primitive function calls by implementing them directly within our “black box”. We inline some primitives.

Take null? as an example. Instead of constructing (null? arg) as

[primitive-funcall1
 [quote [primitive null?]]
 [argument 0]]

we add a new S-code construct, primitive-null?, and construct (null? arg) as

[primitive-null?
 [argument 0]]

We don't have to evaluate the function. We don't even have to jump to the entry point of the null? primitive. After we evaluate argument 0, we just test for null in the interpreter and return T or NIL as appropriate.

There are maybe 20 or so primitives that are used frequently enough to be worth inlining in this way. Each primitive you inline like this adds bloat to the interpreter.
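As a rough illustration, here is a Python sketch (invented names; the actual interpreter is C#) contrasting a generic one-argument primitive call node with an inlined primitive-null? node:

```python
# Sketch: generic primitive call vs. an inlined primitive (invented names).

class Argument:                      # leaf node: fetch argument n
    def __init__(self, n): self.n = n
    def eval(self, env): return env.args[self.n]

class PrimitiveFuncall1:             # generic: jump to the primitive's code
    def __init__(self, prim, arg): self.prim, self.arg = prim, arg
    def eval(self, env): return self.prim(self.arg.eval(env))

class PrimitiveNull:                 # inlined: test for null in the interpreter
    def __init__(self, arg): self.arg = arg
    def eval(self, env): return self.arg.eval(env) is None

class Env:
    def __init__(self, args): self.args = args

generic = PrimitiveFuncall1(lambda x: x is None, Argument(0))
inlined = PrimitiveNull(Argument(0))
```

Both nodes compute the same answer, but the inlined one never transfers control to the primitive's entry point.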

Inlining simple arguments

The leaves of a tree of S-code are the atomic expressions, whereas the internal nodes of the tree are compound expressions. We can eliminate the leaves by inlining them into their parent nodes. For example if a leaf node is a lexical variable reference, we inline this into the parent node. We unroll the evaluation of the leaf node thus saving a recursive call to the interpreter and an evaluation dispatch.

Consider our previous example, which we constructed as

[primitive-null?
 [argument 0]]

We further specialize primitive-null? based on its argument type into primitive-null?-argument or primitive-null?-lexical. Now our S-code becomes:

[primitive-null?-argument 0]

The leaf node [argument 0] is absorbed into the parent node [primitive-null? ...] making a new leaf node [primitive-null?-argument]. The evaluator for this S-code node simply tests if argument 0 is null and returns T or NIL as appropriate.

Compare this to the original S-code:

[funcall
 [global 'null?]
 [argument 0]]

This required two recursive calls to the interpreter, a global lookup, and a primitive function call on top of the null test. We’ve eliminated all of those. There’s not much left to do. Testing null? in the interpreter is almost as fast as testing null? in compiled code.

The number of S-code types needed to perform this inlining is the number of kinds of leaf nodes raised to the power of the number of leaves in the parent node. A call to a one-argument primitive would need specializations for the cases where the argument is a quoted value, an argument, a lexical variable or a global variable — four specializations. Calls to a two-argument primitive turn into one of sixteen specializations — the product of four for each argument. A call to a three-argument primitive would turn into one of sixty-four specializations.
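The arithmetic is easy to check mechanically. A Python sketch (hypothetical naming scheme) that enumerates the specializations:

```python
# Sketch: enumerate specialized node names for an n-argument primitive call.
# Four leaf kinds (quote, argument, lexical, global) yield 4**n combinations.
from itertools import product

leaf_kinds = ["q", "a", "l", "g"]   # quote, argument, lexical, global

def specializations(n_args):
    """All specialized node names for an n-argument primitive call."""
    return ["primitive-op-" + "-".join(combo)
            for combo in product(leaf_kinds, repeat=n_args)]

one, two, three = (len(specializations(n)) for n in (1, 2, 3))
```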

We can inline all the non-compound argument evaluations, both to primitive functions and to user-defined functions. In our S-code tree, we have removed all the leaf nodes and absorbed them into their parent nodes (which have now become new leaves). The interpreter is now quite a bit faster, although still not as fast as compiled code.

Tuesday, December 31, 2024

Usual Integrations, Function Calls and Primitive Calls

Near the beginning of most MIT-Scheme files you'll see (declare (usual-integrations)). This instructs the SF program, which translates Scheme to S-code, to replace the “usual” global variables with their (compile time) values. You lose the ability to change the values of these variables, but who redefines CAR or CDR? (And if you do, just remove the declare form or add CAR and CDR to the exception list.)

Now forms such as (null? x), which used to be syntaxed into this S-code:

[funcall
 [global ’null?]
 [argument 0]]

are now syntaxed into this S-code:

[funcall
 [quote [primitive null?]]
 [argument 0]]

Recall that S-code is a tree representation of Lisp code that is walked by the interpreter. The leaves of the tree are all either quote, global, argument, or lexical, and each of these only take one or two machine instructions to eval. The “overhead” of interpretation involves getting to the leaves.

The internal nodes of an S-code tree are the compound forms: function calls and special forms. IF and PROGN (in Scheme BEGIN) are the two most frequently evaluated special forms. When we construct the S-code for a compound form, we pattern match against the subforms and specialize the S-code. Specialized S-code for a compound form is called a “superoperator”. Superoperators eliminate the eval dispatch for the subforms. The machine code for interpreting a superoperator is close to the equivalent compiled code. Let us examine some of the most important superoperators.

Funcall Argument List Elimination

In general, a function call accumulates a list of arguments and jumps to APPLY to dispatch the function. We can eliminate the overhead of doing this. First, we can eliminate the call to APPLY by having applicable objects implement the applicable interface. We use virtual function dispatch to invoke the code that implements the applicable object. Applicable objects can be applied to any number of arguments passed as a list. We can eliminate the overhead of manipulating short argument lists by spreading the arguments and adding call0, call1, call2, and call3 methods to the applicable interface. The number of arguments at any particular call site is fixed, and it is usually a small number. We eliminate the overhead of accumulating the argument list for a function call by specializing the funcall S-code object on the number of arguments.

An S-code funcall is replaced by a funcall0, funcall1, funcall2, funcall3, or funcalln depending on the number of arguments. The argument accumulation loop is unrolled in the numbered cases placing the arguments in local C# variables. The funcallx superoperators directly call the appropriate calln method in the function’s applicable interface.

Primitive Function Call

Function calls to primitive procedures are the foundation of Lisp and Scheme programs. We can look at a Lisp program as if it were a “black box” that makes a series of primitive function calls. If unoptimized, a Lisp program ought to make the same series of primitive function calls with the same values whether it is interpreted or compiled. The primitive function will run at the same speed no matter how it is called, so if compiled code is faster than interpreted, it is issuing the primitive function calls faster, taking less time between them. The time between issuing primitive function calls is the interpretation overhead, and we want to reduce that.

When we construct an S-code funcall (and funcall0, funcall1, etc.), we check if the thing being called is a literal quoted primitive function, and if it is, we replace the S-code with a primitive-funcall (or primitive-funcall0, primitive-funcall1, etc.). A primitive-funcall evaluates its arguments, but it can skip evaluating the function. It can skip the apply dispatch, too, and just jump right to the primitive entry point.

We construct (null? arg) as

[primitive-funcall1
 [quote [primitive null?]]
 [argument 0]]
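The construction-time rewrite can be sketched in Python (invented names; a sketch, not the real implementation):

```python
# Sketch: at syntax time, a call whose operator is a literal quoted
# primitive is rewritten into a primitive-funcall node, which never
# needs to evaluate the operator or go through apply (invented names).

class Quote:
    def __init__(self, value): self.value = value

class Primitive:
    def __init__(self, fn): self.fn = fn

def make_funcall1(operator, arg):
    if isinstance(operator, Quote) and isinstance(operator.value, Primitive):
        # Skip evaluating the operator and skip the apply dispatch.
        return ("primitive-funcall1", operator.value, arg)
    return ("funcall1", operator, arg)

node = make_funcall1(Quote(Primitive(lambda x: x is None)), ("argument", 0))
```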

The MIT-Scheme interpreter and the MIT-Scheme hardware both special-case function calls with small numbers of arguments and calls to primitive procedures.

In the next post, we develop this idea beyond what MIT-Scheme already does.

Eliminating Deep Search

In general, when you look up a lexical variable, you have to do a deep search. You start with the innermost scope and work your way out until you find the variable you are looking for. This takes an absurd amount of time for each variable lookup, so we want to avoid actually doing it.

As mentioned in the previous post, we special case the two most common cases: when the variable is in the innermost scope and when the variable is in the global scope. In the other cases, when the variable is in an intermediate scope, we can flatten the scopes by copying the variable values from inner scopes to outer scopes whenever the variable is closed over. This only works because we have put the mutable variables in cells and we copy pointers to the cells.

When an S-code lambda object is created, we walk the code and find the variables that are closed over. We create a closure with a copy of the variable’s value (or a copy of its cell). In essence, we replace:

(lambda (x) (+ x y))

(assuming y is a lexical variable) with

(%make-closure ’(lambda (x) (+ x y)) y)

Once we have done this, we no longer need to deep search for lexical variables. There is only one lexical frame, it contains a copy of the variable, and we know the offset of the variable in the frame. Lexical variable lookup is now just a vector-ref.
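Here is a Python sketch (invented names) of the flattening: the closure copies the cell pointer, so mutation through SET! remains shared, and lexical lookup becomes a single indexed fetch:

```python
# Sketch: flattened closure environments (invented names).
# Mutable variables live in cells; closures copy the cell *pointer*,
# so sharing survives the copy and lookup is one vector-ref.

class Cell:
    def __init__(self, value): self.value = value

def make_closure(template, *closed_values):
    # closed_values are copied into a single flat frame
    return {"code": template, "frame": list(closed_values)}

y = Cell(10)                      # y is mutable, hence cell-converted
clo = make_closure("(lambda (x) (+ x y))", y)

lookup = clo["frame"][0].value    # lexical lookup: one indexed fetch
y.value = 99                      # SET! through the shared cell
```

After the assignment, the closure sees the new value because it holds the same cell, not a copy of the number.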

The drawback is that we have cell-converted the mutable variables, so calling a function that contains mutable variables will allocate memory. Use of SET! makes your code expensive. Furthermore, %make-closure is linear in the number of things being closed over. But typically, closures only close over one or two items.

So between these specializations (this post and the two previous ones), we have removed the overhead of variable lookup. An expression like (lambda (x) (+ y x)) (where y is a lexical variable) is converted into S-code that looks like:

[%make-closure
 [quote (lambda ’(x)
 [funcall
 [global ’+]
 [lexical 0]
 [argument 0]])]
 [argument n] ;; wherever ’y is in the outer scope
 ]

Evaluating variables is the bulk of the work in a Lisp program. Now both the interpreter and compiler can evaluate a variable with one or two machine instructions. Interpreted code is still slower than compiled code, but this isn’t a bottleneck anymore.

Now we look for other bottlenecks. Watch this space.

Monday, December 30, 2024

Specialized Variables

In general, when you look up a variable, you have to do a deep search through the lexical contours. This is because someone could have captured a first-class environment and evaled a define in it that shadows a variable.

However, this is a very rare case. In the vast majority of cases, the variable is either a global variable or lexical variable bound in the immediately enclosing environment. We can optimize for these cases.

Symbols refer to three different kinds of variable: if the variable is free, it refers to a global or “top-level” variable; if it is bound, it may be bound in the immediately enclosing environment (i.e., it is an argument), or it may be bound in an environment further up the lexical chain. We can determine which of these cases applies statically when we construct the S-code and select the appropriate subtype of S-code variable.

A global or top-level variable can contain a pointer to the value cell of the variable in the global environment, so variable lookup simply involves a dereference. If the variable is lexically bound, and the environment is a StaticEnvironment (or subtype), then the lexical address of the variable is known statically and cannot change (the variable cannot be shadowed), so we can store the lexical address in the S-code variable. Lookup is faster because it involves constant offsets.

The variable corresponding to argument 0 of the current lambda is by far the most commonly accessed variable, followed by argument 1. We can create special S-code types for each of these two variables in addition to the three types for global, lexical, and the other arguments. By classifying the variables into one of these five types at syntax time, we can greatly reduce the amount of work needed at runtime to access the variable.

So consider the form (lambda (n) (lambda (x) (+ x n))). n is a lexical variable bound one level deep. x is an argument and + is a global. The S-code representation of this form will have three different variable types. The S-code for n will be a LexicalVariable with a depth of 1 and offset of 0. The eval method for LexicalVariable can walk back one level in the lexical chain and then access the zeroth element of the environment.

On the other hand, the S-code for x will be an ArgumentVariable with an argument number of 0. The eval method for ArgumentVariable can call the GetArg0 method of the environment.

Finally, the S-code for + will be a GlobalVariable with a pointer to the global value cell for +. The eval method for this type of S-code variable can simply dereference the pointer.
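These three variable subtypes can be sketched in Python (the class names follow this post; everything else is invented):

```python
# Sketch of the three specialized variable node types described above.

class LexicalVariable:
    def __init__(self, depth, offset): self.depth, self.offset = depth, offset
    def eval(self, env):
        for _ in range(self.depth):      # walk back up the lexical chain
            env = env.parent
        return env.values[self.offset]

class ArgumentVariable:
    def __init__(self, n): self.n = n
    def eval(self, env): return env.values[self.n]   # no chain walk at all

class GlobalVariable:
    def __init__(self, cell): self.cell = cell        # pointer to value cell
    def eval(self, env): return self.cell["value"]    # a single dereference

class Env:
    def __init__(self, values, parent=None):
        self.values, self.parent = values, parent

# (lambda (n) (lambda (x) (+ x n))) applied to 5, then to 2:
plus_cell = {"value": lambda a, b: a + b}
outer = Env([5])                 # binds n
inner = Env([2], parent=outer)   # binds x
n = LexicalVariable(1, 0).eval(inner)
x = ArgumentVariable(0).eval(inner)
plus = GlobalVariable(plus_cell).eval(inner)
```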

Between specializing environments and S-code variables, we can greatly reduce the amount of time needed to access variables.

Specialized Environments

A Lisp program conceptually does a lookup every time it evaluates a symbol. Since evaluating symbols is the single most common operation, you want it to be as fast as possible. In compiled code, symbol values might be stored in registers, on the stack, or in a vector, but in interpreted code they are usually stored in a structure called an environment. In compiled code, the symbol value can often be retrieved by a move instruction, but in interpreted code there may be a function call and a vector-ref needed to get the value. This is a significant performance hit.

We can customize the layout of an environment to speed up symbol lookup. In the general case, an environment is a fairly heavyweight data structure. Because you can eval a define form at any time, you need an environment that supports incremental definition and you need to deep search each time you do a lookup because someone might have defined a shadowing variable.

But 99% of the time, you don’t need the general case. If no one captures a first-class environment, then you don’t need to support incremental definitions because no one can call eval to define a new variable. If no one mutates a variable, then you can store a copy of the variable’s value directly in the environment instead of indirectly through a ValueCell object. Most environments only close over a couple of variables, so you can make the variables fields of the environment object instead of storing them in a vector of values. You statically know the number of variables, so you can store them in a fixed-size class instance.

An Environment is an abstract base class.

A GlobalEnvironment is a singleton class that represents bindings in a hash table.

A StandardEnvironment is a concrete subclass of Environment that supports assignment and incremental definitions. The bindings are kept in a vector of ValueCell objects and the incremental bindings are kept in a hash table. This is a general purpose environment.

class StandardEnvironment : public Environment {
 Vector<ValueCell*> bindings;
 HashTable<ValueCell*> incrementalBindings;
};

A variable lookup in a StandardEnvironment involves fetching the vector of value bindings from the environment, looking up the variable in the vector of bindings, and if it is not found, looking it up in the hash table of incremental bindings, and finally fetching the value from the cell. Several levels of indirection are involved.

A StaticEnvironment supports assignment, but not incremental definitions. Bindings are kept in a vector of ValueCell objects.

class StaticEnvironment : public Environment {
 Vector<ValueCell*> bindings;
 Object* GetArg0() { return bindings[0]->Value; }
};

Looking up a value in a StaticEnvironment is slightly quicker because there is no table of incremental bindings to search.

An ImmutableEnvironment holds bindings in a vector and does not support assignment or incremental definitions. Binding values are kept directly instead of indirectly through ValueCell objects.

class ImmutableEnvironment : public Environment {
 Vector<Object*> bindings;
 Object* GetArg0() { return bindings[0]; }
};

Looking up a variable in an ImmutableEnvironment is quicker because there are no ValueCell objects to dereference.

A SimpleEnvironment is an abstract subclass of ImmutableEnvironment that does not support assignment or incremental definition. Bindings are kept in class fields.

A SimpleEnvironment0 is a concrete subclass of SimpleEnvironment that holds no bindings — these are created when you invoke a thunk. A SimpleEnvironment1 is a concrete subclass of SimpleEnvironment that holds one binding, and so on up to SimpleEnvironment3.

class SimpleEnvironment2 : public SimpleEnvironment {
 Object* var0;
 Object* var1;
 Object* GetArg0() { return var0; }
};

Looking up a variable in a SimpleEnvironment is quicker because you don’t have to indirect through a vector of bindings. A method as simple as GetArg0(), which simply returns a field in the instance, can often be inlined.

Environments are created by applying closures. There are subtypes of closures that correspond to the different kinds of environments. For example, a SimpleClosure2 is a closure that creates a SimpleEnvironment2.

Closures are created by evaluating lambda expressions. There are subtypes of lambda expressions that correspond to the different kinds of closures. For example, a SimpleLambda3, when evaluated, will create a SimpleClosure3.

When S-code lambda objects are created, they are analyzed to see if a first-class environment is captured, and if not, whether any of the variables are mutated. The appropriate subclass of lambda object is created based on this analysis.

Almost all environments end up being SimpleEnvironments and are quite a bit faster than the general case.
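The analysis that selects an environment flavor can be sketched as a Python function (the classification logic here is a plausible reading of this post, not the actual C# code):

```python
# Sketch: pick an environment subclass from static analysis of a lambda.
# The decision procedure is a simplified reading of the post, not the
# actual implementation.

def classify_environment(captures_first_class_env, mutates_vars, n_bindings):
    if captures_first_class_env:
        return "StandardEnvironment"      # needs incremental define + deep search
    if mutates_vars:
        return "StaticEnvironment"        # value cells, but fixed layout
    if n_bindings <= 3:
        return f"SimpleEnvironment{n_bindings}"  # bindings become class fields
    return "ImmutableEnvironment"         # vector of direct values
```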

Sunday, December 29, 2024

Interpreting S-code

Lisp programs are not text but rather nested lists of symbols (atoms) and other lists. By default, a symbol represents a variable and a list represents a function call, but some lists are special forms that don’t follow the normal function call semantics.

S-code is an alternative representation for Lisp programs. Instead of nested lists, S-code is a set of nested data structures. Each S-code type corresponds to a Lisp form. There is an almost trivial isomorphism between S-code and the list representation of Lisp. S-code is a bit easier to interpret and compile than the list representation because the data structures representing the code can contain auxiliary information to aid in interpretation of the code. For example, a variable reference in S-code can contain the lexical depth and offset of the variable.

Converting from nested lists to S-code is a process called “syntaxing”. The nested lists are walked, the macros are expanded, auxiliary information is collected, and the S-code is generated. The S-code can be directly interpreted or compiled to machine code. “Unsyntaxing” is the process of converting S-code back to nested lists and it basically involves a walk of the data structure, discarding the auxiliary information, and collecting a tree of lists, thus recovering the original Lisp program (modulo macro expansion).

The MIT-Scheme hardware (Scheme 79, Scheme 81, and Scheme 86) directly executed S-code. MIT-CScheme is a C and assembly program that interprets S-code and runs compiled S-code. MzScheme and Racket also use a similar nested data structure representation of Scheme programs. I don’t think they call it “S-code”, but it is essentially the same thing. No doubt other Lisp and Scheme interpreters use this technique.

(The Lisp machine, by contrast, compiled Lisp programs to a byte-code that was interpreted by the microcoded Lisp machine hardware. I believe that MzScheme and Racket now include a JIT compiler.)

A few years back, for kicks, I wrote an S-code interpreter for Scheme. I decided to write it in C# so that I could leverage the .NET Common Language Runtime. I wouldn’t have to write a garbage collector, for example. I wrote it to interpret the S-code generated by MIT-Scheme. This way I could leverage the MIT-Scheme runtime as well. But this interpreter is not simply a port of the C implementation of MIT-Scheme. Many parts of the interpreter work completely differently from the MIT C implementation. Among them:

  • The C# call stack is used to hold the continuations. Subproblem continuations are C# continuations. Nested Scheme function calls appear as nested C# function calls, so the C# debugger can be used to debug Scheme code.
  • Tail recursion is handled by a trampoline at each function call boundary. To tail call a function, you return the function, the arguments, and a special marker to the caller’s trampoline, which will make the call on your behalf.
  • First class continuations are handled by lightweight stack inspection. In addition to a special marker for tail recursion, there is also a special marker for continuation capture. If this is returned, the caller’s trampoline will evacuate its current stack frame from the stack onto the heap and propagate the special marker up the stack.
  • Eval dispatch is handled through virtual method dispatch. Each type of S-code object has an eval method that is specialized for the type. I was curious if the virtual method dispatch was fast enough to be in the main interpreter loop.
  • The S-code interpreter is instrumented to discover the “hot” paths through the code. Certain patterns of S-code that are determined to be dynamically frequent can be replaced by S-code “superoperators” that provide specialized higher performance evaluation paths that avoid recursive evaluation steps.
  • Variable assignments are eliminated through cell conversion. All variables are ultimately immutable. Variables that were assigned to are converted to mutable cells that are allocated on the heap. This means we can use a flattened environment structure, but calls to procedures with mutable variables will cons.
  • The nested environment structure was replaced with flattened environment vectors. Shared immutable lexical variables are copied, and since mutable variables are stored in cells, their references are copied. Lexical variable lookup is constant independent of the lexical depth, but closure construction becomes linear in the number of variables being closed over.
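The trampoline convention from the second bullet can be sketched in Python (invented names; the real interpreter returns these markers at C# function-call boundaries):

```python
# Sketch: a tail call returns a marker to the caller's trampoline rather
# than growing the stack; the trampoline makes the call on the callee's
# behalf (invented names).

class TailCall:                       # the special marker
    def __init__(self, fn, *args): self.fn, self.args = fn, args

def trampoline(fn, *args):
    result = fn(*args)
    while isinstance(result, TailCall):   # keep making calls until a value
        result = result.fn(*result.args)
    return result

def countdown(n):
    return n if n == 0 else TailCall(countdown, n - 1)

result = trampoline(countdown, 100_000)   # deep "recursion", flat stack
```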

Limitations

C# does not provide a way to dump a heap snapshot, so it isn’t possible to save a Lisp image. Instead, the interpreter is “cold loaded” each time it is started. Indeed, the main problem with the interpreter is the startup time. It spends an inordinate amount of time initializing the Unicode tables.

The major version of MIT-Scheme has been incremented. The C# interpreter fails to reach the prompt when cold loading the latest version. I haven’t tracked down the problem.

The C# interpreter assumes that the S-code abstraction is completely opaque, i.e. that the Scheme code knows nothing about the binary layout of S-code objects and only manipulates them through the provided primitives. This isn’t always the case, however, so the C# interpreter sometimes has to recognize that certain idiomatic patterns of S-code are actually attempting to emulate a primitive operation. For example, system-pair-car is a primitive that extracts the first element of any system object that has two elements, i.e. objects that MIT-Scheme implements as a cons cell even though not tagged as one. But the point of using C# was to hide the representation of Scheme objects from the Scheme code, so many objects are no longer implemented as cons cells. The interpreter notices calls to system-pair-car and instead extracts whatever field of the object MIT-Scheme would have put in the car.

The interpreter is not complete enough to host the SF program, which creates Scheme .fasl files from Scheme source. You need MIT-Scheme to create the .fasl files, which can then be loaded into the C# interpreter.

Now what?

I got increasingly frustrated with the limitations. It was taking too long to do a full debugging cycle. Playing whack-a-mole with the bugs was getting tedious.

I wanted to understand fundamentally why interpreters are so slow. In theory, shouldn’t the interpreter be able to perform the same operations as the compiled code? What exactly is “interpreter overhead”? Shouldn’t it be able to run in the same ballpark as the compiled code? What would it take to make a high-performance interpreter?

I discovered some of the answers. A large part of the overhead comes from instruction dispatch. Compiled code has its instructions dispatched in hardware and often has dedicated hardware to read and process the instruction stream. Interpreters use software to do the dispatch. You can make the interpreter faster by combining instructions and dispatching instruction combinations. There is a combinatorial explosion of instruction combinations, however, so you only want to combine the most common instruction sequences.

I wrote a number of hand-coded instruction combinations, but it got tedious and I realized that I wanted an automatic way to generate instruction combinations for common instruction sequences. That is, I wanted a JIT compiler. I put the project on the shelf at this point because a JIT compiled S-code interpreter is another project in itself.

I decided to blog a bit about some of the more unusual features of this interpreter, so watch this space if you are interested.

Tuesday, March 26, 2024

With- vs. call-with-

In Common Lisp, there are a lot of macros that begin with the word “with-”. These typically wrap a body of code, and establish a context around the execution of the code.

In Scheme, they instead have a lot of functions that begin with the words “call-with-”. They typically take a thunk or receiver as an argument, and establish a context around a call to the thunk or receiver.

Both of these forms accomplish the same sort of thing: running some user supplied code within a context. The Scheme way accomplishes this without a macro, but “call-with-” functions are rarely used as arguments to higher order functions. Writing one as a function is slightly easier than writing one as a macro because the compiler takes care of avoiding variable capture. Writing one as a macro leaves it up to the implementor to use appropriate gensyms. Writing one as a macro avoids a closure and a function call, but so does inlining the function. The macro form is slightly more concise because it doesn’t have a lambda at every call site. The function form will likely be easier to debug because it will probably involve a frame on the stack.

There’s no need to commit to either. Just write a “with-” macro that expands into a call to an inline “call-with-” function. This should equally please and irritate everyone.
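Python happens to have both idioms natively, which makes for a convenient sketch (the resource here is purely hypothetical): a “call-with-” style function that takes a receiver, and a “with-” style context manager built from the same setup and teardown:

```python
# Sketch: both styles in Python terms. The "call-with-" form is a plain
# function taking a receiver; the "with-" form wraps the same setup and
# teardown logic as a context manager. The resource is hypothetical.

from contextlib import contextmanager

def call_with_resource(receiver):
    resource = {"open": True}          # establish the context
    try:
        return receiver(resource)      # run user code inside it
    finally:
        resource["open"] = False       # tear the context down

@contextmanager
def with_resource():
    resource = {"open": True}
    try:
        yield resource
    finally:
        resource["open"] = False

r1 = call_with_resource(lambda r: r["open"])
with with_resource() as r:
    r2 = r["open"]
```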

Sunday, July 24, 2022

Named Lambda and Named Let

Suppose you want to map the "add three" function over a list. You don't need to define a name for the function, you can just use a lambda expression: (lambda (n) (+ n 3)). This creates an anonymous "add three" function.

Now suppose you want to map the "factorial" function over a list. You start with (lambda (x) (if (zerop x) 1 ... , but how can you recursively call an anonymous function? We need the function to have a local name within its own body. One option is to use the Y operator:

* (map 'list (Y (lambda (fact)
 (lambda (x)
 (if (zerop x)
 1
 (* x (funcall fact (1- x)))))))
 '(3 5 7))
(6 120 5040)
but another popular option is to provide a new special form
* (map 'list (named-lambda fact (x)
 (if (zerop x)
 1
 (* x (fact (1- x)))))
 '(3 5 7))
(6 120 5040)
The name fact is bound to the lambda expression only within the body of the lambda expression. You don't need to defun a factorial procedure, you can just use a named-lambda.
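For comparison outside Lisp, here is the Y operator approach in Python (a sketch; this is the applicative-order variant, sometimes called the Z combinator):

```python
# Sketch: the applicative-order Y operator in Python, giving an anonymous
# factorial a local name ("self") for its recursive call.

def Y(f):
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

fact = Y(lambda self: lambda x: 1 if x == 0 else x * self(x - 1))
result = list(map(fact, [3, 5, 7]))
```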

A little puzzle for you: write named-lambda. My answer below.

Just as a let is syntactic sugar for a lambda application, a named-let is syntactic sugar for a named-lambda application. The name is bound to the lambda expression that performs the variable binding, so you can use that name to make a recursive call. In effect, you can re-invoke the named-let expression with a fresh set of values.

Scheme hackers will be familiar with named-let, but it isn't usual in Common Lisp. It's an easy transformation:

(named-let recur ((x '(a list))
 (y 22)) 
 (body)) =>
(funcall (named-lambda recur (x y) (body)) '(a list) 22)
named-let is the bee's knees for ad hoc iteration. The iteration variables are bound to their initial values by the named-let, and the body can initiate the next iteration by tail-calling the named-let with the updated values for the iteration variables. Since there is no constraint on where or when you iterate, or how you update the iteration variables, this allows very general iteration.

I have seen a tendency for Scheme hackers to overdo it with named lets. You don't need a named-let when map will do. It's usually better to express an iteration through a higher order function than to write yet another ad hoc loop in your code. But named-let is a useful and powerful tool when the simpler methods aren't cutting it.

Here's what I came up with for named-lambda:

(defmacro named-lambda (name (&rest arglist) &body body)
 `(labels ((,name ,arglist ,@body))
 (function ,name)))

Wednesday, February 12, 2020

A polyglot program puzzle

Can you come up with an expression that evaluates to the symbol 'scheme in a Scheme system, but evaluates to the symbol 'common-lisp in a Common Lisp system?

Thursday, February 6, 2020

Dispatching

There are times when you are faced with a complex piece of control flow
try {
 if (condition())
 {
 ... block 1 ...
 }
 else 
 {
 switch (someValue())
 {
 case CASE_A:
 ... block 2 ...
 break;
 case CASE_B:
 ... block 3 ...
 break;
 default:
 ... block 4 ...
 break;
 }
 }
} catch (SomeException someException) {
 ... block 5 ...
}
and you want to abstract the control flow — all the conditionals, switches, and try/catches — away from the code that does the work — the various blocks. In fact, here I've abstracted it away by putting "... block n ..." in place of the blocks of code.

If I were writing this in Scheme or Common Lisp, I'd consider using continuation passing style. I'd write a function dispatch-on-foo that would perform the dispatch, but then invoke one of several first-class procedures passed in as arguments
(defun dispatch-on-foo (foo bar case1 case2 case3 case4 case5)
 (if (... complex conditional ...) 
 (funcall case1)
 (handler-case (case some-value
 ((case-a) (funcall case2))
 ((case-b) (funcall case3))
 (t (funcall case4)))
 (error (condition) (funcall case5)))))
At the call site, I'd write
(dispatch-on-foo <arg1> <arg2>
 (lambda ()
 ... block 1 ...)
 (lambda ()
 ... block 2 ...)
 (lambda ()
 ... block 3 ...)
 (lambda ()
 ... block 4 ...)
 (lambda ()
 ... block 5 ...))
This is a win when the complexity of the dispatch is enough that you don't want to replicate it at every call site. Notice how the nested blocks of code have been pulled up to the same level and linearized. Granted, you've cluttered up the call site with lambda expressions, but as Steele pointed out, you can think of these as anonymous go tags: dispatch-on-foo in essence will end up doing a jump to one of these tags and execute the block there, skipping and ignoring the other blocks. Once you get used to thinking in this way, the lambdas disappear just like the parens do for a seasoned Lisp hacker. They just look like jump targets or case labels, and the call site looks a lot like a case expression. It is a bit more powerful than an ordinary case expression because you could arrange for dispatch-on-foo to funcall the appropriate closure on an argument (and have the lambda expression take an argument of course).

You could do something analogous with Java 8's lambdas, but on rare occasions I've wanted to do something similar in Java 7. The problem is that Java 7 doesn't have lambda expressions. The solution is to change these anonymous lambdas into named callback methods. First we define a generic interface with our callbacks:
interface DispatchOnFooCases<T> {
    T caseOne ();
    T caseTwo ();
    T caseThree ();
    ... etc. ...
}
then we define the dispatch method:
<T> T dispatchOnFoo (FooClass foo, BarClass bar, DispatchOnFooCases<T> dispatchOnFooCases)
{
    try {
        if (conditional())
            return dispatchOnFooCases.caseOne();
        else
            switch (someValue()) {
              case CASE_A:
                return dispatchOnFooCases.caseTwo();
              case CASE_B:
                return dispatchOnFooCases.caseThree();
              default:
                return dispatchOnFooCases.caseFour();
            }
    } catch (SomeException someException) {
        return dispatchOnFooCases.caseFive();
    }
}
finally, at the call site, we write this:
{
    int value =
        dispatchOnFoo (foo, bar,
            new DispatchOnFooCases<Integer> ()
            {
                @Override
                public Integer caseOne ()
                {
                    ... block 1 ...
                    return aValue;
                }

                @Override
                public Integer caseTwo ()
                {
                    ... block 2 ...
                    return aDifferentValue;
                }
                ... etc. ...
            });
}
The good news is that we've accomplished our goal of abstracting the complex conditional dispatch from the code that does the real work — the method bodies at the call site.

There is, unfortunately, a fair amount of bad news. First, if you thought lambda expressions introduced clutter, then this is a serious amount of clutter. Between @Overrides, type declarations, interfaces, and methods, there is just a lot of extra stuff you have to type. It still might be worth the clutter if the dispatch conditions are complex enough. They just need to be that much more complex to justify all this machinery. (We've actually done the work the compiler would do to allocate and pass a “multi-closure”.) There are cases where this pays off, though.
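For contrast, here is a minimal sketch (my own illustration, not from the original post) of how the same pattern looks with Java 8 lambdas. FooClass, BarClass, and the real conditions are elided; the dispatch here is driven by two stand-in parameters so the example is self-contained and runnable:

```java
import java.util.function.Supplier;

class Dispatch {
    // The dispatcher owns all the conditional logic; callers supply the blocks.
    static <T> T dispatchOnFoo(boolean condition, int someValue,
                               Supplier<T> case1, Supplier<T> case2,
                               Supplier<T> case3, Supplier<T> defaultCase) {
        if (condition)
            return case1.get();
        switch (someValue) {
            case 0:  return case2.get();
            case 1:  return case3.get();
            default: return defaultCase.get();
        }
    }

    public static void main(String[] args) {
        // The lambdas read like the labeled blocks of a case expression.
        int value = Dispatch.<Integer>dispatchOnFoo(false, 1,
            () -> 10,   // block 1
            () -> 20,   // block 2
            () -> 30,   // block 3
            () -> 40);  // default block
        System.out.println(value);  // prints 30
    }
}
```

Note how the boilerplate of the interface, the @Overrides, and the anonymous class all disappear; only the dispatch logic and the blocks remain.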

The second piece of bad news is that Java is not (in general) tail recursive. This means that the call to dispatchOnFoo and the callback to one of the cases both introduce a new stack frame. So although the case methods run in the same lexical environment as where they are defined, they are running two stack frames deeper. This won't make much of a difference unless you try to loop by recursively calling the code. In that case, you need to be very careful to limit the amount of recursion or you will overflow the stack. It is best to avoid recursion as much as possible in the bodies of the cases.
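If you do need to loop through callbacks an unbounded number of times, one standard workaround is a trampoline: each step, instead of calling the next case directly, returns either a finished value or a thunk for the next step, and a driver loop invokes the thunks iteratively so the Java stack stays at constant depth. This is a minimal sketch of my own (it uses Java 8 lambdas; all names are my invention, not from the post):

```java
import java.util.function.Supplier;

final class Trampoline {
    // Either a finished computation carrying a value, or a thunk for more work.
    static final class Step<T> {
        final T value;                  // set when finished
        final Supplier<Step<T>> next;   // set when more work remains
        Step(T value, Supplier<Step<T>> next) { this.value = value; this.next = next; }
        static <T> Step<T> done(T v)                 { return new Step<>(v, null); }
        static <T> Step<T> more(Supplier<Step<T>> n) { return new Step<>(null, n); }
    }

    // The driver: a loop instead of recursion, so the stack never grows.
    static <T> T run(Step<T> step) {
        while (step.next != null)
            step = step.next.get();
        return step.value;
    }

    // "Recurses" n times, but each call returns a thunk rather than calling itself.
    static Step<Integer> countDown(int n) {
        return n == 0 ? Step.done(0) : Step.more(() -> countDown(n - 1));
    }

    public static void main(String[] args) {
        // A million iterations, no StackOverflowError.
        System.out.println(run(countDown(1_000_000)));
    }
}
```

The cost is that every step allocates a thunk on the heap, trading stack space for garbage-collector work.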

You probably won't need to resort to doing this. It can be a case of the cure being worse than the disease. The complexity of introducing callbacks can exceed the complexity of the conditional you are trying to abstract. But this is an interesting way to abstract a very complex conditional and can come in handy when you can justify using it. I have actually used this technique in production code to separate some complex control flow from the code that did the actual work.

Tuesday, January 21, 2020

But what if you really want to push a stack frame?

If you really don't want tail recursion, the solution is simple: don't put the call in “tail position”. We define a rather trivial function dont-tail-call and use it like this:
(dont-tail-call
 (foo x))
The semantics of Scheme are that the arguments are evaluated before the call to the function, so the call to foo is required to occur before the call to dont-tail-call, which necessitates allocation of a continuation.

But what about a really clever compiler that somehow “optimizes” away the call to dont-tail-call? Such an “optimization” is highly questionable because it changes the observable semantics of the program so it is no longer call-by-value. A compiler that performed that transformation wouldn't be a Scheme compiler because Scheme is a call-by-value language.

But what if we had a really clever compiler and were willing to disregard the change in semantics? We can easily thwart the compiler by deferring the definition of dont-tail-call until run time. Even the cleverest compiler cannot see into the future.

The definition of dont-tail-call is left to the reader, as is how to defer its definition until run time.

Afraid of Tail Recursion

It's a well-known fact among proponents of tail recursion that some people just don't get it. They view tail recursion at best as a quirky compiler optimization that turns some recursive calls into loops. At worst, they see it as some sort of voodoo, or a debugging pessimization. They see little value in it. Some have outright disdain for it.

Tail recursion isn't just about turning recursive calls into loops. It's about changing how you look at function calling. Tail recursion just happens to fall out of this new viewpoint.

Most programmers, I think, view function calls as if they were akin to a short vacation. You pack up the arguments in your luggage, travel to the destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return home, unpack everything and resume life where you left off. Your souvenirs are the return value.

Should you need a vacation from your vacation, you do the same thing: pack up the arguments in your luggage, travel to your new destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return to your original vacation spot and resume your original vacation.

Tail recursion aficionados realize that the journey itself is the important part of the function call, and that a vacation includes two journeys. On the first journey you pack up the arguments, including the return ticket, in your luggage, use the outbound ticket to journey to the destination, unpack your luggage, and start doing stuff. When you run out of stuff to do, you make the second journey. You fetch the return ticket, repack your luggage, take the ticket to wherever it leads (presumably back home), unpack everything, and resume doing whatever you were doing there.

But perhaps you want to visit grandma instead of going directly home. Then we change the script slightly. When you run out of things to do on your vacation, you pack up your luggage with your souvenirs and the return ticket, then you journey to grandma's house, where you unpack and start doing stuff. Eventually you are done visiting grandma, so then you fetch the return ticket, repack your luggage, take the ticket to wherever it leads, unpack everything, and resume doing stuff there. It's a three-legged journey. You don't go from grandma's back to the vacation resort — there's nothing left for you to do there. You take the return ticket directly home.

Viewing things this way, a function call involves packaging the arguments in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the function, where we unpack the arguments and resume execution of the program. It is simply “a goto that passes arguments”.*

A function return is simply “a goto that passes a return value”. It involves packaging the return value in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the return address, where we resume execution of the program.

A tail recursive function call is simply “a goto that passes arguments”. It involves packaging the arguments in a suitable way, deallocating any temporary storage and then making an unconditional transfer to the function, where we resume execution of the program.

Do we really deallocate temporary storage before every control transfer? Certainly a return pops the topmost stack frame, and, as often implemented, a tail recursive function call deallocates or replaces its stack frame before transferring control. But a non-tail-recursive call? It does so as well; it's just that it also has to pack those values into a new continuation for the return trip. We use an implementation trick to avoid the absurdity of actually moving these values around: we move the base-of-frame pointer instead. Voila, we simultaneously deallocate the stack frame and allocate the continuation with the right values already in place.

Deallocating storage before each control transfer is an important part of the protocol. We're making an unconditional transfer to a destination with the hope, but no guarantee, that we'll come back, so we'd better tidy up before we leave. This ensures that we won't leave a trail of garbage behind us on each transfer, which would accumulate and adversely affect the space complexity of our program.

Once you view a function call and return as not being a single sequence, but each one a separate, and virtually identical, sequence, then tail recursion becomes a natural consequence. Tail recursion isn't a special case of function call; it is the same thing as a function call, the only difference being whether a new continuation (the "return ticket") is allocated in order to come back. Even function returns are the same thing, the only difference being that the destination is (usually) determined dynamically rather than statically. Tail recursion isn't just another optimization, it's the result of treating inter-procedural control transfer symmetrically.
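To make the "goto that passes arguments" reading concrete, here is a small illustration of my own (not from the post): a tail-recursive gcd, and the loop a tail-call-optimizing compiler would effectively reduce it to. The tail call is just a jump back to the top with fresh argument values, allocating no new continuation:

```java
class Gcd {
    static int gcdRecursive(int a, int b) {
        if (b == 0) return a;
        return gcdRecursive(b, a % b);   // tail call: nothing left to do here
    }

    static int gcdLoop(int a, int b) {
        while (b != 0) {                 // the tail call, expressed as a jump
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcdRecursive(48, 18)); // prints 6
        System.out.println(gcdLoop(48, 18));      // prints 6
    }
}
```

The two versions compute the same thing; the recursive one simply leaves the "jump with new arguments" implicit in the tail position.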

Another natural consequence is greatly increased options for control flow. Instead of a strict call/return pattern, you can make "three-legged" trips, or however many legs you need. You can make loops that incorporate one, two, or even a dynamically changing number of functions. You can replace compiler-generated returns with user-provided function calls (continuation-passing style) and implement arbitrarily complex control and data flow like multiple return values, exception handling, backtracking, coroutines, and patterns that don't even have names yet. And of course you can mix and match these patterns with the standard call and return pattern as well.
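As one small taste of these options, here is a hedged sketch (my own, in Java 8 terms) of continuation-passing style delivering multiple return values: instead of returning, divMod "returns" by calling the continuation k with both results at once, with no wrapper object needed.

```java
import java.util.function.BiFunction;

class Cps {
    // The caller supplies the continuation; the "return" is a call to k.
    static <R> R divMod(int n, int d, BiFunction<Integer, Integer, R> k) {
        return k.apply(n / d, n % d);
    }

    public static void main(String[] args) {
        String s = divMod(17, 5, (q, r) -> q + " remainder " + r);
        System.out.println(s); // prints "3 remainder 2"
    }
}
```

Because Java is not tail recursive, each such continuation call consumes a stack frame, which is exactly the caveat from the Java 7 callback discussion above.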

The phrase "tail recursion" is shorthand for this symmetric view of interprocedural control flow and is meant to encompass all these consequences and control flow options that spring from it. It's not about simply turning recursive functions into loops.

People who are afraid of tail recursion seem unable to see any value in taking up the symmetric viewpoint despite the fact that it opens up a whole new set of control flow techniques (in particular continuation-passing style). They find the notion that a procedure call is “a goto that passes arguments” “nonsensical”. A lot of good research has come from taking this nonsense seriously.


*The view that a function call is simply a “a goto that passes arguments” was developed by Steele in his “Lambda papers”.

The important point of cleaning up before the control transfer was formalized by Clinger in “Proper Tail Recursion and Space Efficiency”.

Someone — it might have been Clinger, but I cannot find a reference — called tail recursion “garbage collection for the stack”. The stack, being so much more limited in size than the heap, needs it that much more. Indeed Clinger notes the tight connection between tail recursion and heap garbage collection and points out that heap garbage collection is hampered if the stack is retaining pointers to logically dead data structures. If the dead structures are large enough, garbage collection can be rendered useless. Yet many popular languages provide garbage collection but not tail recursion.

The only difference between a call and return is that typically the call is to a statically known location and the return address is dynamically passed as a "hidden" argument. But some compilers, like Siskind's Stalin compiler, statically determine the return address as well.

The only difference between a function call and a tail recursive function call is when you need to return to the caller to complete some work. In this case, the caller needs to allocate a new continuation so that control is eventually returned. If there is no further work to be done in the caller, it doesn't create a new continuation, but simply passes along the one that was passed to it.

Many compilers have been written that handle function calls, tail recursive function calls, and returns identically. They only change what code they emit for handling the continuation allocation. These compilers naturally produce tail recursive code.

Most machines provide special purpose support for a LIFO stack. It is tempting to use the stack for allocation of continuations because they are almost always allocated and deallocated in LIFO order, and a stack gives higher performance when this is the case. Many compilers do in fact use the stack for continuations and argument passing. Some, like Winkelmann's Chicken compiler, follow Baker's suggestion and treat the stack as a "nursery" for the heap. Others avoid using the stack altogether because of the extra complexity it entails. And others cannot use the stack because of constraints placed on stack usage by OS conventions or the underlying virtual machine model.