- Part 1, The Spark: Exposing the "Safe-Room" security leak and building the compiler gate.
- Part 2, The NDA Language: Designing a content-addressed triplet representation to cure context bloat.
- Part 3, Ditching the Web Stack: Building a native 30MB IDE with a 1,500,000x drop in IPC latency.
- Part 4, The Closure JIT: Compiling AST blocks to nested closures and bypassing borrow checker limits. (You are here)
- Part 5, JIT Math Optimizations: Replacing division operations with precomputed 16-bit lookup tables.
- Part 6, x86-64 Assembler & SCEV-Lite: Compiling scalar loops directly to native code in constant time.
- Part 7, Classic Compiler Passes: Implementing inter-procedural Dead Code Elimination and loop unrolling.
- Part 8, Reclaiming Ring 0: Exiting UEFI boot services and transitioning the kernel to Ring 0.
- Part 9, Bare-Metal Drivers: Writing a PCI scanner, NVMe block storage controller, and FAT32 parser.
- Part 10, Synaptic Canvas: Rendering a spatial, force-directed GUI based on model token activation vectors.
- Part 11, Swarms & Hot-Patching: Building multi-agent scheduling and zero-downtime RCU driver updates.
- Part 12, Self-Evolution: Handing system control over to a local LLM Terminal that self-optimizes via telemetry.
Tier-1: The Closure JIT
I started by designing a Tier-1 Closure-Based JIT Compiler.
Instead of compiling directly to machine instructions, the compiler walks the AST at load time and generates a chain of nested Rust closures (Arc<dyn Fn>, matching the JitFn type below).
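This approach resolves all opcode matches, scope checks, and control-flow branches once, at compile time. At runtime, the JIT engine simply calls through the pre-compiled closures, one indirect call per node. This removes the per-node opcode match that a tree-walking interpreter repeats on every execution, which cuts dispatch overhead and the branch mispredictions that come with it; the indirect calls themselves still go through the CPU's branch predictor and instruction cache.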
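To make the contrast concrete, here is a minimal sketch on a hypothetical three-node expression language (Expr, interpret, compile, and demo are illustrative names, not part of the NDA codebase). The interpreter re-runs its match on every evaluation; the closure compiler runs the same match once and returns closures that only call their children.

```rust
// Hypothetical toy AST, used only to contrast interpretation with closure compilation.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Tree-walking interpreter: inspects the node kind on every evaluation.
fn interpret(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => interpret(a) + interpret(b),
        Expr::Mul(a, b) => interpret(a) * interpret(b),
    }
}

// A compiled node: a closure that already "knows" what kind of node it is.
type Compiled = Box<dyn Fn() -> i64>;

// Closure compiler: the match runs once, here, at compile time.
fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Num(n) => {
            let n = *n;
            Box::new(move || n)
        }
        Expr::Add(a, b) => {
            let (fa, fb) = (compile(a), compile(b));
            Box::new(move || fa() + fb())
        }
        Expr::Mul(a, b) => {
            let (fa, fb) = (compile(a), compile(b));
            Box::new(move || fa() * fb())
        }
    }
}

// Builds (2 + 3) * 4 and evaluates it both ways.
fn demo() -> (i64, i64) {
    let expr = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Num(4)),
    );
    let compiled = compile(&expr);
    (interpret(&expr), compiled())
}
```

Each compiled node is still an indirect call through a Box<dyn Fn>, so the saving is the repeated dispatch, not the calls themselves.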
Here is how the compiler defines the JIT function type and compiles a sequence of nodes in src/compiler/nda_jit.rs:
// src/compiler/nda_jit.rs: Closure JIT definitions
use std::sync::Arc;

// Signal a compiled node returns to its enclosing sequence
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JitControlFlow {
    Continue,
    Break,
    Return,
}

// A compiled JIT closure: accepts a mutable state reference of *any* lifetime 'a
pub type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

// Compile a sequence of NDA AST nodes into a flat list of closures, in order
fn compile_sequence(nodes: &[NdaNode], counter: &mut usize, registry: &VarRegistry) -> Vec<JitFn> {
    nodes.iter().map(|n| compile_node(n, counter, registry)).collect()
}
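The listing does not show how a compiled sequence is executed. One plausible driver, sketched under the assumption that Break and Return simply stop the current sequence and bubble up to the enclosing loop or function, looks like this; run_sequence, the log field, and the helper nodes are hypothetical, and JitState is pared down to two fields.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JitControlFlow {
    Continue,
    Break,
    Return,
}

// Hypothetical, stripped-down state: a borrowed output log plus a node counter.
pub struct JitState<'a> {
    pub log: &'a mut Vec<String>,
    pub executed_nodes: usize,
}

pub type JitFn =
    Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

// Runs compiled nodes in order; Break/Return stop the sequence and are passed up
// so an enclosing loop or function body can decide what to do with them.
pub fn run_sequence(seq: &[JitFn], state: &mut JitState<'_>) -> Result<JitControlFlow, String> {
    for f in seq {
        match f(&mut *state)? {
            JitControlFlow::Continue => {}
            other => return Ok(other),
        }
    }
    Ok(JitControlFlow::Continue)
}

// Hypothetical node: appends a message to the log.
fn print_node(msg: &'static str) -> JitFn {
    Arc::new(move |state: &mut JitState<'_>| -> Result<JitControlFlow, String> {
        state.executed_nodes += 1;
        state.log.push(msg.to_string());
        Ok(JitControlFlow::Continue)
    })
}

// Hypothetical node: signals an early return.
fn return_node() -> JitFn {
    Arc::new(|state: &mut JitState<'_>| -> Result<JitControlFlow, String> {
        state.executed_nodes += 1;
        Ok(JitControlFlow::Return)
    })
}

pub fn demo() -> (Vec<String>, usize, JitControlFlow) {
    let seq = vec![print_node("a"), return_node(), print_node("unreachable")];
    let mut log = Vec::new();
    let mut state = JitState { log: &mut log, executed_nodes: 0 };
    let flow = run_sequence(&seq, &mut state).unwrap();
    let executed = state.executed_nodes;
    (log, executed, flow)
}
```

Here the third node never runs: the Return from the second node ends the sequence.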
Dynamic Dispatch: How AST Nodes Compile to Closures
To understand why this compiler is so fast, we have to look at how the AST nodes compile into closures.
In a typical tree-walking interpreter, executing an assignment like let a = 5 and a load like a + 1 means looking the variable up in a hash map by its string name on every loop iteration. The closure compiler avoids this by assigning each variable a slot at load time and baking that slot's index into the closure that reads or writes it.
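compile_node calls registry.get_or_create_slot through a shared &VarRegistry, so the registry needs interior mutability. A minimal sketch, assuming name hashes are u64 and the compiler runs on one thread (hence RefCell); the field layout and the slot_count helper are hypothetical:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical compile-time registry: maps a variable's name hash to a dense slot index.
// RefCell lets get_or_create_slot take &self, matching the &VarRegistry parameter.
#[derive(Default)]
pub struct VarRegistry {
    slots: RefCell<HashMap<u64, usize>>,
}

impl VarRegistry {
    // Returns the existing slot for name_hash, or assigns the next free index.
    pub fn get_or_create_slot(&self, name_hash: u64) -> usize {
        let mut slots = self.slots.borrow_mut();
        let next = slots.len();
        *slots.entry(name_hash).or_insert(next)
    }

    // Number of distinct variables seen so far.
    pub fn slot_count(&self) -> usize {
        self.slots.borrow().len()
    }
}
```

Because slots are handed out densely (0, 1, 2, ...), a runtime could size JitState::variables once from slot_count() instead of resizing on first write. If compilation ran on several threads, a Mutex would replace the RefCell.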
Here is the exact implementation in src/compiler/nda_jit.rs for compiling Let and Load nodes:
// src/compiler/nda_jit.rs: Compiling Let and Load AST nodes to closures
fn compile_node(node: &NdaNode, counter: &mut usize, registry: &VarRegistry) -> JitFn {
    *counter += 1;
    match node {
        // Compile a variable declaration
        NdaNode::Let { name_hash, init } => {
            // Resolve the name to a slot index once, at compile time
            let slot = registry.get_or_create_slot(*name_hash);
            let init_fn = compile_node(init, counter, registry);
            Arc::new(move |state: &mut JitState<'_>| -> Result<JitControlFlow, String> {
                state.executed_nodes += 1;
                // Evaluate the initialization expression (leaves its value on the stack)
                init_fn(state)?;
                let val = state.stack.pop().ok_or("Stack underflow in Let init")?;
                // Write directly to the flat slot array, growing it on first use
                if slot >= state.variables.len() {
                    state.variables.resize(slot + 1, None);
                }
                state.variables[slot] = Some(val);
                Ok(JitControlFlow::Continue)
            })
        }
        // Compile a variable reference load
        NdaNode::Load { name_hash } => {
            let slot = registry.get_or_create_slot(*name_hash);
            Arc::new(move |state: &mut JitState<'_>| -> Result<JitControlFlow, String> {
                state.executed_nodes += 1;
                // Indexed read from the flat slot array; no hashing at runtime
                let val = state.variables.get(slot)
                    .and_then(|v| v.as_ref())
                    .ok_or_else(|| format!("Load of uninitialized variable slot {}", slot))?;
                state.stack.push(val.clone());
                Ok(JitControlFlow::Continue)
            })
        }
        // ... other nodes (Matrix, Norm, Loop, Add) compile similarly
    }
}
By resolving each variable's name hash to a slot index during compilation, every variable load or store becomes an indexed access into the flat JitState::variables vector instead of a hash-table lookup.
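The whole path, from name hash to slot to stack, can be exercised end to end with a self-contained miniature. This is a sketch with hypothetical types (Node, State, Op, compile, demo), plain i64 values instead of NDA values, and a HashMap passed as &mut in place of VarRegistry; it compiles let a = 5; let b = a + 1; and runs it.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical mini-AST mirroring the Let/Load shape, with i64 values.
enum Node {
    Const(i64),
    Add(Box<Node>, Box<Node>),
    Let { name_hash: u64, init: Box<Node> },
    Load { name_hash: u64 },
}

struct State {
    stack: Vec<i64>,
    variables: Vec<Option<i64>>,
}

type Op = Arc<dyn Fn(&mut State) -> Result<(), String> + Send + Sync>;

// Returns the slot for a name hash, assigning the next dense index if new.
fn slot_for(slots: &mut HashMap<u64, usize>, name_hash: u64) -> usize {
    let next = slots.len();
    *slots.entry(name_hash).or_insert(next)
}

// Name hashes become slot indices here, once; the closures only see indices.
fn compile(node: &Node, slots: &mut HashMap<u64, usize>) -> Op {
    match node {
        Node::Const(v) => {
            let v = *v;
            Arc::new(move |s: &mut State| -> Result<(), String> {
                s.stack.push(v);
                Ok(())
            })
        }
        Node::Add(a, b) => {
            let fa = compile(a, slots);
            let fb = compile(b, slots);
            Arc::new(move |s: &mut State| -> Result<(), String> {
                fa(s)?;
                fb(s)?;
                let rhs = s.stack.pop().ok_or("stack underflow")?;
                let lhs = s.stack.pop().ok_or("stack underflow")?;
                s.stack.push(lhs + rhs);
                Ok(())
            })
        }
        Node::Let { name_hash, init } => {
            let slot = slot_for(slots, *name_hash);
            let init_fn = compile(init, slots);
            Arc::new(move |s: &mut State| -> Result<(), String> {
                init_fn(s)?;
                let v = s.stack.pop().ok_or("stack underflow in Let")?;
                if slot >= s.variables.len() {
                    s.variables.resize(slot + 1, None);
                }
                s.variables[slot] = Some(v);
                Ok(())
            })
        }
        Node::Load { name_hash } => {
            let slot = slot_for(slots, *name_hash);
            Arc::new(move |s: &mut State| -> Result<(), String> {
                let v = s.variables.get(slot).copied().flatten()
                    .ok_or_else(|| format!("load of uninitialized slot {}", slot))?;
                s.stack.push(v);
                Ok(())
            })
        }
    }
}

// Compiles `let a = 5; let b = a + 1;`, runs it, and returns the variable slots.
fn demo() -> Result<Vec<Option<i64>>, String> {
    let program = vec![
        Node::Let { name_hash: 0xA, init: Box::new(Node::Const(5)) },
        Node::Let {
            name_hash: 0xB,
            init: Box::new(Node::Add(Box::new(Node::Load { name_hash: 0xA }), Box::new(Node::Const(1)))),
        },
    ];
    let mut slots = HashMap::new();
    let ops: Vec<Op> = program.iter().map(|n| compile(n, &mut slots)).collect();
    let mut state = State { stack: Vec::new(), variables: Vec::new() };
    for op in &ops {
        op(&mut state)?;
    }
    Ok(state.variables)
}
```

Variable a lands in slot 0 and b in slot 1; at runtime neither closure touches the HashMap.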
The Lifetime Trap: Higher-Ranked Trait Bounds (HRTBs)
However, I immediately hit a massive Rust lifetime wall.
The JIT execution closures needed to query my persistent Merkle database (SiteMap) to resolve content-addressed function calls. Because the JIT closures were stored and executed dynamically, satisfying Rust’s borrow checker required wrapping the SiteMap in an Arc<SiteMap>.
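This meant that every variable assignment, function call, and closure jump cloned that Arc: an atomic increment of the shared reference count, followed by an atomic decrement when the clone was dropped. The CPU was wasting cycles on these atomic read-modify-write operations in the hot path.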
To fix this, I refactored the JIT engine to accept a plain &SiteMap reference instead, and solved the resulting lifetime constraint with Higher-Ranked Trait Bounds (HRTBs):
type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;
By writing for<'a>, I told the compiler that a single JIT closure must accept a JitState of any lifetime 'a, rather than one lifetime fixed when the closure is created. This let the JIT engine borrow the live, stack-allocated database directly, eliminating per-call Arc<SiteMap> clones and their reference-count updates in the hot path.
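A self-contained sketch of why the for<'a> bound matters. SiteMap here is a hypothetical stand-in (a HashMap from content hash to value), and compile_call is an illustrative call node. The compiled program is built before the SiteMap exists and holds no reference to it; each run simply borrows whatever map is alive.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the Merkle-backed SiteMap: content hash -> value.
struct SiteMap {
    entries: HashMap<u64, i64>,
}

// Runtime state borrows the SiteMap for 'a instead of owning an Arc<SiteMap>.
struct JitState<'a> {
    site_map: &'a SiteMap,
    stack: Vec<i64>,
}

#[derive(Debug, PartialEq)]
enum JitControlFlow {
    Continue,
}

// for<'a>: one closure must work for a JitState of every lifetime, so compiled
// code can be stored long-term without naming the lifetime of any particular borrow.
type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

// Hypothetical "call by content hash" node: resolves through the borrowed map.
fn compile_call(hash: u64) -> JitFn {
    Arc::new(move |state: &mut JitState<'_>| -> Result<JitControlFlow, String> {
        let v = *state.site_map.entries.get(&hash)
            .ok_or_else(|| format!("unknown hash {:#x}", hash))?;
        state.stack.push(v);
        Ok(JitControlFlow::Continue)
    })
}

fn demo() -> Result<Vec<i64>, String> {
    // Compiled first; holds no reference to any SiteMap.
    let program: Vec<JitFn> = vec![compile_call(1), compile_call(2)];

    // A SiteMap that lives only in this stack frame, borrowed for the run.
    let site_map = SiteMap { entries: HashMap::from([(1, 10), (2, 20)]) };
    let mut state = JitState { site_map: &site_map, stack: Vec::new() };
    for f in &program {
        f(&mut state)?;
    }
    Ok(state.stack)
}
```

Without the higher-ranked bound, the alias would need its own lifetime parameter (something like JitFn<'s>), tying every compiled program to one specific borrow of the map. The closures themselves still live in Arcs, but those are cloned while building the closure tree, not on every executed node.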
The JIT Sandbox
I wrapped this JIT engine in a custom JIT Sandbox (NdaJitSandbox). Before any program was committed to the codebase, the sandbox:
- Compiled the AST on the fly (taking just 93 microseconds).
- Ran the execution inside a panic-safe boundary (AssertUnwindSafe).
- Captured print buffers and returned execution metadata.
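The three steps above map naturally onto std::panic::catch_unwind plus std::time::Instant. Below is a minimal sketch under assumptions: NdaJitSandbox's real interface is not shown in the post, so run_sandboxed, SandboxReport, and the compile/run callbacks are hypothetical stand-ins for the NDA compiler and JIT entry point.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

// Hypothetical report shape; the post names NdaJitSandbox but not its fields.
#[derive(Debug)]
pub struct SandboxReport {
    pub compile_time: Duration,
    pub run_time: Duration,
    pub output: Vec<String>,
    pub error: Option<String>,
}

// `compile` produces a runnable program; `run` executes it, appending to a print buffer.
pub fn run_sandboxed<P>(
    compile: impl FnOnce() -> Result<P, String>,
    run: impl FnOnce(&P, &mut Vec<String>) -> Result<(), String>,
) -> SandboxReport {
    let mut output = Vec::new();
    let t0 = Instant::now();
    let compiled = compile();
    let compile_time = t0.elapsed();

    let program = match compiled {
        Ok(p) => p,
        Err(e) => {
            return SandboxReport { compile_time, run_time: Duration::ZERO, output, error: Some(e) }
        }
    };

    let t1 = Instant::now();
    // AssertUnwindSafe: after a panic we only read `output`, which keeps the lines
    // pushed before the panic; no half-updated invariant is observed.
    let result = panic::catch_unwind(AssertUnwindSafe(|| run(&program, &mut output)));
    let run_time = t1.elapsed();

    let error = match result {
        Ok(Ok(())) => None,
        Ok(Err(e)) => Some(e),
        // Panic payloads are usually &'static str or String.
        Err(payload) => Some(
            payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "panic with non-string payload".to_string()),
        ),
    };
    SandboxReport { compile_time, run_time, output, error }
}
```

catch_unwind only catches unwinding panics: with panic = "abort" in the build profile the process still aborts, and the default panic hook still prints the message to stderr.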
The figure below maps the JIT compilation pipeline and the sandbox's verification execution path:
[Figure: Flowchart of the JIT Sandbox compilation pipeline, deciding between Tier-1 closures and Tier-2 machine-code assembly]
Fig 1: The two-tier JIT sandbox compilation pipeline and execution pathways.
Pascal's Analysis: Bypassing the Serialization Wall
When I shared the performance gains (the JIT sandbox executing a 4-layer network block in 206μs including compile-and-run time),
analyzed the structural benefits:
"The format itself enforces consistency at write time, so the model can commit incrementally — each triple is either valid against the current graph or it isn't. The correction happens at write speed, not at review time."
By compiling directly to closures, I was allowing the model's output to bypass the serialization wall completely.
But my JIT closures still relied on heap allocations and standard integer loops. I needed to push compiler performance to match, and then exceed, native Rust scalar math.
In the next post, I'll document how I optimized the JIT math by introducing slot-based registries and division-free byte loops.
Discussion
How do you handle runtime extensibility in compiled languages? Have you worked with closure chains or dynamic function dispatch in Rust? How do you tackle borrow checker constraints when dealing with dynamic state sharing? Let's discuss in the comments below!
Special thanks to
for showing me that a structured compilation pipeline is the ultimate guard against model hallucinations.
Disclaimer: AI was used throughout this project, so it is only fitting that it co-authored this post with me. Special thanks to the Foundry for its tireless hours toiling away and to Gemini for producing the cover image.