[DO NOT MERGE] Backport incremental snapshots to v4.28#14179

Draft

Kha wants to merge 16 commits into

releases/v4.28.0 from

backport-incr-v4.28

Draft

[DO NOT MERGE] Backport incremental snapshots to v4.28 #14179
Kha wants to merge 16 commits into
releases/v4.28.0 from
backport-incr-v4.28

Conversation

@Kha

@Kha Kha commented Jun 25, 2026

Copy link

Copy Markdown

Member

No description provided.

eric-wieser and others added 16 commits

June 25, 2026 12:08

@eric-wieser @Kha


 fix: handle interrupted reads ( #13493 )

3fe3eb4

This PR ensures that `import` gracefully processes `EINTR` errors from
the filesystem.
`read()` is allowed to return fewer bytes than requested, if it is
interrupted. If it is interrupted before returning any bytes, then it
returns `EINTR`. In either case, Lean should keep reading the file.

@eric-wieser @Kha


 fix: prevent undefined behavior without LEAN_MMAP  ( #13521 )

1da79a9

This PR prevents undefined behavior in `readModuleDataParts #[]` on
configurations without `LEAN_MMAP`. Previously this would lead to
out-of-bounds indexing.

@eric-wieser @Kha


 fix: indicate in readModuleDataParts  if memory is exhausted ( #13549 )

2c10afa

This PR makes `readModuleDataParts` report a clearer error if there is
insufficient memory to load a module.
changelog-ignores-my-messages-unless-i-edit-the-description-afterwards

@Kha


 feat: add Lean.CompactedRegion.save/read supporting cross-file ob...

ef5550e

...ject sharing (#13185)
This PR adds new incremental module serialization functions that
save/load a single module at a time with explicit sharing via dep
regions and compactor state, generalizing the existing batch
saveModuleDataParts API.
Two sharing mechanisms that can be mixed:
- `CompactedRegion` dep regions for sharing with loaded regions
- `CompactorState` for same-process chaining (pre-loaded `m_obj_table`)

@Kha @claude


 feat: support serializing closures in CompactedRegion.save  ( #13891 )

7ab49f1

This PR adds opt-in support for serializing closures (functions with
captured values) to `.olean` files via `CompactedRegion.save
(allowClosures := true)`, so a saved function can be loaded back and
called, including from a separate process. Regular module data is
unaffected and continues to reject closures.
With `allowClosures := true` the writer emits a new `v3` `.olean`
layout, `[header][data_size][data][closure offsets][lib relocation
table]`, whose sections are all length-prefixed so the reader parses
forward. The relocation table records `(base_addr, id)` for only the
loaded libraries that contain a compacted closure's function pointer; on
load each is matched by `id` against the currently-loaded libraries to
compute a per-library address delta that relocates the closures' `m_fun`
pointers under ASLR. A library that is no longer loaded is a hard error
rather than a silent mis-relocation.
`v3` files are mapped copy-on-write so function pointers are patched in
place without modifying the file. The closure-offset list points the
loader directly at each `m_fun` field, avoiding a scan of the compacted
region, and a zero closure count lets closure-less `v3` files skip the
relocation table and `get_loaded_libs()` entirely. The full
pointer-fixup walk still runs only when a dependency region fails to
load at its saved base address.
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

@Kha @claude


 refactor: make CompactedRegion  serializable ( #13984 )

cc27c92

This PR reshapes `CompactedRegion` from a bare `USize` pointer into a
structure whose `filePath`, `size`, `isMemoryMapped`, and `depPaths` are
ordinary Lean values, with the logical base address, a root-relative
buffer offset, and a pointer to the region's root object kept in private
fields. Because the metadata is now plain Lean data and the root pointer
is an ordinary pointer into the region rather than an opaque handle, a
`CompactedRegion` (and thus `Environment`) can be embedded in compacted
data: as long as the region is also supplied in `depRegions`, the object
compactor relocates the pointer like any other cross-region reference
and it is re-resolved against the freshly mapped buffer on load.
The region's buffer is no longer owned by a C++ struct. The former
`compacted_region` becomes `region_reader`, a transient non-owning
read-context that fixes up a just-mapped buffer and is then discarded;
cross-region dependencies are passed as a lightweight `region_view`
value (begin/size/base_addr) rather than a pointer. The Lean
`CompactedRegion` owns the mapping, framed by its whole file (base
address, root-to-base offset, file size), and `CompactedRegion.free`
releases it directly via `munmap`/`free`; `free` first neutralizes the
root pointer so that decrementing any surviving reference to the freed
structure does not dereference released memory. The former
`lean_compacted_region_{size,is_memory_mapped}` accessors become
structure projections, and `read` populates all fields at load time.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Kha @claude


 feat: add --incr-*  flags for disk-based elab resumption ( #13965 )

2437b0d

This PR adds **experimental** CLI flags that cache `lean`'s post-import
elaboration state across invocations: `--incr-save FILE` writes a full
snapshot at end of run, `--incr-load FILE` reuses one at startup, and
`--incr-header-save FILE` writes a header-only snapshot (post-import
`Environment`, no command bodies). A loaded snapshot will be reused as
far as unchanged syntax (i.e. import header plus subsequent commands, if
saved) allows for.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

@Kha @claude


 ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">perf: defer ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">region_reader  ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">dep-region sort to slow path ( #14016 ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">)

5491a88

This PR speeds up snapshot loading (`--incr-load`) by roughly ×ばつ on
headers that pull in many oleans (an `import Lean` snapshot goes from
~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever
the structural fixup walk would itself be skipped.
`region_reader`'s constructor previously sorted `m_dep_regions` by
`base_addr` so that `fix_object_ptr` can binary-search across regions.
But the walk is skipped entirely on the common fast path (the region
landed at its saved `base_addr` and every dep did too), and
`--incr-load` chains 9000+ dep oleans through one `region_reader` per
`CompactedRegion.read` call against a growing `depRegions` array, so the
eager sort accumulated `O(N2 log N)` work for no benefit.
The sort and overlap validation now live in
`sort_and_validate_dep_regions()` and run only when the slow-path walk
starts. `read()` is reordered so the fast-path check fires before any
`fix_object_ptr` call, including the root pointer (the saved value
already matches the load-time address on the fast path, so no fixup is
needed).
Co-authored-by: Claude <noreply@anthropic.com>

@Kha @claude


 perf: load --incr-load  snapshot deps per module ( #14023 )

4d4b936

This PR makes `--incr-load` reload a snapshot's dependency `.olean*`
regions grouped by module, so each `CompactedRegion.read` only sees the
sibling variants it can actually point into rather than the full,
growing list of all previously-loaded regions. For an `import Mathlib`
snapshot this takes warm load time from ~30 s to ~0.8 s (now faster than
a fresh `import Mathlib`).
A regular `.olean*`'s only cross-region pointers are to the prior
variants of the same module (`.olean` <- `.olean.server` <-
`.olean.private`); cross-module references go through the constant map,
not direct region pointers. The old loader passed the whole growing
`depRegions` array to every read, so each read extracted and sorted an
`O(N)` dependency-view vector, i.e. `O(N2)` across the ~37k regions of
Mathlib, dominating load time.
The `<snap>.deps` sidecar is reshaped from a flat path list into a JSON
`Array ModuleArtifacts` (reusing the `--setup` artifact type),
reconstructed from `env.header.regions` at save time by
`regionsToModuleArtifacts`. The loader reads each module's variant chain
with only its own siblings as deps, reads any `.ir` region with no deps
(as regular import does), and passes the full accumulated set only to
the single final read of the snapshot region itself.
Co-authored-by: Claude <noreply@anthropic.com>

@Kha


 chore: snapshot cleanup ( #14024 )

2aa520b

@Kha @claude


 perf: load --incr-load  snapshot dep regions in parallel ( #14027 )

904162d

This PR reads a snapshot's dependency `.olean*` regions in parallel
across modules. For a cold-cache `import Mathlib` snapshot (the typical
case, where the snapshot is loaded by a process that did not just create
it), this overlaps the otherwise-serialized root-page fault I/O of
mapping ~37k regions, taking load time from ~16 s to ~6 s.
Modules are mutually independent for region loading (a regular `.olean*`
only points into the prior variants of the same module; cross-module
references go through the constant map), so each module's regions are
read by a striped worker task. Only the final read of the snapshot
region itself, which references every dependency, runs after the join.
The worker count defaults to `min(hardwareConcurrency, 4)` and is
overridable via `LEAN_IMPORT_WORKERS`. Four workers keep warm-cache load
time unchanged while still overlapping most of the cold I/O; higher
counts speed up cold loads further (~3 s at 32) but push the warm case
into a slower mode, perhaps from `mmap` address-space lock contention.
Co-authored-by: Claude <noreply@anthropic.com>

@Kha @claude


 refactor: remove unnecessary library-search import cache environment ...

ea8820f

...extensions (#14076)
This PR removes a number of environment extensions that were used merely
to hold a global IO.Ref, making them unnecessary and preventing
compatibility with disk snapshots.
Co-authored-by: Claude <noreply@anthropic.com>

@Kha


 fix: do not max-share IO.Ref  in compactor ( #14106 )

24ac066

@Kha @claude


 perf: replace the object compactor's worklist with recursion ( #14122 )

22d827f

This PR speeds up `.olean` and snapshot serialization by compacting the
object graph with direct recursion instead of an explicit worklist,
removing the redundant two-pass child resolution that looked up every
object's children twice.
`to_offset` now compacts a not-yet-seen heap object in place and returns
its offset, so the `insert_*` methods return offsets rather than a retry
`bool`. Child offsets use `m_tmp` as a recursion-local stack window so
arbitrarily large arrays stay off the C stack, and children are visited
in ascending order to reproduce the previous allocation order, keeping
the output byte-identical. Recursion depth is bounded by the former
worklist's peak size and relies on the enlarged thread stacks.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

@Kha @claude


 fix: pin fromJson? result type in loadIncrSnapshot for the v4.28....

930ed72

...0 backport
Backporting #14023/#14027 onto `releases/v4.28.0`, the `match ... >>= fromJson? with` binding `moduleArts : Array ModuleArtifacts` in `loadIncrSnapshot` left the result type a metavariable, so elaboration failed with cascading field-notation errors on `moduleArts`. Ascribe the scrutinee as `Except String (Array ModuleArtifacts)` so inference succeeds. `ModuleArtifacts` and its fields already exist at v4.28.0; this is purely an inference fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Kha @claude


 test: add empty expected.out files for backported compactor tests

9a54821

v4.28.0's `tests/compiler` harness requires a `<test>.lean.expected.out` file for every test, whereas master's `tests/compile` harness does not. The backported `compactor_chain`, `dep_regions`, and `dep_regions_miss` tests pass silently (they only `throw` on failure), so their expected output is empty.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>