Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[DO NOT MERGE] Backport incremental snapshots to v4.28#14179

Draft
Kha wants to merge 16 commits into
releases/v4.28.0 from
backport-incr-v4.28
Draft

[DO NOT MERGE] Backport incremental snapshots to v4.28 #14179
Kha wants to merge 16 commits into
releases/v4.28.0 from
backport-incr-v4.28

Conversation

@Kha

@Kha Kha commented Jun 25, 2026

Copy link
Copy Markdown
Member

No description provided.

eric-wieser and others added 16 commits June 25, 2026 12:08
This PR ensures that `import` gracefully processes `EINTR` errors from
the filesystem.
`read()` is allowed to return fewer bytes than requested, if it is
interrupted. If it is interrupted before returning any bytes, then it
returns `EINTR`. In either case, Lean should keep reading the file.
This PR prevents undefined behavior in `readModuleDataParts #[]` on
configurations without `LEAN_MMAP`. Previously this would lead to
out-of-bounds indexing.
This PR makes `readModuleDataParts` report a clearer error if there is
insufficient memory to load a module.
changelog-ignores-my-messages-unless-i-edit-the-description-afterwards
...ject sharing (#13185)
This PR adds new incremental module serialization functions that
save/load a single module at a time with explicit sharing via dep
regions and compactor state, generalizing the existing batch
saveModuleDataParts API.
Two sharing mechanisms that can be mixed:
- `CompactedRegion` dep regions for sharing with loaded regions
- `CompactorState` for same-process chaining (pre-loaded `m_obj_table`)
This PR adds opt-in support for serializing closures (functions with
captured values) to `.olean` files via `CompactedRegion.save
(allowClosures := true)`, so a saved function can be loaded back and
called, including from a separate process. Regular module data is
unaffected and continues to reject closures.
With `allowClosures := true` the writer emits a new `v3` `.olean`
layout, `[header][data_size][data][closure offsets][lib relocation
table]`, whose sections are all length-prefixed so the reader parses
forward. The relocation table records `(base_addr, id)` for only the
loaded libraries that contain a compacted closure's function pointer; on
load each is matched by `id` against the currently-loaded libraries to
compute a per-library address delta that relocates the closures' `m_fun`
pointers under ASLR. A library that is no longer loaded is a hard error
rather than a silent mis-relocation.
`v3` files are mapped copy-on-write so function pointers are patched in
place without modifying the file. The closure-offset list points the
loader directly at each `m_fun` field, avoiding a scan of the compacted
region, and a zero closure count lets closure-less `v3` files skip the
relocation table and `get_loaded_libs()` entirely. The full
pointer-fixup walk still runs only when a dependency region fails to
load at its saved base address.
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This PR reshapes `CompactedRegion` from a bare `USize` pointer into a
structure whose `filePath`, `size`, `isMemoryMapped`, and `depPaths` are
ordinary Lean values, with the logical base address, a root-relative
buffer offset, and a pointer to the region's root object kept in private
fields. Because the metadata is now plain Lean data and the root pointer
is an ordinary pointer into the region rather than an opaque handle, a
`CompactedRegion` (and thus `Environment`) can be embedded in compacted
data: as long as the region is also supplied in `depRegions`, the object
compactor relocates the pointer like any other cross-region reference
and it is re-resolved against the freshly mapped buffer on load.
The region's buffer is no longer owned by a C++ struct. The former
`compacted_region` becomes `region_reader`, a transient non-owning
read-context that fixes up a just-mapped buffer and is then discarded;
cross-region dependencies are passed as a lightweight `region_view`
value (begin/size/base_addr) rather than a pointer. The Lean
`CompactedRegion` owns the mapping, framed by its whole file (base
address, root-to-base offset, file size), and `CompactedRegion.free`
releases it directly via `munmap`/`free`; `free` first neutralizes the
root pointer so that decrementing any surviving reference to the freed
structure does not dereference released memory. The former
`lean_compacted_region_{size,is_memory_mapped}` accessors become
structure projections, and `read` populates all fields at load time.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This PR adds **experimental** CLI flags that cache `lean`'s post-import
elaboration state across invocations: `--incr-save FILE` writes a full
snapshot at end of run, `--incr-load FILE` reuses one at startup, and
`--incr-header-save FILE` writes a header-only snapshot (post-import
`Environment`, no command bodies). A loaded snapshot will be reused as
far as unchanged syntax (i.e. import header plus subsequent commands, if
saved) allows for.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">perf: defer ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">region_reader ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">dep-region sort to slow path ( #14016 ×ばつ on headers that pull in many oleans (an `import Lean` snapshot goes from ~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever the structural fixup walk would itself be skipped. `region_reader`'s constructor previously sorted `m_dep_regions` by `base_addr` so that `fix_object_ptr` can binary-search across regions. But the walk is skipped entirely on the common fast path (the region landed at its saved `base_addr` and every dep did too), and `--incr-load` chains 9000+ dep oleans through one `region_reader` per `CompactedRegion.read` call against a growing `depRegions` array, so the eager sort accumulated `O(N2 log N)` work for no benefit. The sort and overlap validation now live in `sort_and_validate_dep_regions()` and run only when the slow-path walk starts. `read()` is reordered so the fast-path check fires before any `fix_object_ptr` call, including the root pointer (the saved value already matches the load-time address on the fast path, so no fixup is needed). Co-authored-by: Claude <noreply@anthropic.com>" data-pjax="true" href="/index.cgi/contrast/https://github.com/leanprover/lean4/pull/14179/commits/5491a88b4787e080fd29d04115da73b4183b5131">)
This PR speeds up snapshot loading (`--incr-load`) by roughly ×ばつ on
headers that pull in many oleans (an `import Lean` snapshot goes from
~1.5 s to ~0.3 s wall time), by skipping the dep-region sort whenever
the structural fixup walk would itself be skipped.
`region_reader`'s constructor previously sorted `m_dep_regions` by
`base_addr` so that `fix_object_ptr` can binary-search across regions.
But the walk is skipped entirely on the common fast path (the region
landed at its saved `base_addr` and every dep did too), and
`--incr-load` chains 9000+ dep oleans through one `region_reader` per
`CompactedRegion.read` call against a growing `depRegions` array, so the
eager sort accumulated `O(N2 log N)` work for no benefit.
The sort and overlap validation now live in
`sort_and_validate_dep_regions()` and run only when the slow-path walk
starts. `read()` is reordered so the fast-path check fires before any
`fix_object_ptr` call, including the root pointer (the saved value
already matches the load-time address on the fast path, so no fixup is
needed).
Co-authored-by: Claude <noreply@anthropic.com>
This PR makes `--incr-load` reload a snapshot's dependency `.olean*`
regions grouped by module, so each `CompactedRegion.read` only sees the
sibling variants it can actually point into rather than the full,
growing list of all previously-loaded regions. For an `import Mathlib`
snapshot this takes warm load time from ~30 s to ~0.8 s (now faster than
a fresh `import Mathlib`).
A regular `.olean*`'s only cross-region pointers are to the prior
variants of the same module (`.olean` <- `.olean.server` <-
`.olean.private`); cross-module references go through the constant map,
not direct region pointers. The old loader passed the whole growing
`depRegions` array to every read, so each read extracted and sorted an
`O(N)` dependency-view vector, i.e. `O(N2)` across the ~37k regions of
Mathlib, dominating load time.
The `<snap>.deps` sidecar is reshaped from a flat path list into a JSON
`Array ModuleArtifacts` (reusing the `--setup` artifact type),
reconstructed from `env.header.regions` at save time by
`regionsToModuleArtifacts`. The loader reads each module's variant chain
with only its own siblings as deps, reads any `.ir` region with no deps
(as regular import does), and passes the full accumulated set only to
the single final read of the snapshot region itself.
Co-authored-by: Claude <noreply@anthropic.com>
This PR reads a snapshot's dependency `.olean*` regions in parallel
across modules. For a cold-cache `import Mathlib` snapshot (the typical
case, where the snapshot is loaded by a process that did not just create
it), this overlaps the otherwise-serialized root-page fault I/O of
mapping ~37k regions, taking load time from ~16 s to ~6 s.
Modules are mutually independent for region loading (a regular `.olean*`
only points into the prior variants of the same module; cross-module
references go through the constant map), so each module's regions are
read by a striped worker task. Only the final read of the snapshot
region itself, which references every dependency, runs after the join.
The worker count defaults to `min(hardwareConcurrency, 4)` and is
overridable via `LEAN_IMPORT_WORKERS`. Four workers keep warm-cache load
time unchanged while still overlapping most of the cold I/O; higher
counts speed up cold loads further (~3 s at 32) but push the warm case
into a slower mode, perhaps from `mmap` address-space lock contention.
Co-authored-by: Claude <noreply@anthropic.com>
...extensions (#14076)
This PR removes a number of environment extensions that were used merely
to hold a global IO.Ref, making them unnecessary and preventing
compatibility with disk snapshots.
Co-authored-by: Claude <noreply@anthropic.com>
This PR speeds up `.olean` and snapshot serialization by compacting the
object graph with direct recursion instead of an explicit worklist,
removing the redundant two-pass child resolution that looked up every
object's children twice.
`to_offset` now compacts a not-yet-seen heap object in place and returns
its offset, so the `insert_*` methods return offsets rather than a retry
`bool`. Child offsets use `m_tmp` as a recursion-local stack window so
arbitrarily large arrays stay off the C stack, and children are visited
in ascending order to reproduce the previous allocation order, keeping
the output byte-identical. Recursion depth is bounded by the former
worklist's peak size and relies on the enlarged thread stacks.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
...0 backport
Backporting #14023/#14027 onto `releases/v4.28.0`, the `match ... >>= fromJson? with` binding `moduleArts : Array ModuleArtifacts` in `loadIncrSnapshot` left the result type a metavariable, so elaboration failed with cascading field-notation errors on `moduleArts`. Ascribe the scrutinee as `Except String (Array ModuleArtifacts)` so inference succeeds. `ModuleArtifacts` and its fields already exist at v4.28.0; this is purely an inference fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v4.28.0's `tests/compiler` harness requires a `<test>.lean.expected.out` file for every test, whereas master's `tests/compile` harness does not. The backported `compactor_chain`, `dep_regions`, and `dep_regions_miss` tests pass silently (they only `throw` on failure), so their expected output is empty.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@leodemoura leodemoura Awaiting requested review from leodemoura leodemoura will be requested when the pull request is marked ready for review leodemoura is a code owner
@hargoniX hargoniX Awaiting requested review from hargoniX hargoniX will be requested when the pull request is marked ready for review hargoniX is a code owner

Assignees

No one assigned

Labels

toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

AltStyle によって変換されたページ (->オリジナル) /