@tac0turtle
Goal
Expand the Evolve simulator into a comprehensive testing and validation tool that runs in CI on every PR. The finished state: CI catches non-determinism, fuzzes critical paths, and tracks performance regressions automatically.
Current State
- Simulator (`evolve_simulator`): Seed-based determinism, fault injection, time simulation, basic metrics/reporting. Used in testapp integration tests.
- Fuzzing: Limited to tx encoding (`fuzz_decode`, `fuzz_roundtrip`, `fuzz_structured`) in `crates/app/tx/fuzz/`. Requires `cargo +nightly fuzz` — not integrated into CI.
- CI (`rust.yml`): Runs `cargo test --workspace` on every PR. Long simulation tests exist but are manual-only (workflow_dispatch).
- No non-determinism detection: No automated check that the same seed produces identical state across runs.
Scope
1. Non-Determinism Detection
- Dual-execution oracle: Run the same simulation seed twice and assert identical state hashes at every block. This is the core invariant — if it ever fails, consensus is broken.
- Cross-platform determinism check: Ensure state hashes match between macOS and Linux (CI runs Linux, devs run macOS). May require pinning float ops or auditing platform-dependent behavior.
- Iteration order audit: Automated check that no `HashMap`/`HashSet` usage leaks into STF execution paths (beyond the existing clippy lint — runtime verification).
- Time source audit: Verify no `SystemTime`/`Instant` usage reaches STF execution. The simulator's `SimulatedTime` should be the only time source during execution.
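The dual-execution oracle above can be sketched in a few lines. This is a minimal, self-contained illustration, not the evolve_simulator API: `Simulation`, its xorshift-based `apply_block`, and `dual_execution_check` are all hypothetical stand-ins for the real simulator and its per-block state hashes.

```rust
// Minimal sketch of a dual-execution oracle. All types here are hypothetical
// stand-ins for the real evolve_simulator API.

/// Toy deterministic simulation: state evolves from a seed via xorshift64.
struct Simulation {
    state: u64,
}

impl Simulation {
    fn new(seed: u64) -> Self {
        Simulation { state: seed.max(1) } // xorshift must not start at 0
    }

    /// Apply one "block" and return the resulting state hash.
    fn apply_block(&mut self) -> u64 {
        // xorshift64 step stands in for real block execution
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        self.state // in the real simulator: a state commitment/hash
    }
}

/// Run the same seed twice and assert identical hashes at every block,
/// reporting the first divergent height on failure.
fn dual_execution_check(seed: u64, blocks: usize) -> Result<(), usize> {
    let (mut a, mut b) = (Simulation::new(seed), Simulation::new(seed));
    for height in 0..blocks {
        if a.apply_block() != b.apply_block() {
            return Err(height);
        }
    }
    Ok(())
}

fn main() {
    match dual_execution_check(42, 100) {
        Ok(()) => println!("deterministic over 100 blocks"),
        Err(h) => println!("non-determinism detected at block {h}"),
    }
}
```

Reporting the first divergent height (rather than a bare pass/fail) is what makes the failure actionable: it points directly at the block whose execution introduced the divergence.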
2. Expanded Fuzzing
- STF fuzzing: Fuzz the full `apply_block` path with randomly generated blocks (random tx ordering, random payloads, malformed inputs). Assert no panics, no state corruption.
- Storage layer fuzzing: Fuzz the storage backend with random key/value operations. Verify state hash consistency after commit.
- Account execution fuzzing: Generate random sequences of exec/query calls against accounts. Verify error handling (no panics, proper error codes).
- Mempool fuzzing: Fuzz transaction insertion, eviction, and ordering under concurrent load.
- Corpus management: Maintain and expand fuzz corpus with interesting inputs found during runs. Store corpus in CI cache.
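The core "no panics on arbitrary input" invariant can be sketched with a std-only seeded loop; real runs would go through cargo-fuzz or bolero, which also handle corpus management. `decode_tx` is a hypothetical stand-in for the decoder under test, and the xorshift generator exists only to make every input reproducible from its seed.

```rust
// Std-only sketch of a seeded fuzz loop for a decode path. `decode_tx` is
// hypothetical; real fuzzing would use cargo-fuzz or bolero instead.

use std::panic;

/// Hypothetical transaction decoder: must return an error, never panic.
fn decode_tx(bytes: &[u8]) -> Result<(), &'static str> {
    if bytes.len() < 4 {
        return Err("too short");
    }
    // ... real decoding would happen here ...
    Ok(())
}

/// Tiny xorshift generator so every fuzz input is reproducible from its seed.
fn random_bytes(seed: &mut u64, len: usize) -> Vec<u8> {
    (0..len)
        .map(|_| {
            *seed ^= *seed << 13;
            *seed ^= *seed >> 7;
            *seed ^= *seed << 17;
            (*seed & 0xff) as u8
        })
        .collect()
}

/// Returns the number of inputs that caused a panic (should be zero).
fn fuzz_decode(seed: u64, iterations: usize) -> usize {
    let mut s = seed.max(1);
    let mut panics = 0;
    for i in 0..iterations {
        let input = random_bytes(&mut s, i % 64);
        let result = panic::catch_unwind(|| {
            let _ = decode_tx(&input); // errors are fine; panics are bugs
        });
        if result.is_err() {
            panics += 1; // reproduce with: seed + iteration i
        }
    }
    panics
}

fn main() {
    println!("panics: {}", fuzz_decode(0xdeadbeef, 1000));
}
```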
3. Performance Testing & Regression Detection
- Benchmark baseline: Establish criterion benchmarks for key paths (block execution, tx processing, storage read/write, state hashing).
- Simulator performance report: Extend `PerformanceReport` with p50/p95/p99 latencies, throughput (tx/s, blocks/s), and memory high-water mark.
- Regression detection: Compare benchmark results against a baseline (stored as an artifact or in-repo). Fail CI or post a warning comment if performance degrades beyond a threshold (e.g., >10%).
- Stress test profile: Standardized stress test config (high block count, high tx volume, fault injection) that runs on a schedule (nightly or weekly).
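The percentile/throughput computation a `PerformanceReport` extension might use can be sketched over collected `Duration`s. The nearest-rank percentile method here is an assumption, not an established project choice, and `percentile` is a hypothetical helper.

```rust
// Sketch of p50/p95/p99 and throughput computation over per-tx latencies.
// Nearest-rank percentiles are an assumption, not the project's method.

use std::time::Duration;

/// Nearest-rank percentile over a sorted slice (p in 0..=100).
fn percentile(sorted: &[Duration], p: f64) -> Duration {
    assert!(!sorted.is_empty());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    // Synthetic latencies: 1ms..=100ms, one per tx.
    let mut latencies: Vec<Duration> =
        (1..=100).map(Duration::from_millis).collect();
    latencies.sort();

    println!("p50 = {:?}", percentile(&latencies, 50.0));
    println!("p95 = {:?}", percentile(&latencies, 95.0));
    println!("p99 = {:?}", percentile(&latencies, 99.0));

    // Throughput: txs per second over the run's total execution time.
    let total: Duration = latencies.iter().sum();
    let tps = latencies.len() as f64 / total.as_secs_f64();
    println!("throughput = {tps:.1} tx/s");
}
```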
4. CI Integration
- Simulation tests on every PR: Run a short simulation suite (e.g., 100 blocks, 3 seeds) as part of the standard test job. Must complete in <5 min.
- Non-determinism check on every PR: Dual-execution with at least 2 seeds. Fast — just comparing hashes.
- Nightly fuzzing job: Run cargo-fuzz (or bolero/proptest long runs) for extended duration (30–60 min). Report findings as issues.
- Nightly performance run: Run benchmarks + long simulation, store results as artifacts, compare against baseline.
- Seed rotation: Each CI run uses a mix of fixed seeds (regression) and random seeds (exploration). Failed random seeds are logged for reproduction.
- Failure reproduction: On any simulation failure, CI output includes the exact `just sim-seed <seed>` command to reproduce locally.
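The seed-rotation and failure-reproduction behavior above can be sketched as a small CI driver. `run_simulation` is a hypothetical stand-in for the real entry point, and the fixed seed list is illustrative.

```rust
// Sketch of seed rotation: fixed regression seeds plus fresh random seeds,
// printing the exact reproduction command on failure. `run_simulation` and
// the fixed seed values are hypothetical.

use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical simulation run: returns Err on an invariant violation.
fn run_simulation(seed: u64) -> Result<(), String> {
    let _ = seed;
    Ok(()) // the real simulator would execute blocks here
}

/// Fixed seeds catch regressions; random seeds explore new behavior.
fn ci_seeds(random_count: usize) -> Vec<u64> {
    let mut seeds = vec![1, 42, 1337];
    let mut entropy = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64;
    for _ in 0..random_count {
        entropy ^= entropy << 13;
        entropy ^= entropy >> 7;
        entropy ^= entropy << 17;
        seeds.push(entropy);
    }
    seeds
}

fn main() {
    let mut failed = false;
    for seed in ci_seeds(2) {
        if let Err(e) = run_simulation(seed) {
            // The reproduction command is the contract: copy-paste locally.
            eprintln!("simulation failed ({e}); reproduce with: just sim-seed {seed}");
            failed = true;
        }
    }
    if !failed {
        println!("all seeds passed");
    }
}
```

Logging the failing random seed alongside a copy-pasteable command is what turns exploration failures into fixed regression seeds for later runs.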
5. Simulator Enhancements
- Transaction generators: Configurable random transaction generators for simulation (valid txs, invalid txs, edge-case txs). Currently manual — should be built into the simulator.
- Scenario DSL or config: Define simulation scenarios (e.g., "normal load for 50 blocks, then spike to 10x, then fault injection") as config rather than code.
- Shrinking on failure: When a simulation fails, automatically try to find the minimal reproducing seed/block sequence (inspired by proptest shrinking).
- Coverage tracking: Integrate with coverage tools to measure what % of STF/account code paths are exercised by simulation.
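Shrinking on failure can be sketched as a binary search over the block-count prefix, assuming the failure is monotone (once a prefix fails, longer runs keep failing). `fails` is a hypothetical oracle with a planted bug at block 73; real shrinking would also shrink the tx sequence, not just the run length.

```rust
// Sketch of failure shrinking: binary-search the smallest block count that
// still reproduces a failure. `fails` is a hypothetical oracle.

/// Hypothetical failure oracle: does running `blocks` blocks of this seed
/// fail? Monotone for this sketch: once failing, longer runs keep failing.
fn fails(seed: u64, blocks: usize) -> bool {
    let _ = seed;
    blocks >= 73 // pretend the bug first triggers at block 73
}

/// Smallest block count in 1..=max that still fails, assuming monotonicity.
fn shrink(seed: u64, max: usize) -> Option<usize> {
    if !fails(seed, max) {
        return None; // nothing to shrink: the full run passes
    }
    let (mut lo, mut hi) = (1, max); // invariant: fails(seed, hi) is true
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if fails(seed, mid) {
            hi = mid; // failure reproduces with a shorter prefix
        } else {
            lo = mid + 1; // prefix too short to trigger the bug
        }
    }
    Some(hi)
}

fn main() {
    match shrink(42, 1000) {
        Some(n) => println!("minimal failing run: {n} blocks"),
        None => println!("no failure up to 1000 blocks"),
    }
}
```

Binary search only works when failures are monotone in run length; non-monotone failures (e.g., fault-injection timing) would need the greedy delta-debugging style used by proptest shrinking.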
Success Criteria
- Every PR runs simulation tests (short suite, <5 min) + non-determinism dual-execution check.
- Nightly CI job fuzzes STF, storage, and mempool for 30+ min and files issues on findings.
- Performance benchmarks run nightly with regression detection — degradations >10% are flagged.
- A failing simulation always prints its reproduction command.
- Zero known non-determinism sources in the STF execution path.
Implementation Notes
- Start with non-determinism detection (highest value, lowest effort) — dual-execution is just "run twice, compare hashes."
- For fuzzing, consider
boleroas it works with both libfuzzer and proptest backends, avoiding the nightly-onlycargo-fuzzlimitation. - Performance baselines can use GitHub Actions artifacts or
git notesfor storage. - Keep CI wall time in check — simulation and fuzzing are useless if they make PRs slow. Short suite on PR, long suite on nightly.