Pipeline Design 200
ADR written to .claude/pipeline-artifacts/design.md (291 lines). Key architectural decisions documented:
- Static grep-based shared-state detection (6 patterns) over runtime isolation or manual annotation: conservative, portable, zero dependencies
- Temp file + grep for parallel result propagation: solves the bash subshell variable loss problem without IPC
- CPU-aware worker cap [2, 8]: scales to the machine, doesn't overwhelm constrained systems
- Facade with raw `bash -c` fallback: optimizer is purely additive, never worse than baseline
Four alternatives rejected with rationale: Vitest native (wrong layer), GNU parallel (no intelligence), Makefile deps (maintenance cost), Docker isolation (overhead exceeds gains).
" disable via SW_TEST_OPTIMIZER=false or optimization: off in pipeline config
- Must not break existing test correctness: tests that share state must be detected and run sequentially
- History/prioritization must be append-only and self-healing (corrupt JSONL lines skipped on read)
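The self-healing read in the last constraint could look roughly like the sketch below. It assumes `jq` is installed (it is among the tools this design already relies on). The function name `read_history_sketch` and the record fields used in the usage note are illustrative, not taken from `test-optimizer.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read a JSONL history file and silently drop lines
# that are not valid JSON (for example, a torn concurrent append).

read_history_sketch() {
  local file=$1
  # A missing history file is treated as empty history.
  [ -f "$file" ] || return 0
  # -R feeds each line to jq as a raw string; `fromjson?` suppresses the
  # parse error for a corrupt line, and `// empty` emits nothing for it.
  jq -c -R 'fromjson? // empty' "$file"
}
```

Given a file with one truncated line between two valid records such as `{"file":"a-test.sh","status":"PASS","duration_s":3}`, only the two valid records are printed.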
Introduce a test execution optimizer (scripts/lib/test-optimizer.sh) as a library sourced by both pipeline stages and the loop harness. The optimizer implements a four-phase execution model:
                ┌───────────────────────────┐
                │       Entry Points        │
                │  stage_test()             │
                │  run_test_gate()          │
                └─────────────┬─────────────┘
                              │
                ┌─────────────▼─────────────┐
                │  testopt_execute()        │
                │  Facade orchestrator      │
                │  - parses options         │
                │  - gates on <3 tests      │
                │  - falls back on init err │
                └─────────────┬─────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
┌─────────▼────────┐ ┌────────▼───────┐ ┌─────────▼──────────┐
│ Phase 1: Init    │ │ Phase 2:       │ │ Phase 3:           │
│ testopt_init()   │ │ Prioritize     │ │ Partition          │
│ - discover       │ │ testopt_       │ │ testopt_           │
│   *-test.sh      │ │ prioritize()   │ │ partition_         │
│ - load history   │ │ - fail_rate    │ │ shared_state()     │
│ - git diff       │ │   DESC         │ │ - 6 patterns       │
│ - select         │ │ - duration     │ │ → parallel[]       │
│   affected       │ │   ASC          │ │ → sequential[]     │
└─────────┬────────┘ └────────┬───────┘ └─────────┬──────────┘
          └───────────────────┼───────────────────┘
                              │
               ┌──────────────┴─────────────┐
               │                            │
   ┌───────────▼───────────┐  ┌─────────────▼────────────┐
   │ Phase 4a: Parallel    │  │ Phase 4b: Sequential     │
   │ testopt_run_parallel  │  │ testopt_run_with_        │
   │ - N workers (2-8)     │  │ fast_fail                │
   │ - dir-grouped         │  │ - stop on first failure  │
   │ - temp-file results   │  │ - skip if 4a failed +    │
   └───────────┬───────────┘  │   fast_fail enabled      │
               │              └─────────────┬────────────┘
               └──────────────┬─────────────┘
                              │
                ┌─────────────▼─────────────┐
                │          Output           │
                │  - history → JSONL append │
                │  - evidence → JSON file   │
                │  - events → emit_event()  │
                │  - report → stdout        │
                └───────────────────────────┘
1. Shared-state detection via static grep analysis (not runtime isolation)
- Context: Need to determine which tests can run in parallel without race conditions.
- Decision: Grep each test file for 6 patterns indicating shared state: hardcoded `/tmp` paths, port binding, SQLite files, PID/lock files, singleton TMPDIR assignments, and global config sourcing.
- Alternatives rejected: (a) Runtime sandboxing with namespaces/cgroups: too heavy, not portable to macOS. (b) Manual annotation: requires maintainer discipline, gets stale.
- Consequences: False positives send safe tests to the sequential bucket (slower but correct). False negatives would cause flaky parallel failures. The patterns are intentionally conservative: Pattern 6 (config sourcing) captures anything that `source`s a config file, which over-classifies but prevents subtle global-state mutation bugs.
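A minimal sketch of this kind of static classification follows. The six regexes are illustrative guesses at the categories listed in the decision, not the patterns shipped in `test-optimizer.sh`, and `classify_test_file` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: tag a test script SHARED or INDEPENDENT by grepping
# its source for shared-state indicators. All regexes are assumptions.

classify_test_file() {
  local file=$1
  # Mirrors the documented behavior: non-existent files count as INDEPENDENT.
  if [ ! -f "$file" ]; then
    printf 'INDEPENDENT:%s\n' "$file"
    return 0
  fi
  if grep -Eq \
      -e '/tmp/[A-Za-z0-9_]' \
      -e '(PORT=[0-9]+|--port[= ][0-9]+|nc -l)' \
      -e '\.(sqlite3?|db)([^A-Za-z0-9_]|$)' \
      -e '\.(pid|lock)([^A-Za-z0-9_]|$)' \
      -e '^[[:space:]]*(export[[:space:]]+)?TMPDIR=' \
      -e '^[[:space:]]*(source|\.)[[:space:]]+[^#]*config' \
      -- "$file"; then
    printf 'SHARED:%s\n' "$file"
  else
    printf 'INDEPENDENT:%s\n' "$file"
  fi
}
```

A script that writes to `/tmp/shared-fixture.txt` would be tagged `SHARED`, while one that only uses a directory obtained from `mktemp -d` would stay `INDEPENDENT`. Any single match is enough to push a file into the sequential bucket, which is the conservative bias the decision describes.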
2. Subshell variable propagation via temp file + grep (not direct variable)
- Context: Background jobs in `testopt_run_parallel()` run in subshells. Variable assignments (all_passed=false) don't propagate back to the parent.
- Decision: Each background job writes `"<file> PASS|FAIL <duration>"` lines to a shared temp file. Parent checks `grep -q ' FAIL '` on the results file.
- Consequences: Simple, correct, no IPC mechanisms. Concurrent jobs append to the same temp file, but each line is a single `echo`, and such small appends are in practice atomic on both Linux and macOS, so lines do not interleave.
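The propagation technique can be sketched as below. This is a simplified illustration, not the code in `test-optimizer.sh`: `run_parallel_sketch` is a hypothetical name, and worker capping and directory grouping are left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: get pass/fail results out of background subshells
# through a shared temp file instead of shell variables.

run_parallel_sketch() {
  local results t start status
  results=$(mktemp) || return 1
  for t in "$@"; do
    (
      start=$(date +%s)
      if bash "$t" >/dev/null 2>&1; then status=PASS; else status=FAIL; fi
      # A variable set here dies with the subshell, so this one-line
      # append is the only channel back to the parent.
      echo "$t $status $(( $(date +%s) - start ))" >> "$results"
    ) &
  done
  wait                      # block until every background job has finished
  cat "$results"            # "<file> PASS|FAIL <duration>" lines
  if grep -q ' FAIL ' "$results"; then
    rm -f "$results"
    return 1
  fi
  rm -f "$results"
  return 0
}
```

Calling `run_parallel_sketch a-test.sh b-test.sh` prints one result line per script and returns 1 if any line carries `FAIL`.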
3. CPU-aware worker detection with hard cap at 8
- Context: Need to scale parallelism to the machine without overwhelming constrained systems.
- Decision: Detect cores via `sysctl -n hw.ncpu` (Darwin) / `/proc/cpuinfo` (Linux) / `nproc`, use 75%, clamp to [2, 8].
- Consequences: 8-core machine gets 6 workers. 2-core CI runner gets 2 workers. Memory-constrained systems with many cores still get capped at 8.
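The arithmetic behind those numbers can be sketched as follows. Rounding down with integer division is an assumption (it matches the 8-core and 2-core examples), and both function names are hypothetical. The fallback of 4 on detection failure comes from the API notes for `testopt_detect_cores`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn a core count into a worker count
# (75% of cores, clamped to [2, 8], 4 if the count is unusable).

workers_for_cores() {
  local cores=$1 workers
  case "$cores" in
    ''|*[!0-9]*) echo 4; return 0 ;;   # not a number: safe default
  esac
  workers=$(( cores * 75 / 100 ))      # integer division rounds down
  [ "$workers" -lt 2 ] && workers=2
  [ "$workers" -gt 8 ] && workers=8
  echo "$workers"
}

detect_cores_sketch() {
  local cores=""
  if [ "$(uname -s)" = "Darwin" ]; then
    cores=$(sysctl -n hw.ncpu 2>/dev/null) || cores=""
  elif [ -r /proc/cpuinfo ]; then
    cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cores=""
  elif command -v nproc >/dev/null 2>&1; then
    cores=$(nproc 2>/dev/null) || cores=""
  fi
  workers_for_cores "$cores"
}
```

With this sketch, `workers_for_cores 8` prints 6, `workers_for_cores 2` prints 2, and `workers_for_cores 64` prints 8.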
4. Facade pattern with fallback to raw bash -c
- Context: The optimizer must never be worse than doing nothing.
- Decision: `testopt_execute()` falls back to `bash -c "$test_cmd"` when: (a) init fails, (b) fewer than 3 test files discovered, or (c) the optimizer is disabled via config.
- Consequences: Zero risk of regression for edge cases. The optimizer is purely additive.
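A sketch of the three fallback gates, under simplifying assumptions: `init_stub` and `optimized_run_stub` stand in for the real init and phase 2 to 4 logic, discovery only looks for `*-test.sh`, and `execute_sketch` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: facade that drops to the raw test command whenever
# the optimizer cannot or should not run.

init_stub() { [ -d "$1" ]; }                      # stand-in for real init
optimized_run_stub() { echo "optimized path"; }   # stand-in for phases 2-4

execute_sketch() {
  local root=$1 test_cmd=$2 count
  # (c) optimizer disabled via the kill switch
  if [ "${SW_TEST_OPTIMIZER:-true}" = "false" ]; then
    bash -c "$test_cmd"; return $?
  fi
  # (a) init failure
  if ! init_stub "$root"; then
    bash -c "$test_cmd"; return $?
  fi
  # (b) fewer than 3 test files discovered
  count=$(( $(find "$root" -type f -name '*-test.sh' | wc -l) ))
  if [ "$count" -lt 3 ]; then
    bash -c "$test_cmd"; return $?
  fi
  optimized_run_stub
}
```

In every fallback branch the caller sees exactly the exit code of `bash -c "$test_cmd"`, which is what makes the optimizer no worse than the baseline.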
1. stage_test() reads pipeline config: optimization != "off" → calls testopt_execute()
   run_test_gate() reads SW_TEST_OPTIMIZER != "false" → calls testopt_execute()
2. testopt_execute(".", "npm test", "--fast-fail")
   └── testopt_init(".")
       ├── find *-test.sh *_test.sh test_*.sh → DISCOVERED_TESTS[]
       ├── read ~/.shipwright/optimization/test-history.jsonl → TEST_HISTORY[]
       ├── git diff HEAD~1..HEAD → CHANGED_FILES[]
       └── testopt_select_affected() → AFFECTED_TESTS[] (directory + source matching)
3. testopt_prioritize(AFFECTED_TESTS)
   ├── for each test: score = (fail_rate * 10000) - duration_s
   └── sort -rn → highest-fail-rate, fastest-duration first
4. testopt_partition_shared_state(prioritized_tests)
   └── grep each file for 6 patterns → "SHARED:<path>" or "INDEPENDENT:<path>"
       ├── INDEPENDENT → parallel_tests[]
       └── SHARED → sequential_tests[]
5. Phase 4a: testopt_run_parallel(parallel_tests, workers=detect_cores*0.75)
   └── group tests by directory → background subshells → wait → grep results file
6. Phase 4b: testopt_run_with_fast_fail(sequential_tests)
   ├── skipped entirely if Phase 4a failed AND fast_fail=true
   └── otherwise runs one-by-one, breaks on first failure
7. Record results → test-history.jsonl (JSONL append)
   Write evidence → $ARTIFACTS_DIR/test-optimizer-evidence.json
   Emit events → testopt.parallel_done, testopt.sequential_done, testopt.fail_fast
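Step 3 of the flow above can be sketched with `awk` and `sort`. The input format (one `<path> <fail_rate> <duration_s>` triple per line on stdin) and the name `prioritize_sketch` are assumptions for illustration; the real function reads its history from the JSONL file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: order tests by score = fail_rate * 10000 - duration_s,
# highest score first.

prioritize_sketch() {
  # LC_ALL=C keeps the decimal point stable for both awk and sort.
  LC_ALL=C awk '{ printf "%.3f\t%s\n", ($2 * 10000) - $3, $1 }' |
    LC_ALL=C sort -rn |
    cut -f2
}
```

With the 10000 weight, a 0.01 difference in fail rate is worth 100 seconds of duration, so flaky tests run first and duration mostly acts as a tie-breaker among tests with similar fail rates.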
# Main entry point - called by pipeline and loop
testopt_execute <project_root> <test_cmd> [options...]
  Options:
    --max-workers=N                   # int, override CPU-detected worker count (2-8)
    --fast-fail                       # bool, stop on first failure (default)
    --continue-on-fail                # bool, run all tests despite failures
    --mode=auto|parallel|sequential   # execution mode (default: auto)
  Returns: exit 0 (all pass) | exit 1 (any failure)
  Errors: falls back to raw bash -c on init failure
# CPU detection - platform-aware core counting
testopt_detect_cores() -> stdout: integer (2-8)
  Errors: returns 4 (safe default) on detection failure
# Shared-state classification - static analysis of test files
testopt_partition_shared_state(file...) -> stdout: "SHARED:<path>" | "INDEPENDENT:<path>"
  Errors: non-existent files classified as INDEPENDENT
# Affected test selection - git-diff-driven test filtering
testopt_select_affected(changed_files...) -> sets AFFECTED_TESTS global array
  Errors: empty changed_files → returns all discovered tests
# Priority ordering - fail-rate weighted sort
testopt_prioritize(tests...) -> stdout: sorted test file paths (one per line)
  Errors: missing history → all tests score equally
# Parallel runner - background subshell execution
testopt_run_parallel(--max-workers=N, tests...) -> exit 0|1
  Errors: writes FAIL lines to temp file, checked via grep
# Sequential runner - fast-fail execution
testopt_run_with_fast_fail([--continue-on-fail], tests...) -> exit 0|1
  Errors: non-existent test files silently skipped
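One way the options of `testopt_execute` might be parsed is sketched below. The `OPT_*` variable names, the function name, and the choice to reject unknown options with status 2 are all assumptions; the source only specifies the flags and their defaults.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: parse the documented testopt_execute options into
# OPT_* variables (names invented for this example).

parse_testopt_options() {
  local arg
  OPT_MAX_WORKERS=""      # empty means "use CPU detection"
  OPT_FAST_FAIL=true      # documented default
  OPT_MODE=auto           # documented default
  for arg in "$@"; do
    case "$arg" in
      --max-workers=*)    OPT_MAX_WORKERS=${arg#--max-workers=} ;;
      --fast-fail)        OPT_FAST_FAIL=true ;;
      --continue-on-fail) OPT_FAST_FAIL=false ;;
      --mode=auto|--mode=parallel|--mode=sequential)
                          OPT_MODE=${arg#--mode=} ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;   # assumed behavior
    esac
  done
}
```

For example, `parse_testopt_options --max-workers=4 --mode=sequential` leaves `OPT_MAX_WORKERS=4`, `OPT_MODE=sequential`, and `OPT_FAST_FAIL=true`.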
| Component | Handles | Propagation | Fallback |
|---|---|---|---|
| `testopt_execute()` | Init failures, <3 tests | Returns raw `bash -c` exit code | Transparent fallthrough to original behavior |
| `testopt_init()` | Missing project root, no git, no history | Logs warning, continues with empty arrays | All tests treated as affected, no prioritization |
| `testopt_run_parallel()` | Subshell crashes, missing test files | Writes FAIL to results temp file | `grep -q ' FAIL '` on results file catches all |
| `testopt_run_with_fast_fail()` | Test exit code != 0 | Breaks loop (fast-fail) or continues (flag) | Returns 1 with failed test name on stdout |
| `testopt_record_history()` | Write failures, missing directory | Suppressed via `2>/dev/null \|\| true` | Missing history = no prioritization (graceful) |
| `testopt_partition_shared_state()` | Non-existent files | Classified as INDEPENDENT | Conservative: false positives go sequential |
- Pros: Vitest already supports `--pool threads/forks`, parallel file execution, and `--reporter` for structured output. Would work natively with `npm test`.
- Cons: This project's test suite is 102+ bash test scripts, not Vitest test files. The `npm test` command dispatches to these bash scripts. Vitest parallelism would only help if tests were `.ts`/`.js` files. Doesn't address affected-first or shared-state detection for bash scripts.
- Why rejected: Wrong layer. The optimization target is bash-level test file execution, not Node-level test runner internals.
- Pros: Battle-tested parallel execution. Simple: `find *-test.sh | parallel -j$(nproc) bash {}`.
- Cons: No shared-state detection: would run all tests in parallel, including those that fight over ports/files. No affected-first prioritization. No fast-fail cascade between parallel and sequential phases. Adds a dependency (`parallel` not installed by default on macOS).
- Why rejected: Lacks the intelligence layer (prioritization, partitioning, history). Would need the same detection code wrapped around it anyway.
- Pros: Explicit dependency declaration between test files. `make -j` handles parallelism natively.
- Cons: Requires maintaining a Makefile with test dependencies, which is a high maintenance burden. Every new test file needs a rule. No automatic shared-state detection. Foreign to the existing bash-centric architecture.
- Why rejected: Maintenance cost too high for 102+ test files. Static declaration gets stale.
- Pros: Perfect isolation, no shared-state concerns. Every test gets a clean filesystem.
- Cons: Container startup overhead (~2-5s per test) would negate parallelism gains. 102 containers would require significant memory. Not available in all CI environments. Massive complexity increase.
- Why rejected: Overhead exceeds the time savings from parallelism.
| File | Purpose |
|---|---|
| `scripts/lib/test-optimizer.sh` (741 lines) | Core library: detection, partitioning, prioritization, execution, history, reporting |
| `scripts/sw-test-optimizer-integration-test.sh` (438 lines) | 25 integration tests covering all new functions |

| File | Change |
|---|---|
| `scripts/lib/pipeline-stages-build.sh:569-584` | `stage_test()` reads `optimization` from pipeline config, calls `testopt_execute()` when not `off` |
| `scripts/sw-loop.sh:997-1000` | `run_test_gate()` checks `SW_TEST_OPTIMIZER`, calls `testopt_execute()` when not `false` |
| `templates/pipelines/*.json` (9 files) | Added `"optimization": "auto"`, `"fast_fail": true` to test stage config |
- None new. Uses only: `bash`, `jq`, `grep`, `find`, `sort`, `awk`, `mktemp`, `sysctl`/`nproc`, `git`
- Shared-state false negatives: Pattern 6 (config sourcing) is broad but not exhaustive. A test could share state via an unconventional mechanism (e.g., writing to a well-known path without matching any of the 6 patterns). Mitigation: `--mode=sequential` override and `SW_TEST_OPTIMIZER=false` kill switch.
- Concurrent JSONL writes: Multiple parallel pipelines (worktrees) may append to `~/.shipwright/optimization/test-history.jsonl` simultaneously. Small single-line appends are atomic on Linux/macOS for typical filesystem block sizes, but this is not guaranteed. Mitigation: JSONL format is self-healing; corrupt lines are skipped on read.
- History file unbounded growth: `test-history.jsonl` grows indefinitely. At 102 tests * 10 runs/day * 365 days, that is ~372K lines per year. Mitigation: not yet implemented. Future work: add rotation or tail-N windowing.
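The tail-N windowing named as future work is not implemented, but one possible shape is sketched below. The function name, the default of 10000 lines, and the temp-file-plus-rename approach are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tail-N windowing: keep only the newest N lines
# of a history file. Not part of the current implementation.

trim_history_sketch() {
  local file=$1 keep=${2:-10000} tmp
  [ -f "$file" ] || return 0
  # Nothing to do while the file is within the window.
  [ "$(( $(wc -l < "$file") ))" -gt "$keep" ] || return 0
  # Write the tail to a sibling temp file, then rename it over the original.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  tail -n "$keep" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

A line appended by another pipeline between the `tail` and the `mv` would be lost, so a real implementation would need to decide whether that is acceptable or add locking.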
- All 9 pipeline templates contain `"optimization": "auto"` and `"fast_fail": true` in test stage config
- `stage_test()` calls `testopt_execute()` when `optimization != "off"`, at `pipeline-stages-build.sh:576-581`
- `run_test_gate()` calls `testopt_execute()` when `SW_TEST_OPTIMIZER != "false"`, at `sw-loop.sh:997-1000`
- Backwards compatible: `SW_TEST_OPTIMIZER=false` bypasses optimizer in loop
- Backwards compatible: `optimization: "off"` bypasses optimizer in pipeline
- Fallback on <3 test files, at `test-optimizer.sh:621`
- Fallback on init failure, at `test-optimizer.sh:614-618`
- CPU detection clamped to [2, 8], at `test-optimizer.sh:517-518`
- 6 shared-state patterns implemented, at `test-optimizer.sh:537-567`
- Parallel runner uses temp-file + grep for result propagation, at `test-optimizer.sh:382-413`
- Evidence JSON written to `$ARTIFACTS_DIR/test-optimizer-evidence.json`
- Events emitted: `testopt.parallel_done`, `testopt.sequential_done`, `testopt.fail_fast`, `testopt.recorded`
- All 25 integration tests pass
- All 20 existing unit tests pass
- Pipeline template config discoverable via `shipwright templates list`
| Layer | Count | Coverage Target | What's Tested |
|---|---|---|---|
| Unit | 20 (`sw-test-optimizer-test.sh`) | Core library functions | Discovery, history load/query, affected selection, prioritization sort, fast-fail, parallel execution |
| Integration | 25 (`sw-test-optimizer-integration-test.sh`) | New functions + wiring | `testopt_detect_cores`, `testopt_partition_shared_state`, `testopt_execute` orchestrator, `stage_test()` wiring, `run_test_gate()` wiring |
| E2E | 0 (covered by existing `npm test`) | Full pipeline flow | Optimizer activates during normal `npm test` runs; 102+ suites exercise the real path |
Coverage targets: 100% of public functions have direct tests. Error paths (missing root, <3 tests, empty history) explicitly tested. Edge cases (all-shared, all-independent, single file) covered.
Happy path: 5 independent + 2 shared-state tests → parallel phase runs first with N workers → sequential phase runs second → all pass → exit 0, evidence JSON written.
Error cases:
- Failing test in parallel bucket + fast-fail → sequential bucket skipped entirely → exit 1 with `testopt.fail_fast` event emitted
- Failing test with `--continue-on-fail` → all tests run regardless → exit 1 with full results
Edge cases:
- Zero test files discovered → falls through to `bash -c "$test_cmd"` (no optimizer overhead)
- All tests classified as SHARED → parallel bucket empty, sequential bucket gets everything → behaves like original sequential execution
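The fast-fail and `--continue-on-fail` behaviors in the cases above can be sketched for the sequential runner as follows. `run_sequential_sketch` is a hypothetical name. Skipping missing files and printing the failed test name on stdout follow the error-handling notes in this document, while the rest is a simplification.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run tests one by one, stopping at the first failure
# unless --continue-on-fail is given as the first argument.

run_sequential_sketch() {
  local fast_fail=true failed=0 t
  if [ "${1:-}" = "--continue-on-fail" ]; then
    fast_fail=false
    shift
  fi
  for t in "$@"; do
    [ -f "$t" ] || continue            # missing test files are skipped
    if ! bash "$t" >/dev/null 2>&1; then
      echo "$t"                        # failed test name on stdout
      failed=1
      if [ "$fast_fail" = true ]; then break; fi
    fi
  done
  return "$failed"
}
```

A caller implementing the cascade would skip this function entirely when the parallel phase has already failed and fast-fail is enabled.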
| Metric | Current Value | Source |
|---|---|---|
| Full test suite wall-clock | ~1365s | `memory/metrics.json` (2026-04-04 baseline) |
| Execution mode | Sequential only | Single `bash -c "$test_cmd"` |
| Time to first failure | Up to ~1365s (worst case) | No early termination |

| Metric | Target | Rationale |
|---|---|---|
| Full suite wall-clock | <900s (34% reduction) | Parallelism across ~75% of tests classified as independent |
| Time to first failure | <200s (70%+ reduction) | Affected-first prioritization + fast-fail stops early |
| Test correctness | Zero regressions | Shared-state partitioning prevents parallel flakes |
- Wall-clock per phase: `testopt.parallel_done` and `testopt.sequential_done` events capture duration per phase
- Evidence JSON: `test-optimizer-evidence.json` records total/parallel/sequential counts, workers, mode, exit code per run
- Historical trending: `test-history.jsonl` accumulates per-test duration and pass/fail data across runs
- Dashboard integration: `/api/metrics/stage-performance` surfaces test stage duration trends from event data
| Step | Method | Success Criteria |
|---|---|---|
| Before | `time npm test` on main branch | Record wall-clock baseline (~1365s) |
| After | `time npm test` on feature branch | Wall-clock < 900s |
| Verify | Read `test-optimizer-evidence.json` | `parallel_tests > 0`, `workers > 1`, `exit_code: 0` |
| Regression check | Compare test pass counts | Same number of PASS/FAIL as main branch |
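The "Verify" step in the table above could be scripted along these lines. The sketch assumes `parallel_tests`, `workers`, and `exit_code` are top-level keys of the evidence JSON, which the source does not spell out, and `verify_evidence_sketch` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: check the evidence file against the success criteria.
# Assumes the three fields are top-level keys.

verify_evidence_sketch() {
  local file=$1
  # -e makes jq exit non-zero when the expression evaluates to false or null.
  jq -e '.parallel_tests > 0 and .workers > 1 and .exit_code == 0' \
    "$file" >/dev/null
}
```

Running `verify_evidence_sketch "$ARTIFACTS_DIR/test-optimizer-evidence.json"` after a pipeline run would then succeed only when the parallel phase actually ran with more than one worker and the suite passed.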