Pipeline Design 200
ADR written to .claude/pipeline-artifacts/design.md.
Key design decisions:
- Facade Orchestrator: A single new testopt_execute() function is the sole integration point, called by both stage_test() and run_test_gate(). It falls back to raw bash -c when the optimizer can't help (non-shell tests, tiny suites).
- Shared-state partitioning: 6 grep patterns classify tests as SHARED (sequential) or INDEPENDENT (parallel). False negatives self-heal via the history system.
- Fork exhaustion prevention: 75% of cores, clamped to [2, 8], using the existing jobs -r throttle.
- Full backwards compatibility: SW_TEST_OPTIMIZER=false bypasses everything. Non-shell test commands pass through untouched.
- Three new functions in test-optimizer.sh: testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute(). Five files modified total, one new test file created.
Expected gains: ~2.5x speedup via parallelism, 50% reduction in failure-case duration via fast-fail, 80-90% reduction for incremental changes via affected-test selection.
- Fast-test/full-test alternation (FAST_TEST_INTERVAL) must be preserved
- Parallel test execution must avoid fork-bombing (historical failure pattern)
Add a single new function testopt_execute() to test-optimizer.sh that serves as the sole integration point. Both stage_test() and run_test_gate() call this one function instead of raw bash -c. The orchestrator internally sequences: init → discover → select-affected → prioritize → partition (shared-state vs independent) → execute (parallel for independent, sequential for shared-state) → record history → report.
┌─────────────────────────────────────────────────────────────────┐
│ Callers │
│ ┌──────────────────────┐ ┌────────────────────────────────┐ │
│ │ stage_test() │ │ run_test_gate() │ │
│ │ pipeline-stages- │ │ sw-loop.sh:961 │ │
│ │ build.sh:546 │ │ │ │
│ └──────────┬───────────┘ └──────────────┬─────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ testopt_execute() [NEW] │ │
│ │ Entry point: project_root, test_cmd, options │ │
│ └──────────┬──────────┬──────────┬──────────┬──────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌────────┐ ┌───────┐ ┌────────────────────────┐ │
│ │ detect │ │ select │ │ prio- │ │ partition & execute │ │
│ │ cores │ │ affect-│ │ ritize│ │ ┌──────┐ ┌───────────┐ │ │
│ │ [NEW] │ │ ed │ │ │ │ │parall│ │sequential │ │ │
│ └─────────┘ └────────┘ └───────┘ │ │ (ind)│ │(shared st)│ │ │
│ │ └──────┘ └───────────┘ │ │
│ └────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ record_history + │ │
│ │ report + evidence │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
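The fall-through behavior of the facade can be illustrated with a small sketch. It assumes the gate rules stated in this design (fewer than 3 discovered tests, or any non-.sh test file, means raw execution). The names sketch_decision_gate and sketch_execute are hypothetical and are not part of test-optimizer.sh; the optimized branch is only a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the facade's decision gate; not the project's code.

# Print "optimize" or "fallback" for a list of discovered test files.
sketch_decision_gate() {
    local count=0 file
    for file in "$@"; do
        case "$file" in
            *.sh) count=$((count + 1)) ;;
            *)    echo "fallback"; return 0 ;;   # non-shell runner: pass through
        esac
    done
    if [ "$count" -lt 3 ]; then
        echo "fallback"                          # suite too small to benefit
    else
        echo "optimize"
    fi
}

# Facade: take the optimized path when the gate allows it, else run the raw command.
sketch_execute() {
    local test_cmd="$1"; shift
    if [ "$(sketch_decision_gate "$@")" = "optimize" ]; then
        echo "optimized run of $# files"         # placeholder for the real pipeline
    else
        bash -c "$test_cmd"
    fi
}
```

Because the gate is a pure function of the discovered file list, callers such as stage_test() and run_test_gate() would not need to know which path was taken.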
// New functions added to test-optimizer.sh

/**
 * Detect available CPU cores, return 75% (capped 2-8)
 * @returns number via stdout
 * Errors: returns 4 (safe default) on detection failure
 */
function testopt_detect_cores(): number // stdout: integer

/**
 * Scan test files for shared-state indicators
 * @param test_files - space-separated list of test file paths
 * @returns lines of "SHARED:<file>" or "INDEPENDENT:<file>" to stdout
 * Errors: treats scan failure as INDEPENDENT (safe default)
 */
function testopt_partition_shared_state(test_files: string[]): string[] // stdout

/**
 * Single entry point for optimized test execution
 * @param project_root - path to project root
 * @param test_cmd - fallback command (used if no shell test files found)
 * @param --max-workers=N - override core detection (optional)
 * @param --fast-fail - stop on first failure (default: true)
 * @param --continue-on-fail - run all tests even on failure
 * @param --mode=parallel|sequential|auto - execution mode (default: auto)
 * @returns exit code 0 on all pass, 1 on any failure
 * Side effects: writes test evidence JSON, updates history JSONL
 * Errors: falls back to raw bash -c "$test_cmd" on any internal error
 */
function testopt_execute(project_root: string, test_cmd: string, ...opts: string[]): number
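As a rough bash counterpart to the testopt_detect_cores() contract, the sketch below takes 75% of the detected cores, clamps the result to [2, 8], and prints 4 when detection fails. The function names are hypothetical, and integer floor rounding is an assumption, since the design does not say how the 0.75 product is rounded.

```shell
#!/usr/bin/env bash
# Sketch only: clamp_workers and detect_cores_sketch are hypothetical names.

# Take 75% of a raw core count (integer floor, an assumption) and clamp to [2, 8].
clamp_workers() {
    local cores="$1" workers
    workers=$(( cores * 3 / 4 ))
    if [ "$workers" -lt 2 ]; then workers=2; fi
    if [ "$workers" -gt 8 ]; then workers=8; fi
    echo "$workers"
}

# Detect cores via nproc (Linux) or sysctl (macOS); print 4 if neither works.
detect_cores_sketch() {
    local raw=""
    if command -v nproc >/dev/null 2>&1; then
        raw=$(nproc 2>/dev/null)
    elif command -v sysctl >/dev/null 2>&1; then
        raw=$(sysctl -n hw.ncpu 2>/dev/null)
    fi
    case "$raw" in
        ''|*[!0-9]*) echo 4; return 0 ;;   # safe default on detection failure
    esac
    clamp_workers "$raw"
}
```

Under this rounding, a 4-core machine yields 3 workers and anything with 11 or more cores hits the cap of 8.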
- Caller invokes testopt_execute "." "$test_cmd" --fast-fail
- Core detection: sysctl -n hw.ncpu (macOS) / nproc (Linux) → multiply by 0.75 → clamp to [2, 8]
- Init: testopt_init discovers test files, loads history, identifies changed files via git diff
- Decision gate: If discovered test count < 3 OR test files are not shell scripts (.sh), fall through to raw bash -c "$test_cmd", because the optimizer adds no value for small suites or non-shell runners (vitest, jest, etc.)
- Prioritize: Score by (fail_rate * 100) * 100 - duration, highest-risk tests first
- Partition: Scan each test file for shared-state patterns (hardcoded /tmp/ paths without $$, port bind/listen, sqlite3 file locks, global PID files, filesystem singletons). Files matching any pattern → sequential bucket; rest → parallel bucket
- Execute parallel bucket via existing testopt_run_parallel --max-workers=$cores
- Execute sequential bucket via existing testopt_run_with_fast_fail
- If either bucket fails and --fast-fail is set, skip remaining sequential tests
- Record results to ~/.shipwright/optimization/test-history.jsonl
- Write evidence to $ARTIFACTS_DIR/test-optimizer-evidence.json or $LOG_DIR/test-evidence-iter-${ITERATION}.json
- Return aggregate exit code
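The prioritization step can be worked through with a short sketch. It assumes fail_rate is a fraction between 0 and 1 and duration is in seconds, and it reads a hypothetical three-column input format; the real history lives in JSONL, so this only illustrates the scoring formula and the highest-first ordering.

```shell
#!/usr/bin/env bash
# Sketch only. Input lines (hypothetical format): "<file> <fail_rate 0..1> <duration_s>"
# Score = (fail_rate * 100) * 100 - duration; higher scores run first.
prioritize_sketch() {
    awk '{ printf "%d %s\n", ($2 * 100) * 100 - $3, $1 }' \
        | sort -k1,1nr -k2,2 \
        | awk '{ print $2 }'
}

# Example: b.sh scores 4970, c.sh scores 2498, a.sh scores -5.
printf 'a.sh 0 5\nb.sh 0.5 30\nc.sh 0.25 2\n' | prioritize_sketch
```

With these weights a single percentage point of failure rate outweighs 100 seconds of runtime, so duration acts mostly as a tie-breaker among equally reliable tests.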
| Component | Error Handling | Propagation |
|---|---|---|
| testopt_detect_cores | Returns default 4 on any failure | Silent: caller gets safe default |
| testopt_partition_shared_state | Treats unreadable files as INDEPENDENT | Silent: worst case runs flaky test in parallel (test itself should handle) |
| testopt_execute (init failure) | Falls back to raw bash -c "$test_cmd" | Logs warning, runs unoptimized |
| testopt_execute (parallel failure) | Collects exit codes from wait | Propagates as exit code 1 |
| History write failure | | |
| stage_test integration | Catches testopt_execute failure, proceeds to dark factory features | Failure propagates normally after holdout/mutation/causal analysis |
| run_test_gate integration | Sets TEST_PASSED=false on failure | Same as current behavior |
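The "collects exit codes from wait" row can be illustrated as follows. This is a minimal sketch with a hypothetical function name, not the logic of testopt_run_parallel: it starts every command in the background, waits on each PID, and folds any non-zero status into an aggregate exit code of 1.

```shell
#!/usr/bin/env bash
# Sketch only: run_all_collect is a hypothetical helper.

# Run each argument as a background command, then wait on every PID and
# reduce the individual statuses to a single aggregate code (0 or 1).
run_all_collect() {
    local pids="" pid rc=0 cmd
    for cmd in "$@"; do
        bash -c "$cmd" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1     # any failing job marks the whole run failed
    done
    return "$rc"
}
```

Waiting on each PID individually matters here: a bare wait with no arguments returns 0 regardless of how the jobs exited.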
# 1. Hardcoded /tmp paths without $$ or mktemp (collision risk)
#    Filter the matching lines; piping grep -l output would filter the file name instead.
grep '/tmp/[a-zA-Z]' "$file" | grep -qv '\$\$\|mktemp' && echo "$file"

# 2. Port binding (tests fighting for same port)
grep -l 'bind\|listen\|EADDRINUSE\|:808[0-9]' "$file"

# 3. SQLite file locks (non-WAL concurrent access)
grep -l 'sqlite3.*\.db\|\.sqlite' "$file"

# 4. PID files or lock files
grep -l '\.pid\|\.lock\|flock' "$file"

# 5. Singleton temp directories (shared across tests)
#    Same line-level filtering as pattern 1.
grep 'TMPDIR=\|TMP_DIR=' "$file" | grep -qv 'mktemp -d' && echo "$file"

# 6. Global state mutation (sourcing shared config that sets globals)
grep -l 'source.*config\|\..*config\.sh' "$file"
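To show how such patterns could feed the SHARED/INDEPENDENT output format, here is a reduced sketch that applies only three of the six patterns (hardcoded /tmp paths, port binding, PID/lock files). The function name classify_sketch is hypothetical, and the patterns are written with grep -E purely for portability of the example.

```shell
#!/usr/bin/env bash
# Sketch only: classify_sketch is a hypothetical helper covering 3 of the 6 patterns.

# Classify one file; prints "SHARED:<file>" or "INDEPENDENT:<file>".
classify_sketch() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "INDEPENDENT:$file"      # unreadable: the design's stated default
        return 0
    fi
    # Lines with a hardcoded /tmp path that use neither $$ nor mktemp.
    if grep -E '/tmp/[a-zA-Z]' "$file" 2>/dev/null | grep -Eqv '\$\$|mktemp'; then
        echo "SHARED:$file"; return 0
    fi
    # Port binding, or PID/lock files.
    if grep -Eq 'bind|listen|EADDRINUSE|:808[0-9]' "$file" 2>/dev/null ||
       grep -Eq '\.pid|\.lock|flock' "$file" 2>/dev/null; then
        echo "SHARED:$file"; return 0
    fi
    echo "INDEPENDENT:$file"
}
```

A caller could split the output on the first colon to build the sequential and parallel buckets.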
Pipeline config (templates/pipelines/*.json):
{
  "stages": [{
    "id": "test",
    "config": {
      "optimization": "auto",
      "max_workers": 0,
      "fast_fail": true
    }
  }]
}

Environment variables (for loop/ad-hoc):
- SW_TEST_OPTIMIZER=true|false|auto: master switch (default: auto, meaning enabled when test-optimizer.sh is available)
- SW_TEST_MAX_WORKERS=N: override core detection
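One way the master switch might be resolved is sketched below. The helper name is hypothetical, and two behaviors are assumptions not stated in the design: that true with a missing test-optimizer.sh degrades to off, and that unrecognized values are treated as off.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_optimizer_switch is a hypothetical helper.

# Resolve the master switch to "on" or "off".
# $1: value of SW_TEST_OPTIMIZER (may be empty), $2: path to test-optimizer.sh
resolve_optimizer_switch() {
    local setting="${1:-auto}" lib="$2"
    case "$setting" in
        false) echo "off" ;;
        true|auto)
            # Assumption: the optimizer cannot run without its library file.
            if [ -f "$lib" ]; then echo "on"; else echo "off"; fi ;;
        *) echo "off" ;;    # assumption: unknown values disable the optimizer
    esac
}
```

A call site could then read `resolve_optimizer_switch "${SW_TEST_OPTIMIZER:-}" scripts/lib/test-optimizer.sh` and branch on the result.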
Pipeline template discoverability: The optimization field will be present in all templates' test stage config. shipwright templates list (which reads templates/pipelines/*.json) will surface this naturally.
- Pros: vitest already handles parallelism, worker isolation, and affected-test selection (--changed). Zero new code.
- Cons: Only works for JS/TS projects using vitest. Shipwright is polyglot: it orchestrates bash test suites across any language. The 121+ bash test suites cannot use vitest parallelism. Does not solve the pipeline integration problem.

- Pros: Battle-tested parallelism, job control, retry, structured output.
- Cons: Not installed by default on macOS. Adds an external dependency. The current jobs -r approach in testopt_run_parallel works for the scale we need (2-8 workers, dozens of test files). Over-engineering for the problem size.

- Pros: Simpler code path, with one way to run tests everywhere.
- Cons: Breaks backwards compatibility. Many users pass custom --test-cmd strings that are not shell scripts (e.g., npm test, cargo test). The optimizer's file-level discovery doesn't apply to these. A facade that falls through to raw execution is safer.

- Pros: Centralized configuration, could use ML-based predictions.
- Cons: The composer generates static pipeline JSON at spawn time. Test optimization needs runtime decisions (which files changed THIS iteration, current CPU load). Wrong abstraction layer.
- scripts/lib/test-optimizer.sh: Add testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute()
- scripts/lib/pipeline-stages-build.sh: Wire stage_test() to call testopt_execute when optimization is enabled
- scripts/sw-loop.sh: Wire run_test_gate() to call testopt_execute for shell test suites when SW_TEST_OPTIMIZER is set
- templates/pipelines/*.json: Add optimization config field to test stage in all 8 templates

- scripts/sw-test-optimizer-integration-test.sh: Integration tests for the new testopt_execute orchestrator and wiring into pipeline/loop

- None new. Uses only sysctl/nproc (already available on target platforms), grep, jq, bash builtins.
- Fork exhaustion (HIGH): The historical failures.json documents fork failures under parallel load. Mitigation: hard cap at 8 workers, default 75% of cores (with a floor of 2). The while [[ $(jobs -r | wc -l) -ge "$max_workers" ]] throttle in testopt_run_parallel already prevents unbounded forking.
- Shared-state false negatives (MEDIUM): The 6 grep patterns may miss custom shared-state patterns (e.g., a test that uses a named pipe or System V semaphore). Mitigation: partition defaults to INDEPENDENT, so in the worst case a flaky test runs in parallel and fails. Fast-fail catches it, and the run records the failure in history, which raises the test's priority score on later runs. Self-healing via the history system.
- Test command fallthrough (LOW): Non-shell test commands (e.g., npm test) must bypass the optimizer cleanly. Mitigation: the decision gate checks that discovered tests are .sh files and that the count is >= 3 before engaging optimization. Otherwise it falls through to raw execution.
- Race condition in parallel result file (LOW): Multiple background jobs write to $tmp_results concurrently. Mitigation: each job appends one short line with >>, which opens the file in append mode so every write lands at the current end of file. Note that PIPE_BUF formally bounds atomic writes to pipes and FIFOs rather than regular files, and is 4096 bytes on Linux but 512 bytes on macOS; each result line is well under both limits.
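The jobs -r throttle named in the fork-exhaustion mitigation can be sketched in isolation. This is an illustration with a hypothetical function name, not the body of testopt_run_parallel; the 0.1 second polling interval is an arbitrary choice for the example.

```shell
#!/usr/bin/env bash
# Sketch only: run_throttled is a hypothetical helper.

# Run each remaining argument as a command, keeping at most $1 jobs in flight.
run_throttled() {
    local max_workers="$1"; shift
    local cmd running
    for cmd in "$@"; do
        # Block while the number of running background jobs is at the cap.
        while :; do
            running=$(jobs -rp | wc -l | tr -d ' ')
            [ "$running" -lt "$max_workers" ] && break
            sleep 0.1
        done
        bash -c "$cmd" &
    done
    wait    # let the final batch finish before returning
}
```

Because a new job is only forked after the count drops below the cap, the number of live children never exceeds max_workers, regardless of how many test files are queued.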
- testopt_detect_cores returns integer in [2, 8] on macOS and Linux
- testopt_partition_shared_state correctly classifies test files with hardcoded /tmp/foo as SHARED and files using mktemp as INDEPENDENT
- testopt_execute with 10+ mock test files runs parallel bucket before sequential bucket, respects --max-workers, and returns correct aggregate exit code
- testopt_execute falls back to raw bash -c "$test_cmd" when fewer than 3 shell test files are discovered
- stage_test() uses optimizer when pipeline config has "optimization": "auto" and test-optimizer.sh is loaded
- stage_test() bypasses optimizer when config says "optimization": "off"
- run_test_gate() uses optimizer when SW_TEST_OPTIMIZER=true and test command invokes shell scripts
- run_test_gate() preserves fast-test/full-test alternation (FAST_TEST_INTERVAL); optimizer only applies to the selected command
- Dark factory features (holdout, mutation, causal) still execute after optimized test run in stage_test()
- SW_TEST_OPTIMIZER=false completely bypasses all optimization in both code paths
- All 8 pipeline templates include "optimization": "auto" in test stage config
- shipwright templates list surfaces the optimization field (existing behavior: templates list reads JSON)
- History JSONL accumulates across runs and influences prioritization order in subsequent executions
- No fork exhaustion under parallel execution with max_workers=8 on a standard CI machine
Unit tests (8): Core detection, shared-state partitioning (6 patterns individually), worker clamping, history integration
Integration tests (4): testopt_execute end-to-end with mock test files, pipeline stage_test wiring, loop run_test_gate wiring, fallback-to-raw behavior
E2E tests (1): Full optimizer flow with real git repo, changed files, affected selection, parallel execution, history recording
Total: 13 new tests in scripts/sw-test-optimizer-integration-test.sh, complementing the existing 8 tests in scripts/sw-test-optimizer-test.sh.
- testopt_detect_cores: 100% branch coverage (macOS path, Linux path, fallback path)
- testopt_partition_shared_state: All 6 patterns tested individually + combined
- testopt_execute: Happy path, fallback path, fast-fail path, continue-on-fail path
- Pipeline/loop wiring: enabled path, disabled path, missing-module path
Happy path: 10 mock test files → detect 4 cores → partition 7 independent + 3 shared → parallel bucket runs 4-at-a-time → sequential bucket runs with fast-fail → all pass → exit 0
Error case 1: Parallel test fails → fast-fail stops sequential bucket → exit 1 → history records failure
Error case 2: testopt_init fails (no git repo) → falls back to raw bash -c "$test_cmd" → logs warning → exit code from raw command
Edge case 1: Zero test files discovered → immediate fallback to raw command
Edge case 2: All tests classified as SHARED → no parallel execution → purely sequential with fast-fail
- Full suite: ~1575s (26 min) across 121+ bash test files (from metrics.json, 2026-03-09)
- Current execution: strictly sequential, single process
- No parallelism, no affected-test selection, no prioritization in pipeline/loop
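- p50 test stage duration: Reduce from ~1575s to ~600s (about 2.5x speedup via 4-worker parallelism; note that 75% of a 4-core CI machine is 3 workers, so reaching 4 workers assumes a larger host or an override such as SW_TEST_MAX_WORKERS=4)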
- Fast-fail savings: When a test fails early in a 121-test suite, skip remaining tests — expected 50% reduction in failure-case duration
- Affected-test selection: On incremental changes touching 1-3 files, run only ~10-20 affected tests instead of 121+ — expected 80-90% reduction
- Instrument testopt_execute with date +%s timestamps at each phase (detect, init, partition, parallel, sequential, report)
- Write phase timings to evidence JSON for post-mortem analysis
- Compare test_duration_s in pipeline events before/after optimization is enabled
- The DORA metrics system (sw-dora.sh) already tracks test_duration_s, so optimization gains will be visible in dashboards
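Per-phase instrumentation with date +%s might look like the sketch below. The time_phase wrapper and the PHASE_LOG variable are hypothetical; the sketch writes plain "phase seconds" lines rather than the evidence JSON, and whole-second resolution is a limitation of date +%s.

```shell
#!/usr/bin/env bash
# Sketch only: time_phase and PHASE_LOG are hypothetical names.

# Run a command as a named phase and append "<phase> <elapsed_seconds>"
# to the file named by $PHASE_LOG. The command's exit status is preserved.
time_phase() {
    local phase="$1"; shift
    local start end rc=0
    start=$(date +%s)
    "$@" || rc=$?
    end=$(date +%s)
    echo "$phase $((end - start))" >> "$PHASE_LOG"
    return "$rc"
}
```

Usage would be along the lines of `PHASE_LOG=/path/to/phases.log; time_phase detect testopt_detect_cores`, with one call per phase and a final step that converts the log into the evidence JSON.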
- Before: Run sw-test-optimizer-test.sh + sw-pipeline-test.sh with SW_TEST_OPTIMIZER=false and record total duration
- After: Same suites with SW_TEST_OPTIMIZER=true and record total duration
- Success criteria: Parallel execution reduces wall-clock time by at least 1.5x on a 4+ core machine; fast-fail reduces failure-case time by at least 30%
- Data volume: Test with realistic suite (121+ files), not just unit test fixtures
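The 1.5x success criterion can be checked with integer arithmetic alone, which avoids depending on bc or awk. The helper name is hypothetical; the ratio is passed in tenths so that 15 means 1.5x.

```shell
#!/usr/bin/env bash
# Sketch only: meets_speedup is a hypothetical helper.

# Exit 0 when before/after is at least the required ratio.
# $1: before (whole seconds), $2: after (whole seconds),
# $3: minimum ratio in tenths (default 15, meaning 1.5x).
meets_speedup() {
    local before_s="$1" after_s="$2" min_tenths="${3:-15}"
    [ "$after_s" -gt 0 ] || return 2                  # guard against a zero duration
    # before/after >= min/10  is equivalent to  before*10 >= after*min
    [ $(( before_s * 10 )) -ge $(( after_s * min_tenths )) ]
}
```

For example, a drop from 1575s to 630s passes the default threshold, while a drop to 1200s (about 1.3x) does not.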