Pipeline Design 200

Seth Ford edited this page Apr 4, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key design decisions:

Facade Orchestrator — Single new testopt_execute() function as the sole integration point, called by both stage_test() and run_test_gate(). Falls back to raw bash -c when the optimizer can't help (non-shell tests, tiny suites).
Shared-state partitioning — 6 grep patterns classify tests as SHARED (sequential) or INDEPENDENT (parallel). False negatives self-heal via the history system.
Fork exhaustion prevention — 75% of cores, hard capped at [2, 8], using the existing jobs -r throttle.
Full backwards compatibility — SW_TEST_OPTIMIZER=false bypasses everything. Non-shell test commands pass through untouched.
Three new functions in test-optimizer.sh: testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute(). Five files modified total, one new test file created.

Expected gains: ~2.5x speedup via parallelism, 50% reduction in failure-case duration via fast-fail, and 80-90% reduction for incremental changes via affected-test selection.

Constraints:

Fast-test/full-test alternation (FAST_TEST_INTERVAL) must be preserved.
Parallel test execution must avoid fork-bombing (a historical failure pattern).

Decision

Architecture: Facade Orchestrator Pattern

Add a single new function testopt_execute() to test-optimizer.sh that serves as the sole integration point. Both stage_test() and run_test_gate() call this one function instead of raw bash -c. The orchestrator internally sequences: init → discover → select-affected → prioritize → partition (shared-state vs independent) → execute (parallel for independent, sequential for shared-state) → record history → report.
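The sequencing and fallback rule can be sketched in bash as below. This is a minimal illustration, not the planned implementation. The `_phase_*` functions are hypothetical stubs that group the phases named above. The sketch assumes an internal error shows up as a non-zero status from the setup phase, while the test phase's status is the verdict.

```bash
#!/usr/bin/env bash
# Sketch only: the _phase_* functions are hypothetical stand-ins for the
# real init/discover/select/prioritize/partition, execute, and record steps.

_phase_init()   { return 0; }   # init + discover + select-affected + prioritize + partition
_phase_run()    { return 0; }   # parallel bucket, then sequential bucket; returns test status
_phase_record() { return 0; }   # history JSONL + evidence JSON + report

facade_execute() {
    local test_cmd="$1"
    local status=0

    # A failure before any test has run is an internal error: run unoptimized.
    if ! _phase_init; then
        echo "warning: optimizer init failed, running raw command" >&2
        bash -c "$test_cmd"
        return $?
    fi

    _phase_run || status=$?

    # Assumption: a record/report failure only warns and never changes the verdict.
    _phase_record || echo "warning: could not record test history" >&2

    # Documented contract: 0 when all tests pass, 1 on any failure.
    if (( status != 0 )); then
        return 1
    fi
    return 0
}
```

A caller such as run_test_gate() could then use `facade_execute "$test_cmd" || TEST_PASSED=false`, keeping its existing failure handling.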

Component Diagram

Callers
┌──────────────────────┐  ┌────────────────────────────────┐
│ stage_test()         │  │ run_test_gate()                │
│ pipeline-stages-     │  │ sw-loop.sh:961                 │
│ build.sh:546         │  │                                │
└──────────┬───────────┘  └──────────────┬─────────────────┘
           │                             │
           ▼                             ▼
┌──────────────────────────────────────────────────────────────────┐
│ testopt_execute()  [NEW]                                         │
│ Entry point: project_root, test_cmd, options                     │
└───┬───────────┬─────────────┬──────────────────────┬─────────────┘
    │           │             │                      │
    ▼           ▼             ▼                      ▼
┌────────┐ ┌──────────┐ ┌────────────┐ ┌─────────────────────────────┐
│ detect │ │ select   │ │ prioritize │ │ partition & execute         │
│ cores  │ │ affected │ │            │ │ ┌──────────┐ ┌────────────┐ │
│ [NEW]  │ │          │ │            │ │ │ parallel │ │ sequential │ │
└────────┘ └──────────┘ └────────────┘ │ │ (indep.) │ │ (shared)   │ │
                                       │ └──────────┘ └────────────┘ │
                                       └─────────────┬───────────────┘
                                                     │
                                                     ▼
                                          ┌──────────────────────┐
                                          │ record_history +     │
                                          │ report + evidence    │
                                          └──────────────────────┘

Interface Contracts

// New functions added to test-optimizer.sh
/**
 * Detect available CPU cores, return 75% (capped 2-8)
 * @returns number via stdout
 * Errors: prints 4 (safe default) on detection failure
 */
function testopt_detect_cores(): number // stdout: integer
/**
 * Scan test files for shared-state indicators
 * @param test_files - space-separated list of test file paths
 * @returns lines of "SHARED:<file>" or "INDEPENDENT:<file>" to stdout
 * Errors: treats scan failure as INDEPENDENT (safe default)
 */
function testopt_partition_shared_state(test_files: string[]): string[] // stdout
/**
 * Single entry point for optimized test execution
 * @param project_root - path to project root
 * @param test_cmd - fallback command (used if no shell test files found)
 * @param --max-workers=N - override core detection (optional)
 * @param --fast-fail - stop on first failure (default: true)
 * @param --continue-on-fail - run all tests even on failure
 * @param --mode=parallel|sequential|auto - execution mode (default: auto)
 * @returns exit code 0 on all pass, 1 on any failure
 * Side effects: writes test evidence JSON, updates history JSONL
 * Errors: falls back to raw bash -c "$test_cmd" on any internal error
 */
function testopt_execute(project_root: string, test_cmd: string, ...opts: string[]): number
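One way to satisfy the `testopt_detect_cores` contract is sketched below. Both function names are hypothetical. The rounding rule is also an assumption: the contract says 75% but not how to round, so this sketch uses integer floor (4 cores → 3 workers, 6 cores → 4 workers) before clamping to [2, 8].

```bash
#!/usr/bin/env bash
# Sketch of the testopt_detect_cores contract: 75% of cores, clamped to
# [2, 8], printing 4 when the core count cannot be determined.

clamp_workers() {
    local cores="$1" workers
    # Integer 75%: floor(cores * 3 / 4). Rounding mode is an assumption.
    workers=$(( cores * 3 / 4 ))
    (( workers < 2 )) && workers=2
    (( workers > 8 )) && workers=8
    echo "$workers"
}

detect_cores_sketch() {
    local cores=""
    if [[ "$(uname -s)" == "Darwin" ]]; then
        cores=$(sysctl -n hw.ncpu 2>/dev/null) || cores=""
    else
        cores=$(nproc 2>/dev/null) || cores=""
    fi
    # Anything that is not a positive integer yields the safe default.
    if [[ ! "$cores" =~ ^[0-9]+$ ]] || (( cores == 0 )); then
        echo 4
        return 0
    fi
    clamp_workers "$cores"
}
```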

Data Flow

Caller invokes testopt_execute "." "$test_cmd" --fast-fail
Core detection: sysctl -n hw.ncpu (macOS) / nproc (Linux) → multiply by 0.75 → clamp to [2, 8]
Init: testopt_init discovers test files, loads history, identifies changed files via git diff
Decision gate: If discovered test count < 3 OR test files are not shell scripts (.sh), fall through to raw bash -c "$test_cmd" — the optimizer adds no value for small suites or non-shell runners (vitest, jest, etc.)
Prioritize: Score by (fail_rate * 100) * 100 - duration, highest-risk tests first
Partition: Scan each test file for shared-state patterns (hardcoded /tmp/ paths without $$, port bind/listen, sqlite3 file locks, global PID files, filesystem singletons). Files matching any pattern → sequential bucket; rest → parallel bucket
Execute parallel bucket via existing testopt_run_parallel --max-workers=$cores
Execute sequential bucket via existing testopt_run_with_fast_fail
If either bucket fails and --fast-fail is set, skip remaining sequential tests
Record results to ~/.shipwright/optimization/test-history.jsonl
Write evidence to $ARTIFACTS_DIR/test-optimizer-evidence.json or $LOG_DIR/test-evidence-iter-${ITERATION}.json
Return aggregate exit code
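The decision gate reduces to a small predicate. This is a minimal sketch. It assumes the gate counts discovered files ending in `.sh` and engages the optimizer only when there are at least three. `should_optimize` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the decision gate: succeed (optimize) only when at least three
# discovered test files are shell scripts; otherwise the caller runs the raw command.

should_optimize() {
    # Arguments: discovered test file paths.
    local shell_count=0 f
    for f in "$@"; do
        [[ "$f" == *.sh ]] && shell_count=$(( shell_count + 1 ))
    done
    (( shell_count >= 3 ))
}
```

A caller would branch on it, running `bash -c "$test_cmd"` unchanged when the predicate fails. This branch is what keeps vitest, jest, `npm test` and `cargo test` invocations untouched.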

Error Boundaries

Component	Error Handling	Propagation
`testopt_detect_cores`	Returns default 4 on any failure	Silent — caller gets safe default
`testopt_partition_shared_state`	Treats unreadable files as INDEPENDENT	Silent — worst case runs flaky test in parallel (test itself should handle)
`testopt_execute` (init failure)	Falls back to raw `bash -c "$test_cmd"`	Logs warning, runs unoptimized
`testopt_execute` (parallel failure)	Collects exit codes from `wait`	Propagates as exit code 1
History write failure	`
`stage_test` integration	Catches `testopt_execute` failure, proceeds to dark factory features	Failure propagates normally after holdout/mutation/causal analysis
`run_test_gate` integration	Sets `TEST_PASSED=false` on failure	Same as current behavior

Shared-State Detection Patterns (6 patterns)

# Each check succeeds (exit 0) when "$file" shows the indicator.
# 1. Hardcoded /tmp paths on lines that use neither $$ nor mktemp (collision risk)
grep -E '/tmp/[a-zA-Z]' "$file" | grep -vE '\$\$|mktemp' >/dev/null
# 2. Port binding (tests fighting for the same port)
grep -qE 'bind|listen|EADDRINUSE|:808[0-9]' "$file"
# 3. SQLite file locks (non-WAL concurrent access)
grep -qE 'sqlite3.*\.db|\.sqlite' "$file"
# 4. PID files or lock files
grep -qE '\.pid|\.lock|flock' "$file"
# 5. Singleton temp directories (shared across tests) not created with mktemp -d
grep -E 'TMPDIR=|TMP_DIR=' "$file" | grep -v 'mktemp -d' >/dev/null
# 6. Global state mutation (sourcing shared config that sets globals)
grep -qE 'source.*config|\..*config\.sh' "$file"
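The six checks can be wrapped to produce the `SHARED:`/`INDEPENDENT:` output format from the interface contract. In this sketch, the function names are hypothetical. A file is SHARED if any check matches, and an unreadable file is reported as INDEPENDENT, matching the documented error behavior.

```bash
#!/usr/bin/env bash
# Sketch of testopt_partition_shared_state built from the six checks above.
# Prints "SHARED:<file>" or "INDEPENDENT:<file>" for each argument.

has_shared_state() {
    local file="$1"
    grep -E '/tmp/[a-zA-Z]' "$file" | grep -vE '\$\$|mktemp' >/dev/null && return 0
    grep -qE 'bind|listen|EADDRINUSE|:808[0-9]' "$file" && return 0
    grep -qE 'sqlite3.*\.db|\.sqlite' "$file" && return 0
    grep -qE '\.pid|\.lock|flock' "$file" && return 0
    grep -E 'TMPDIR=|TMP_DIR=' "$file" | grep -v 'mktemp -d' >/dev/null && return 0
    grep -qE 'source.*config|\..*config\.sh' "$file" && return 0
    return 1
}

partition_sketch() {
    local file
    for file in "$@"; do
        if [[ -r "$file" ]] && has_shared_state "$file"; then
            echo "SHARED:$file"
        else
            # No indicator found, or file unreadable: default to INDEPENDENT.
            echo "INDEPENDENT:$file"
        fi
    done
}
```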

Configuration

Pipeline config (templates/pipelines/*.json):

{
  "stages": [
    {
      "id": "test",
      "config": {
        "optimization": "auto",
        "max_workers": 0,
        "fast_fail": true
      }
    }
  ]
}

Environment variables (for loop/ad-hoc):

SW_TEST_OPTIMIZER=true|false|auto — master switch (default: auto, meaning enabled when test-optimizer.sh is available)
SW_TEST_MAX_WORKERS=N — override core detection
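The following sketch shows how the two environment variables might be resolved. It rests on these assumptions:

- `auto` means "enabled if test-optimizer.sh is readable".
- Unrecognized values disable optimization.
- `SW_TEST_MAX_WORKERS` replaces the detected count only when it is a positive integer.

Whether the hard cap of 8 also applies to an explicit override is not specified, so this sketch passes the override through unchanged. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: resolve SW_TEST_OPTIMIZER (true|false|auto, default auto) and
# SW_TEST_MAX_WORKERS into an on/off decision and a worker count.

optimizer_enabled() {
    # $1: path to test-optimizer.sh (consulted only for "auto")
    local lib="$1"
    case "${SW_TEST_OPTIMIZER:-auto}" in
        true)  return 0 ;;
        false) return 1 ;;
        auto)  [[ -r "$lib" ]] ;;
        *)     return 1 ;;   # assumption: unknown values disable the optimizer
    esac
}

resolve_max_workers() {
    # $1: detected worker count; a positive-integer override wins
    local detected="$1" override="${SW_TEST_MAX_WORKERS:-}"
    if [[ "$override" =~ ^[1-9][0-9]*$ ]]; then
        echo "$override"
    else
        echo "$detected"
    fi
}
```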

Pipeline template discoverability: The optimization field will be present in all templates' test stage config. shipwright templates list (which reads templates/pipelines/*.json) will surface this naturally.

Alternatives Considered

1. Replace test-optimizer.sh with vitest's native parallel mode

Pros: vitest already handles parallelism, worker isolation, and affected-test selection (--changed). Zero new code.
Cons: Only works for JS/TS projects using vitest. Shipwright is polyglot — it orchestrates bash test suites across any language. The 121+ bash test suites cannot use vitest parallelism. Does not solve the pipeline integration problem.

2. GNU parallel as the execution engine

Pros: Battle-tested parallelism, job control, retry, structured output.
Cons: Not installed by default on macOS. Adds an external dependency. The current jobs -r approach in testopt_run_parallel works for the scale we need (2-8 workers, dozens of test files). Over-engineering for the problem size.

3. Full rewrite of run_test_gate/stage_test to always use optimizer

Pros: Simpler code path — one way to run tests everywhere.
Cons: Breaks backwards compatibility. Many users pass custom --test-cmd strings that are not shell scripts (e.g., npm test, cargo test). The optimizer's file-level discovery doesn't apply to these. A facade that falls through to raw execution is safer.

4. Move test optimization into the pipeline composer (intelligence layer)

Pros: Centralized configuration, could use ML-based predictions.
Cons: The composer generates static pipeline JSON at spawn time. Test optimization needs runtime decisions (which files changed THIS iteration, current CPU load). Wrong abstraction layer.

Implementation Plan

Files to modify

scripts/lib/test-optimizer.sh — Add testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute()
scripts/lib/pipeline-stages-build.sh — Wire stage_test() to call testopt_execute when optimization is enabled
scripts/sw-loop.sh — Wire run_test_gate() to call testopt_execute for shell test suites when SW_TEST_OPTIMIZER is set
templates/pipelines/*.json — Add optimization config field to test stage in all 8 templates

Files to create

scripts/sw-test-optimizer-integration-test.sh — Integration tests for the new testopt_execute orchestrator and wiring into pipeline/loop

Dependencies

None new. Uses only sysctl/nproc (already available on target platforms), grep, jq, bash builtins.

Risk areas

Fork exhaustion (HIGH): The historical failures.json documents fork failures under parallel load. Mitigation: hard cap at 8 workers, default 75% of cores (capped at 2 minimum). The while [[ $(jobs -r | wc -l) -ge "$max_workers" ]] throttle in testopt_run_parallel already prevents unbounded forking.
Shared-state false negatives (MEDIUM): The 6 grep patterns may miss custom shared-state patterns (e.g., a test that uses a named pipe or System V semaphore). Mitigation: partition defaults to INDEPENDENT, so worst case a flaky test runs in parallel and fails — the fast-fail catches it, and the next run records the failure in history, increasing its priority score for sequential execution next time. Self-healing via the history system.
Test command fallthrough (LOW): Non-shell test commands (e.g., npm test) must bypass the optimizer cleanly. Mitigation: the decision gate checks if discovered tests are .sh files and if count >= 3 before engaging optimization. Otherwise falls through to raw execution.
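Race condition in parallel result file (LOW): Multiple background jobs write to $tmp_results concurrently. Mitigation: the existing >> append opens the file with O_APPEND, so every write() lands at the current end of file and no job overwrites another job's line; a short line written in a single call is not interleaved in practice. PIPE_BUF (4096 bytes on Linux, 512 on macOS) formally bounds atomic writes to pipes and FIFOs rather than regular files, so result lines should stay well under 512 bytes to be safe on both platforms.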
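A small runner illustrates the throttle and result-collection behavior described in these risk items. This sketch is modeled on the `jobs -r` loop quoted above and is not the existing `testopt_run_parallel`; `run_parallel_sketch` is hypothetical. Each job appends one short line to a shared results file with `>>`. Calling `wait` on each PID supplies the per-test exit code, and any non-zero code makes the aggregate status 1.

```bash
#!/usr/bin/env bash
# Sketch: run test files with at most max_workers concurrent jobs,
# append "<file> <rc>" per test, and return 1 if any test failed.

run_parallel_sketch() {
    local max_workers="$1"; shift
    local results; results=$(mktemp) || return 1
    local -a pids=()
    local f pid status=0

    for f in "$@"; do
        # Throttle: block while max_workers jobs are still running.
        while [[ $(jobs -r | wc -l) -ge "$max_workers" ]]; do
            sleep 0.1
        done
        (
            rc=0
            bash "$f" >/dev/null 2>&1 || rc=$?
            echo "$f $rc" >> "$results"   # one short O_APPEND write per test
            exit "$rc"
        ) &
        pids+=("$!")
    done

    if (( ${#pids[@]} == 0 )); then
        rm -f "$results"
        return 0
    fi

    # Collect each job's exit code; any failure makes the aggregate 1.
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done

    sort "$results"
    rm -f "$results"
    return "$status"
}
```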

Validation Criteria

testopt_detect_cores returns integer in [2, 8] on macOS and Linux
testopt_partition_shared_state correctly classifies test files with hardcoded /tmp/foo as SHARED and files using mktemp as INDEPENDENT
testopt_execute with 10+ mock test files runs parallel bucket before sequential bucket, respects --max-workers, and returns correct aggregate exit code
testopt_execute falls back to raw bash -c "$test_cmd" when fewer than 3 shell test files are discovered
stage_test() uses optimizer when pipeline config has "optimization": "auto" and test-optimizer.sh is loaded
stage_test() bypasses optimizer when config says "optimization": "off"
run_test_gate() uses optimizer when SW_TEST_OPTIMIZER=true and test command invokes shell scripts
run_test_gate() preserves fast-test/full-test alternation (FAST_TEST_INTERVAL) — optimizer only applies to the selected command
Dark factory features (holdout, mutation, causal) still execute after optimized test run in stage_test()
SW_TEST_OPTIMIZER=false completely bypasses all optimization in both code paths
All 8 pipeline templates include "optimization": "auto" in test stage config
shipwright templates list surfaces the optimization field (existing behavior — templates list reads JSON)
History JSONL accumulates across runs and influences prioritization order in subsequent executions
No fork exhaustion under parallel execution with max_workers=8 on a standard CI machine

Testing Strategy

Test Pyramid Breakdown

Unit tests (8): Core detection, shared-state partitioning (6 patterns individually), worker clamping, history integration
Integration tests (4): testopt_execute end-to-end with mock test files, pipeline stage_test wiring, loop run_test_gate wiring, fallback-to-raw behavior
E2E tests (1): Full optimizer flow with real git repo, changed files, affected selection, parallel execution, history recording

Total: 13 new tests in scripts/sw-test-optimizer-integration-test.sh, complementing the existing 8 tests in scripts/sw-test-optimizer-test.sh.

Coverage Targets

testopt_detect_cores: 100% branch coverage (macOS path, Linux path, fallback path)
testopt_partition_shared_state: All 6 patterns tested individually + combined
testopt_execute: Happy path, fallback path, fast-fail path, continue-on-fail path
Pipeline/loop wiring: enabled path, disabled path, missing-module path

Critical Paths to Test

Happy path: 10 mock test files → core detection yields 4 workers → partition 7 independent + 3 shared → parallel bucket runs 4-at-a-time → sequential bucket runs with fast-fail → all pass → exit 0

Error case 1: Parallel test fails → fast-fail stops sequential bucket → exit 1 → history records failure

Error case 2: testopt_init fails (no git repo) → falls back to raw bash -c "$test_cmd" → logs warning → exit code from raw command

Edge case 1: Zero test files discovered → immediate fallback to raw command

Edge case 2: All tests classified as SHARED → no parallel execution → purely sequential with fast-fail

Performance

Baseline Metrics

Full suite: ~1575s (26 min) across 121+ bash test files (from metrics.json, 2026-03-09)
Current execution: strictly sequential, single process
No parallelism, no affected-test selection, no prioritization in pipeline/loop

Optimization Targets

p50 test stage duration: Reduce from ~1575s to ~600s (2.5x speedup via parallelism; on a typical 4-core CI runner the 75% rule yields 3 workers)
Fast-fail savings: When a test fails early in a 121-test suite, skip remaining tests — expected 50% reduction in failure-case duration
Affected-test selection: On incremental changes touching 1-3 files, run only ~10-20 affected tests instead of 121+ — expected 80-90% reduction

Profiling Strategy

Instrument testopt_execute with date +%s timestamps at each phase (detect, init, partition, parallel, sequential, report)
Write phase timings to evidence JSON for post-mortem analysis
Compare test_duration_s in pipeline events before/after optimization is enabled
The DORA metrics system (sw-dora.sh) already tracks test_duration_s — optimization gains will be visible in dashboards
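Phase timing with `date +%s` can be sketched as a wrapper. It records elapsed whole seconds per phase and preserves the phase's exit code. The helper names and the flat `{"phase": seconds}` object are assumptions, since the evidence JSON schema is not specified here. Phase names must not contain characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Sketch: time each phase in whole seconds and emit the timings as one JSON object.

declare -a PHASE_NAMES=() PHASE_SECS=()

time_phase() {
    # Usage: time_phase <name> <command...>; preserves the command's exit code.
    local name="$1"; shift
    local start rc=0
    start=$(date +%s)
    "$@" || rc=$?
    PHASE_NAMES+=("$name")
    PHASE_SECS+=("$(( $(date +%s) - start ))")
    return "$rc"
}

phase_timings_json() {
    # Names are assumed JSON-safe (no quotes or backslashes).
    local i out="{"
    for (( i = 0; i < ${#PHASE_NAMES[@]}; i++ )); do
        (( i > 0 )) && out+=","
        out+="\"${PHASE_NAMES[i]}\":${PHASE_SECS[i]}"
    done
    echo "${out}}"
}
```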

Benchmark Plan

Before: Run sw-test-optimizer-test.sh + sw-pipeline-test.sh with SW_TEST_OPTIMIZER=false — record total duration
After: Same suites with SW_TEST_OPTIMIZER=true — record total duration
Success criteria: Parallel execution reduces wall-clock time by at least 1.5x on a 4+ core machine; fast-fail reduces failure-case time by at least 30%
Data volume: Test with realistic suite (121+ files), not just unit test fixtures
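Both success criteria are simple ratios and can be checked mechanically from recorded durations. The following is a minimal sketch with hypothetical function names, using `awk` for floating-point comparison. For example, the stated target of ~1575s → ~600s gives 1575 / 600 ≈ 2.6, which clears the 1.5x bar.

```bash
#!/usr/bin/env bash
# Sketch: evaluate the benchmark success criteria from durations in seconds.

meets_speedup() {
    # $1 = baseline wall-clock, $2 = optimized wall-clock; pass if ratio >= 1.5
    awk -v b="$1" -v o="$2" 'BEGIN { exit !(o > 0 && b / o >= 1.5) }'
}

meets_fast_fail_reduction() {
    # $1 = failure-case duration without fast-fail, $2 = with fast-fail; pass if >= 30% shorter
    awk -v b="$1" -v f="$2" 'BEGIN { exit !(b > 0 && (b - f) / b >= 0.30) }'
}
```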
