Pipeline Design 200

Seth Ford edited this page Apr 4, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key design decisions:

  1. Facade Orchestrator — Single new testopt_execute() function as the sole integration point, called by both stage_test() and run_test_gate(). Falls back to raw bash -c when the optimizer can't help (non-shell tests, tiny suites).

  2. Shared-state partitioning — 6 grep patterns classify tests as SHARED (sequential) or INDEPENDENT (parallel). False negatives self-heal via the history system.

  3. Fork exhaustion prevention — 75% of cores, hard capped at [2, 8], using the existing jobs -r throttle.

  4. Full backwards compatibility: SW_TEST_OPTIMIZER=false bypasses everything. Non-shell test commands pass through untouched.

  5. Three new functions in test-optimizer.sh: testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute(). Five files modified total, one new test file created.

Expected gains: ~2.5x speedup via parallelism, 50% reduction in failure-case duration via fast-fail, 80-90% reduction for incremental changes via affected-test selection.

  • Fast-test/full-test alternation (FAST_TEST_INTERVAL) must be preserved

  • Parallel test execution must avoid fork-bombing (historical failure pattern)

Decision

Architecture: Facade Orchestrator Pattern

Add a single new function testopt_execute() to test-optimizer.sh that serves as the sole integration point. Both stage_test() and run_test_gate() call this one function instead of raw bash -c. The orchestrator internally sequences: init → discover → select-affected → prioritize → partition (shared-state vs independent) → execute (parallel for independent, sequential for shared-state) → record history → report.
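The facade's fall-through behaviour can be sketched roughly as below. This is an illustration under stated assumptions, not the code in test-optimizer.sh: the names testopt_execute_sketch, _testopt_run_raw and _testopt_optimized_run are hypothetical, and the convention that an exit code of 2 or higher signals an internal optimizer error (as opposed to 1 for a test failure) is assumed for the example only.

```shell
# Sketch only. All function names here are hypothetical stand-ins.

# Run the caller's raw test command from the project root.
_testopt_run_raw() {
    local rc=0
    (cd "$1" && bash -c "$2") || rc=$?
    return "$rc"
}

# Placeholder for the optimized pipeline (init -> ... -> report).
# Assumed convention: 0 = all passed, 1 = a test failed, >=2 = internal error.
_testopt_optimized_run() {
    return 2   # pretend the optimizer could not initialise
}

testopt_execute_sketch() {
    local project_root="$1" test_cmd="$2"
    local rc=0

    # Master switch: bypass the optimizer entirely when disabled.
    if [ "${SW_TEST_OPTIMIZER:-auto}" = "false" ]; then
        _testopt_run_raw "$project_root" "$test_cmd" || rc=$?
        return "$rc"
    fi

    _testopt_optimized_run "$project_root" "$test_cmd" || rc=$?
    if [ "$rc" -ge 2 ]; then
        # Internal error: warn on stderr, then run the raw command instead.
        echo "testopt: optimizer unavailable, running raw command" >&2
        rc=0
        _testopt_run_raw "$project_root" "$test_cmd" || rc=$?
    fi
    return "$rc"
}
```

Because both callers would go through this one function, the fallback rule lives in a single place instead of being duplicated in stage_test() and run_test_gate().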

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│ Callers │
│ ┌──────────────────────┐ ┌────────────────────────────────┐ │
│ │ stage_test() │ │ run_test_gate() │ │
│ │ pipeline-stages- │ │ sw-loop.sh:961 │ │
│ │ build.sh:546 │ │ │ │
│ └──────────┬───────────┘ └──────────────┬─────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ testopt_execute() [NEW] │ │
│ │ Entry point: project_root, test_cmd, options │ │
│ └──────────┬──────────┬──────────┬──────────┬──────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌────────┐ ┌───────┐ ┌────────────────────────┐ │
│ │ detect │ │ select │ │ prio- │ │ partition & execute │ │
│ │ cores │ │ affect-│ │ ritize│ │ ┌──────┐ ┌───────────┐ │ │
│ │ [NEW] │ │ ed │ │ │ │ │parall│ │sequential │ │ │
│ └─────────┘ └────────┘ └───────┘ │ │ (ind)│ │(shared st)│ │ │
│ │ └──────┘ └───────────┘ │ │
│ └────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ record_history + │ │
│ │ report + evidence │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Interface Contracts

// New functions added to test-optimizer.sh
/**
 * Detect available CPU cores, return 75% (capped 2-8)
 * @returns number via stdout
 * Errors: returns 4 (safe default) on detection failure
 */
function testopt_detect_cores(): number // stdout: integer
/**
 * Scan test files for shared-state indicators
 * @param test_files - space-separated list of test file paths
 * @returns lines of "SHARED:<file>" or "INDEPENDENT:<file>" to stdout
 * Errors: treats scan failure as INDEPENDENT (safe default)
 */
function testopt_partition_shared_state(test_files: string[]): string[] // stdout
/**
 * Single entry point for optimized test execution
 * @param project_root - path to project root
 * @param test_cmd - fallback command (used if no shell test files found)
 * @param --max-workers=N - override core detection (optional)
 * @param --fast-fail - stop on first failure (default: true)
 * @param --continue-on-fail - run all tests even on failure
 * @param --mode=parallel|sequential|auto - execution mode (default: auto)
 * @returns exit code 0 on all pass, 1 on any failure
 * Side effects: writes test evidence JSON, updates history JSONL
 * Errors: falls back to raw bash -c "$test_cmd" on any internal error
 */
function testopt_execute(project_root: string, test_cmd: string, ...opts: string[]): number
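The first contract (75% of cores, bounded to 2-8, default 4 on failure) could look something like the following in bash. The helper names clamp_workers_sketch and detect_cores_sketch are hypothetical, and rounding 75% down to an integer is an assumption; the ADR does not say how the fraction is rounded.

```shell
# Sketch only; function names are hypothetical.

# Take 75% of a raw core count (rounded down, an assumption) and
# clamp the result to the range [2, 8].
clamp_workers_sketch() {
    local cores="$1"
    local workers=$(( cores * 3 / 4 ))
    if [ "$workers" -lt 2 ]; then workers=2; fi
    if [ "$workers" -gt 8 ]; then workers=8; fi
    echo "$workers"
}

# Detect cores via nproc (Linux) or sysctl (macOS); print 4 if neither
# yields a plain integer.
detect_cores_sketch() {
    local cores=""
    if command -v nproc >/dev/null 2>&1; then
        cores="$(nproc 2>/dev/null)"
    elif command -v sysctl >/dev/null 2>&1; then
        cores="$(sysctl -n hw.ncpu 2>/dev/null)"
    fi
    case "$cores" in
        ''|*[!0-9]*) echo 4; return 0 ;;   # detection failed: safe default
    esac
    clamp_workers_sketch "$cores"
}
```

Keeping the clamp in its own function makes the boundary cases (1 core, 64 cores) testable without mocking nproc or sysctl.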

Data Flow

  1. Caller invokes testopt_execute "." "$test_cmd" --fast-fail
  2. Core detection: sysctl -n hw.ncpu (macOS) / nproc (Linux) → multiply by 0.75 → clamp to [2, 8]
  3. Init: testopt_init discovers test files, loads history, identifies changed files via git diff
  4. Decision gate: If discovered test count < 3 OR test files are not shell scripts (.sh), fall through to raw bash -c "$test_cmd" — the optimizer adds no value for small suites or non-shell runners (vitest, jest, etc.)
  5. Prioritize: Score by (fail_rate * 100) * 100 - duration, highest-risk tests first
  6. Partition: Scan each test file for shared-state patterns (hardcoded /tmp/ paths without $$, port bind/listen, sqlite3 file locks, global PID files, filesystem singletons). Files matching any pattern → sequential bucket; rest → parallel bucket
  7. Execute parallel bucket via existing testopt_run_parallel --max-workers=$cores
  8. Execute sequential bucket via existing testopt_run_with_fast_fail
  9. If either bucket fails and --fast-fail is set, skip remaining sequential tests
  10. Record results to ~/.shipwright/optimization/test-history.jsonl
  11. Write evidence to $ARTIFACTS_DIR/test-optimizer-evidence.json or $LOG_DIR/test-evidence-iter-${ITERATION}.json
  12. Return aggregate exit code
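Steps 4 and 5 above can be illustrated with two small helpers. Both names (should_optimize_sketch, priority_score_sketch) are hypothetical. The sketch assumes that the gate requires every discovered file to end in .sh, and that fail_rate is a fraction between 0 and 1 with duration in seconds; the ADR does not spell out either detail.

```shell
# Sketch only; function names are hypothetical.

# Decision gate (step 4): succeed only when at least three test files
# were discovered and all of them are .sh scripts (assumed reading).
should_optimize_sketch() {
    [ "$#" -ge 3 ] || return 1
    local f
    for f in "$@"; do
        case "$f" in
            *.sh) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Priority score (step 5): (fail_rate * 100) * 100 - duration.
# awk handles the fractional fail_rate; the result is rounded to an integer.
priority_score_sketch() {
    awk -v r="$1" -v d="$2" 'BEGIN { printf "%.0f\n", (r * 100) * 100 - d }'
}
```

With this formula a test that fails half the time and takes 30 seconds scores 4970, while a test that never fails and takes 12 seconds scores -12, so failure history dominates and duration only breaks ties among similarly reliable tests.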

Error Boundaries

| Component | Error Handling | Propagation |
|---|---|---|
| testopt_detect_cores | Returns default 4 on any failure | Silent: caller gets safe default |
| testopt_partition_shared_state | Treats unreadable files as INDEPENDENT | Silent: worst case runs flaky test in parallel (test itself should handle) |
| testopt_execute (init failure) | Falls back to raw bash -c "$test_cmd" | Logs warning, runs unoptimized |
| testopt_execute (parallel failure) | Collects exit codes from wait | Propagates as exit code 1 |
| History write failure | … | … |
| stage_test integration | Catches testopt_execute failure, proceeds to dark factory features | Failure propagates normally after holdout/mutation/causal analysis |
| run_test_gate integration | Sets TEST_PASSED=false on failure | Same as current behavior |

Shared-State Detection Patterns (6 patterns)

# Each check prints the file name when the file matches.
# 1. Hardcoded /tmp paths on lines without $$ or mktemp (collision risk).
#    Filter the matching lines, not the file name that grep -l would print.
grep '/tmp/[a-zA-Z]' "$file" | grep -qv '\$\$\|mktemp' && echo "$file"
# 2. Port binding (tests fighting for same port)
grep -l 'bind\|listen\|EADDRINUSE\|:808[0-9]' "$file"
# 3. SQLite file locks (non-WAL concurrent access)
grep -l 'sqlite3.*\.db\|\.sqlite' "$file"
# 4. PID files or lock files
grep -l '\.pid\|\.lock\|flock' "$file"
# 5. Singleton temp directories (shared across tests), excluding lines that use mktemp -d
grep 'TMPDIR=\|TMP_DIR=' "$file" | grep -qv 'mktemp -d' && echo "$file"
# 6. Global state mutation (sourcing shared config that sets globals)
grep -l 'source.*config\|\..*config\.sh' "$file"
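One way the six patterns might be combined into the SHARED:/INDEPENDENT: output described in the interface contract is sketched below. The function names are hypothetical. The sketch uses grep -E equivalents of the patterns for portability, and for patterns 1 and 5 it filters matching lines (not file names) so that a line using $$ or mktemp is not counted.

```shell
# Sketch only; function names are hypothetical.

# Return 0 when the file matches any of the six shared-state patterns.
has_shared_state_sketch() {
    local file="$1"
    [ -r "$file" ] || return 1   # unreadable: treated as INDEPENDENT
    # 1. Hardcoded /tmp paths on lines without $$ or mktemp
    if grep -E '/tmp/[a-zA-Z]' "$file" | grep -Eqv '\$\$|mktemp'; then return 0; fi
    # 2. Port binding
    if grep -Eq 'bind|listen|EADDRINUSE|:808[0-9]' "$file"; then return 0; fi
    # 3. SQLite files
    if grep -Eq 'sqlite3.*\.db|\.sqlite' "$file"; then return 0; fi
    # 4. PID or lock files
    if grep -Eq '\.pid|\.lock|flock' "$file"; then return 0; fi
    # 5. Singleton temp directories not created by mktemp -d
    if grep -E 'TMPDIR=|TMP_DIR=' "$file" | grep -qv 'mktemp -d'; then return 0; fi
    # 6. Sourcing shared config
    if grep -Eq 'source.*config|\..*config\.sh' "$file"; then return 0; fi
    return 1
}

# Print one "SHARED:<file>" or "INDEPENDENT:<file>" line per argument.
partition_sketch() {
    local file
    for file in "$@"; do
        if has_shared_state_sketch "$file"; then
            echo "SHARED:$file"
        else
            echo "INDEPENDENT:$file"
        fi
    done
}
```

A caller could then split the output with grep '^SHARED:' and grep '^INDEPENDENT:' to build the sequential and parallel buckets.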

Configuration

Pipeline config (templates/pipelines/*.json):

{
  "stages": [{
    "id": "test",
    "config": {
      "optimization": "auto",
      "max_workers": 0,
      "fast_fail": true
    }
  }]
}

Environment variables (for loop/ad-hoc):

  • SW_TEST_OPTIMIZER=true|false|auto — master switch (default: auto, meaning enabled when test-optimizer.sh is available)
  • SW_TEST_MAX_WORKERS=N — override core detection
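How the two variables might be resolved at runtime is sketched below. The function names are hypothetical. The sketch assumes that "available" means the testopt_execute function has been sourced, that unrecognized values of SW_TEST_OPTIMIZER are treated as disabled, and that an unset, zero, or non-numeric SW_TEST_MAX_WORKERS means "use the detected value".

```shell
# Sketch only; function names are hypothetical.

# Return 0 when the optimizer should run, 1 otherwise.
testopt_is_enabled_sketch() {
    case "${SW_TEST_OPTIMIZER:-auto}" in
        true|auto)
            # Assumed availability check: has testopt_execute been sourced?
            command -v testopt_execute >/dev/null 2>&1
            ;;
        *)
            return 1   # "false" and anything unrecognized: bypass
            ;;
    esac
}

# Print the worker count: a positive integer override wins, otherwise
# the detected value passed as $1 is used.
testopt_resolve_workers_sketch() {
    local detected="$1"
    case "${SW_TEST_MAX_WORKERS:-}" in
        ''|0|*[!0-9]*) echo "$detected" ;;
        *) echo "$SW_TEST_MAX_WORKERS" ;;
    esac
}
```

Treating 0 as "no override" mirrors the "max_workers": 0 default in the pipeline config shown earlier in this section.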

Pipeline template discoverability: The optimization field will be present in all templates' test stage config. shipwright templates list (which reads templates/pipelines/*.json) will surface this naturally.

Alternatives Considered

1. Replace test-optimizer.sh with vitest's native parallel mode

  • Pros: vitest already handles parallelism, worker isolation, and affected-test selection (--changed). Zero new code.
  • Cons: Only works for JS/TS projects using vitest. Shipwright is polyglot — it orchestrates bash test suites across any language. The 121+ bash test suites cannot use vitest parallelism. Does not solve the pipeline integration problem.

2. GNU parallel as the execution engine

  • Pros: Battle-tested parallelism, job control, retry, structured output.
  • Cons: Not installed by default on macOS. Adds an external dependency. The current jobs -r approach in testopt_run_parallel works for the scale we need (2-8 workers, dozens of test files). Over-engineering for the problem size.

3. Full rewrite of run_test_gate/stage_test to always use optimizer

  • Pros: Simpler code path — one way to run tests everywhere.
  • Cons: Breaks backwards compatibility. Many users pass custom --test-cmd strings that are not shell scripts (e.g., npm test, cargo test). The optimizer's file-level discovery doesn't apply to these. A facade that falls through to raw execution is safer.

4. Move test optimization into the pipeline composer (intelligence layer)

  • Pros: Centralized configuration, could use ML-based predictions.
  • Cons: The composer generates static pipeline JSON at spawn time. Test optimization needs runtime decisions (which files changed THIS iteration, current CPU load). Wrong abstraction layer.

Implementation Plan

Files to modify

  • scripts/lib/test-optimizer.sh — Add testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute()
  • scripts/lib/pipeline-stages-build.sh — Wire stage_test() to call testopt_execute when optimization is enabled
  • scripts/sw-loop.sh — Wire run_test_gate() to call testopt_execute for shell test suites when SW_TEST_OPTIMIZER is set
  • templates/pipelines/*.json — Add optimization config field to test stage in all 8 templates

Files to create

  • scripts/sw-test-optimizer-integration-test.sh — Integration tests for the new testopt_execute orchestrator and wiring into pipeline/loop

Dependencies

  • None new. Uses only sysctl/nproc (already available on target platforms), grep, jq, bash builtins.

Risk areas

  1. Fork exhaustion (HIGH): The historical failures.json documents fork failures under parallel load. Mitigation: hard cap at 8 workers, default 75% of cores (capped at 2 minimum). The while [[ $(jobs -r | wc -l) -ge "$max_workers" ]] throttle in testopt_run_parallel already prevents unbounded forking.
  2. Shared-state false negatives (MEDIUM): The 6 grep patterns may miss custom shared-state patterns (e.g., a test that uses a named pipe or System V semaphore). Mitigation: partition defaults to INDEPENDENT, so worst case a flaky test runs in parallel and fails — the fast-fail catches it, and the next run records the failure in history, increasing its priority score for sequential execution next time. Self-healing via the history system.
  3. Test command fallthrough (LOW): Non-shell test commands (e.g., npm test) must bypass the optimizer cleanly. Mitigation: the decision gate checks if discovered tests are .sh files and if count >= 3 before engaging optimization. Otherwise falls through to raw execution.
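  4. Race condition in parallel result file (LOW): Multiple background jobs write to $tmp_results concurrently. Mitigation: each job appends one short line with >>, which opens the file in append mode so that each small write lands at the current end of the file. Note that PIPE_BUF formally governs pipes rather than regular files, and it is 4096 bytes on Linux but only 512 bytes on macOS, so result lines should stay under 512 bytes to be safe on both platforms.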
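The jobs -r throttle and the one-line-per-job result file mentioned in risks 1 and 4 can be sketched together as follows. This is not the existing testopt_run_parallel; run_throttled_sketch and its result-line format ("<exit_code> <command>") are hypothetical and chosen only to keep the example short. It requires bash, since jobs -r is a bash builtin option.

```shell
# Sketch only; the function name and result format are hypothetical.

# Usage: run_throttled_sketch <max_workers> <results_file> <cmd>...
# Runs each command string in the background, never more than
# max_workers at once. Returns 1 if any command exited non-zero.
run_throttled_sketch() {
    local max_workers="$1" results_file="$2"
    shift 2
    : > "$results_file"          # start from an empty results file
    local cmd
    for cmd in "$@"; do
        # Block while the number of running background jobs is at the cap.
        while [[ $(jobs -r | wc -l) -ge "$max_workers" ]]; do
            sleep 0.1
        done
        (
            rc=0
            bash -c "$cmd" >/dev/null 2>&1 || rc=$?
            # One short line per job, written with a single append.
            printf '%s %s\n' "$rc" "$cmd" >> "$results_file"
        ) &
    done
    wait
    # Any line that does not start with "0 " records a failure.
    if grep -qv '^0 ' "$results_file"; then
        return 1
    fi
    return 0
}
```

For example, run_throttled_sketch 4 /path/to/results "bash a-test.sh" "bash b-test.sh" would run both scripts concurrently and leave two result lines in the file; the path and script names here are placeholders.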

Validation Criteria

  • testopt_detect_cores returns integer in [2, 8] on macOS and Linux
  • testopt_partition_shared_state correctly classifies test files with hardcoded /tmp/foo as SHARED and files using mktemp as INDEPENDENT
  • testopt_execute with 10+ mock test files runs parallel bucket before sequential bucket, respects --max-workers, and returns correct aggregate exit code
  • testopt_execute falls back to raw bash -c "$test_cmd" when fewer than 3 shell test files are discovered
  • stage_test() uses optimizer when pipeline config has "optimization": "auto" and test-optimizer.sh is loaded
  • stage_test() bypasses optimizer when config says "optimization": "off"
  • run_test_gate() uses optimizer when SW_TEST_OPTIMIZER=true and test command invokes shell scripts
  • run_test_gate() preserves fast-test/full-test alternation (FAST_TEST_INTERVAL) — optimizer only applies to the selected command
  • Dark factory features (holdout, mutation, causal) still execute after optimized test run in stage_test()
  • SW_TEST_OPTIMIZER=false completely bypasses all optimization in both code paths
  • All 8 pipeline templates include "optimization": "auto" in test stage config
  • shipwright templates list surfaces the optimization field (existing behavior — templates list reads JSON)
  • History JSONL accumulates across runs and influences prioritization order in subsequent executions
  • No fork exhaustion under parallel execution with max_workers=8 on a standard CI machine

Testing Strategy

Test Pyramid Breakdown

  • Unit tests (8): Core detection, shared-state partitioning (6 patterns individually), worker clamping, history integration
  • Integration tests (4): testopt_execute end-to-end with mock test files, pipeline stage_test wiring, loop run_test_gate wiring, fallback-to-raw behavior
  • E2E tests (1): Full optimizer flow with real git repo, changed files, affected selection, parallel execution, history recording

Total: 13 new tests in scripts/sw-test-optimizer-integration-test.sh, complementing the existing 8 tests in scripts/sw-test-optimizer-test.sh.

Coverage Targets

  • testopt_detect_cores: 100% branch coverage (macOS path, Linux path, fallback path)
  • testopt_partition_shared_state: All 6 patterns tested individually + combined
  • testopt_execute: Happy path, fallback path, fast-fail path, continue-on-fail path
  • Pipeline/loop wiring: enabled path, disabled path, missing-module path

Critical Paths to Test

Happy path: 10 mock test files → detect 4 cores → partition 7 independent + 3 shared → parallel bucket runs 4-at-a-time → sequential bucket runs with fast-fail → all pass → exit 0

Error case 1: Parallel test fails → fast-fail stops sequential bucket → exit 1 → history records failure

Error case 2: testopt_init fails (no git repo) → falls back to raw bash -c "$test_cmd" → logs warning → exit code from raw command

Edge case 1: Zero test files discovered → immediate fallback to raw command

Edge case 2: All tests classified as SHARED → no parallel execution → purely sequential with fast-fail
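A fixture for the happy path (7 independent plus 3 shared-state mock tests) might be generated along these lines. The helper name make_mock_suite_sketch and the file naming scheme are hypothetical; the shared-state files rely on a hardcoded /tmp path without $$ or mktemp, which is the first detection pattern.

```shell
# Sketch only; helper name and file names are hypothetical.

# Create 7 independent and 3 shared-state mock test scripts in $1.
make_mock_suite_sketch() {
    local dir="$1" i
    mkdir -p "$dir"
    for i in 1 2 3 4 5 6 7; do
        # Uses a private mktemp directory, so nothing is shared.
        printf '%s\n' '#!/usr/bin/env bash' \
            'work="$(mktemp -d)"' \
            'echo ok > "$work/out"' \
            'rm -rf "$work"' > "$dir/independent-$i-test.sh"
    done
    for i in 1 2 3; do
        # Hardcoded /tmp path: expected to be classified as SHARED.
        printf '%s\n' '#!/usr/bin/env bash' \
            'echo ok > /tmp/mock-shared-state' > "$dir/shared-$i-test.sh"
    done
    chmod +x "$dir"/*.sh
}
```

Generating fixtures from a function keeps the 7/3 split explicit, so an assertion on bucket sizes fails loudly if a pattern change reclassifies one of the files.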


Performance

Baseline Metrics

  • Full suite: ~1575s (26 min) across 121+ bash test files (from metrics.json, 2026-03-09)
  • Current execution: strictly sequential, single process
  • No parallelism, no affected-test selection, no prioritization in pipeline/loop

Optimization Targets

  • p50 test stage duration: Reduce from ~1575s to ~600s (2.5x speedup via 4-worker parallelism on typical 4-core CI)
  • Fast-fail savings: When a test fails early in a 121-test suite, skip remaining tests — expected 50% reduction in failure-case duration
  • Affected-test selection: On incremental changes touching 1-3 files, run only ~10-20 affected tests instead of 121+ — expected 80-90% reduction
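As a rough sanity check on the first target, wall-clock time can be estimated with a simple Amdahl-style model: the shared-state share of the suite runs sequentially, and the independent share is divided evenly across workers. This model is an assumption introduced here for illustration, not part of the ADR, and it ignores scheduling overhead and uneven test durations. The helper name is hypothetical.

```shell
# Sketch only; the helper and the model are illustrative assumptions.

# Usage: estimate_duration_sketch <total_seconds> <workers> <parallel_fraction>
# Prints the estimated wall-clock seconds, rounded to an integer.
estimate_duration_sketch() {
    awk -v t="$1" -v w="$2" -v p="$3" \
        'BEGIN { printf "%.0f\n", t * ((1 - p) + p / w) }'
}

# With 4 workers, reaching ~600s from 1575s needs roughly 82.5% of
# suite time in the INDEPENDENT bucket under this model.
estimate_duration_sketch 1575 4 0.825
```

Under the same model a fully independent suite on 4 workers would take about 394s, so the ~600s target leaves room for a sequential bucket of roughly one sixth of total suite time.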

Profiling Strategy

  • Instrument testopt_execute with date +%s timestamps at each phase (detect, init, partition, parallel, sequential, report)
  • Write phase timings to evidence JSON for post-mortem analysis
  • Compare test_duration_s in pipeline events before/after optimization is enabled
  • The DORA metrics system (sw-dora.sh) already tracks test_duration_s — optimization gains will be visible in dashboards
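The per-phase instrumentation could be sketched as a small wrapper around each phase. The variable TESTOPT_PHASE_LOG and both function names are hypothetical, and the JSON shape is an assumption; the ADR only states that phase timings go into the evidence JSON.

```shell
# Sketch only; variable and function names are hypothetical.
TESTOPT_PHASE_LOG=""

# Run a command as a named phase and record its wall-clock seconds.
# Phase names must not contain spaces or "=".
testopt_time_phase_sketch() {
    local phase="$1"
    shift
    local start end rc=0
    start="$(date +%s)"
    "$@" || rc=$?
    end="$(date +%s)"
    TESTOPT_PHASE_LOG="${TESTOPT_PHASE_LOG}${phase}=$(( end - start )) "
    return "$rc"
}

# Render the recorded timings as a flat JSON object of phase: seconds.
testopt_phase_json_sketch() {
    local out="" pair
    for pair in $TESTOPT_PHASE_LOG; do   # intentional word splitting
        out="${out:+$out,}\"${pair%%=*}\":${pair#*=}"
    done
    printf '{%s}\n' "$out"
}
```

The wrapper preserves the wrapped command's exit code, so timing a phase does not change fast-fail behaviour. date +%s has one-second resolution, which is coarse for short phases but adequate for a suite measured in minutes.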

Benchmark Plan

  • Before: Run sw-test-optimizer-test.sh + sw-pipeline-test.sh with SW_TEST_OPTIMIZER=false — record total duration
  • After: Same suites with SW_TEST_OPTIMIZER=true — record total duration
  • Success criteria: Parallel execution reduces wall-clock time by at least 1.5x on a 4+ core machine; fast-fail reduces failure-case time by at least 30%
  • Data volume: Test with realistic suite (121+ files), not just unit test fixtures
