Pipeline Design 200

Seth Ford edited this page Apr 4, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md (291 lines). Key architectural decisions documented:

  1. Static grep-based shared-state detection (6 patterns) over runtime isolation or manual annotation: conservative, portable, zero dependencies
  2. Temp file + grep for parallel result propagation: solves the bash subshell variable loss problem without IPC
  3. CPU-aware worker cap [2, 8]: scales to the machine, doesn't overwhelm constrained systems
  4. Facade with raw bash -c fallback: optimizer is purely additive, never worse than baseline

Four alternatives rejected with rationale: Vitest native (wrong layer), GNU parallel (no intelligence), Makefile deps (maintenance cost), Docker isolation (overhead exceeds gains).

  • … disable via SW_TEST_OPTIMIZER=false or optimization: off in pipeline config
  • Must not break existing test correctness: tests that share state must be detected and run sequentially, never in parallel
  • History/prioritization must be append-only and self-healing (corrupt JSONL lines skipped on read)

Decision

Introduce a test execution optimizer (scripts/lib/test-optimizer.sh) as a library sourced by both pipeline stages and the loop harness. The optimizer implements a four-phase execution model:

Component Diagram

                 ┌───────────────────────────┐
                 │ Entry Points              │
                 │ stage_test()              │
                 │ run_test_gate()           │
                 └──────────┬────────────────┘
                            │
                 ┌──────────▼────────────────┐
                 │ testopt_execute()         │
                 │ Facade orchestrator       │
                 │ - parses options          │
                 │ - gates on <3 tests       │
                 │ - falls back on init err  │
                 └──────────┬────────────────┘
                            │
           ┌────────────────┼─────────────────┐
           │                │                 │
┌──────────▼──────┐  ┌──────▼──────┐  ┌───────▼────────┐
│ Phase 1: Init   │  │ Phase 2:    │  │ Phase 3:       │
│ testopt_init()  │  │ Prioritize  │  │ Partition      │
│ - discover      │  │ testopt_    │  │ testopt_       │
│   *-test.sh     │  │ prioritize()│  │ partition_     │
│ - load history  │  │ - fail_rate │  │ shared_state() │
│ - git diff      │  │   DESC      │  │ - 6 patterns   │
│ - select        │  │ - duration  │  │ → parallel[]   │
│   affected      │  │   ASC       │  │ → sequential[] │
└──────────┬──────┘  └──────┬──────┘  └───────┬────────┘
           └────────────────┼─────────────────┘
                            │
           ┌────────────────┴───────────────┐
           │                                │
┌──────────▼──────────┐      ┌──────────────▼──────────┐
│ Phase 4a: Parallel  │      │ Phase 4b: Sequential    │
│ testopt_run_parallel│      │ testopt_run_with_       │
│ - N workers (2-8)   │      │   fast_fail             │
│ - dir-grouped       │      │ - stop on first failure │
│ - temp-file results │      │ - skip if 4a failed +   │
└──────────┬──────────┘      │   fast_fail enabled     │
           │                 └──────────────┬──────────┘
           └────────────────┬───────────────┘
                            │
                 ┌──────────▼────────────────┐
                 │ Output                    │
                 │ - history → JSONL append  │
                 │ - evidence → JSON file    │
                 │ - events → emit_event()   │
                 │ - report → stdout         │
                 └───────────────────────────┘

Key Design Decisions

1. Shared-state detection via static grep analysis (not runtime isolation)

  • Context: Need to determine which tests can run in parallel without race conditions.
  • Decision: Grep each test file for 6 patterns indicating shared state: hardcoded /tmp paths, port binding, SQLite files, PID/lock files, singleton TMPDIR assignments, and global config sourcing.
  • Alternatives rejected: (a) Runtime sandboxing with namespaces/cgroups: too heavy, not portable to macOS. (b) Manual annotation: requires maintainer discipline, gets stale.
  • Consequences: False positives send safe tests to the sequential bucket (slower but correct). False negatives would cause flaky parallel failures. The patterns are intentionally conservative: Pattern 6 (config sourcing) captures anything that sources a config file, which over-classifies but prevents subtle global-state mutation bugs.
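
As an illustration of this kind of static classification, here is a minimal sketch. The six regular expressions and the function name classify_shared_state are assumptions made for this example; the actual patterns in scripts/lib/test-optimizer.sh are not reproduced on this page and may differ. Only the output format ("SHARED:<path>" or "INDEPENDENT:<path>") and the treatment of non-existent files follow the contracts documented here.

```bash
#!/usr/bin/env bash
# Sketch only: these regexes are illustrative guesses, not the real patterns.
SHARED_STATE_PATTERNS=(
  '/tmp/[A-Za-z0-9_.-]+'                    # 1. hardcoded /tmp path
  '(nc|ncat) +-l|--port[= ][0-9]+'          # 2. port binding
  '\.(sqlite3?|db)([^A-Za-z0-9]|$)'         # 3. SQLite file
  '\.(pid|lock)([^A-Za-z0-9]|$)'            # 4. PID/lock file
  '^[[:space:]]*(export +)?TMPDIR='         # 5. singleton TMPDIR assignment
  '^[[:space:]]*(source|\.) +[^ ]*config'   # 6. global config sourcing
)

# classify_shared_state <file>...
# Prints one "SHARED:<path>" or "INDEPENDENT:<path>" line per argument.
classify_shared_state() {
  local file pattern verdict
  for file in "$@"; do
    verdict="INDEPENDENT"          # non-existent files stay INDEPENDENT
    if [[ -f "$file" ]]; then
      for pattern in "${SHARED_STATE_PATTERNS[@]}"; do
        if grep -Eq -- "$pattern" "$file"; then
          verdict="SHARED"         # any single match is enough
          break
        fi
      done
    fi
    printf '%s:%s\n' "$verdict" "$file"
  done
}
```

Because a single match is enough to mark a file SHARED, widening a pattern only ever moves tests toward the sequential bucket, which matches the conservative bias described above.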

2. Subshell variable propagation via temp file + grep (not direct variable)

  • Context: Background jobs in testopt_run_parallel() run in subshells. Variable assignments (all_passed=false) don't propagate back to the parent.
  • Decision: Each background job writes "<file> PASS|FAIL <duration>" lines to a shared temp file. Parent checks grep -q ' FAIL ' on the results file.
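  • Consequences: Simple, correct, no IPC mechanisms. Concurrent jobs append to the same temp file, but each record is written by a single short echo, and appends that small are atomic in practice on local filesystems on both Linux and macOS, so records do not interleave. This is not a hard guarantee (see Risk Areas).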
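
The propagation problem and the temp-file workaround can be sketched as follows. The function names run_job and run_all_in_background are hypothetical; only the record format ("<file> PASS|FAIL <duration>") and the grep -q ' FAIL ' check come from the decision above.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# run_job <results_file> <test_file>
run_job() {
  local results="$1" test_file="$2" start status="PASS"
  start=$(date +%s)
  bash "$test_file" >/dev/null 2>&1 || status="FAIL"
  # One short echo per job keeps each record on its own line.
  echo "$test_file $status $(( $(date +%s) - start ))" >> "$results"
}

# run_all_in_background <test_file>...
run_all_in_background() {
  local results test_file rc=0
  results=$(mktemp)
  for test_file in "$@"; do
    # "&" runs the job in a subshell, so any variable it sets is lost
    # when it exits; only the results file survives.
    run_job "$results" "$test_file" &
  done
  wait
  # The parent learns about failures from the file, not from a variable.
  if grep -q ' FAIL ' "$results"; then rc=1; fi
  rm -f "$results"
  return "$rc"
}
```

This sketch starts one background job per test with no worker cap; limiting concurrency is a separate concern.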

3. CPU-aware worker detection with hard cap at 8

  • Context: Need to scale parallelism to the machine without overwhelming constrained systems.
  • Decision: Detect cores via sysctl -n hw.ncpu (Darwin) / /proc/cpuinfo (Linux) / nproc, use 75%, clamp to [2, 8].
  • Consequences: 8-core machine gets 6 workers. 2-core CI runner gets 2 workers. Memory-constrained systems with many cores still get capped at 8.
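
A minimal sketch of the 75% rule with the [2, 8] clamp. The function names are hypothetical, and integer truncation is an assumption (the page does not say whether the real code rounds or floors); it reproduces the two documented data points, 8 cores giving 6 workers and 2 cores giving 2.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the rounding rule are assumptions.

# detect_cores: prints the core count, or an empty line if detection fails.
detect_cores() {
  local n=""
  case "$(uname -s)" in
    Darwin) n=$(sysctl -n hw.ncpu 2>/dev/null) ;;
    Linux)  n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) ;;
  esac
  if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then n=$(nproc 2>/dev/null); fi
  if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then n=""; fi
  printf '%s\n' "$n"
}

# workers_for_cores <cores>: 75% of cores, clamped to [2, 8].
workers_for_cores() {
  local cores="$1" workers
  if ! [[ "$cores" =~ ^[1-9][0-9]*$ ]]; then
    echo 4                       # documented safe default on detection failure
    return 0
  fi
  workers=$(( cores * 75 / 100 ))   # integer arithmetic truncates
  if (( workers < 2 )); then workers=2; fi
  if (( workers > 8 )); then workers=8; fi
  echo "$workers"
}
```

Usage: workers_for_cores "$(detect_cores)" prints the worker count for the current machine.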

4. Facade pattern with fallback to raw bash -c

  • Context: The optimizer must never be worse than doing nothing.
  • Decision: testopt_execute() falls back to bash -c "$test_cmd" when: (a) init fails, (b) fewer than 3 test files discovered, or (c) the optimizer is disabled via config.
  • Consequences: Zero risk of regression for edge cases. The optimizer is purely additive.
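
The three fallback conditions can be sketched as a thin wrapper. The names execute_with_fallback and optimized_run are hypothetical, a missing project root stands in for "init fails", and only the SW_TEST_OPTIMIZER kill switch is shown for the disabled case; the real testopt_execute() may check these in a different order.

```bash
#!/usr/bin/env bash
# Sketch only: names are hypothetical and optimized_run is a placeholder.

optimized_run() {
  echo "optimizer path"          # stands in for the four-phase execution
}

# execute_with_fallback <project_root> <test_cmd>
execute_with_fallback() {
  local root="$1" test_cmd="$2" count
  # (c) optimizer disabled: behave exactly like the baseline
  if [[ "${SW_TEST_OPTIMIZER:-true}" == "false" ]]; then
    bash -c "$test_cmd"; return
  fi
  # (a) init failure, simplified here to a missing project root
  if [[ ! -d "$root" ]]; then
    bash -c "$test_cmd"; return
  fi
  # (b) fewer than 3 discovered test files
  count=$(( $(find "$root" -type f \
    \( -name '*-test.sh' -o -name '*_test.sh' -o -name 'test_*.sh' \) | wc -l) ))
  if (( count < 3 )); then
    bash -c "$test_cmd"; return
  fi
  optimized_run "$root"
}
```

Every early return hands back the exit code of bash -c "$test_cmd" unchanged, so callers cannot tell the fallback apart from the pre-optimizer behavior.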

Data Flow

1. stage_test() reads pipeline config: optimization != "off" → calls testopt_execute()
   run_test_gate() reads SW_TEST_OPTIMIZER != "false" → calls testopt_execute()
2. testopt_execute(".", "npm test", "--fast-fail")
   └── testopt_init(".")
       ├── find *-test.sh *_test.sh test_*.sh → DISCOVERED_TESTS[]
       ├── read ~/.shipwright/optimization/test-history.jsonl → TEST_HISTORY[]
       ├── git diff HEAD~1..HEAD → CHANGED_FILES[]
       └── testopt_select_affected() → AFFECTED_TESTS[] (directory + source matching)
3. testopt_prioritize(AFFECTED_TESTS)
   ├── for each test: score = (fail_rate * 10000) - duration_s
   └── sort -rn → highest-fail-rate, fastest-duration first
4. testopt_partition_shared_state(prioritized_tests)
   └── grep each file for 6 patterns → "SHARED:<path>" or "INDEPENDENT:<path>"
       ├── INDEPENDENT → parallel_tests[]
       └── SHARED → sequential_tests[]
5. Phase 4a: testopt_run_parallel(parallel_tests, workers=detect_cores*0.75)
   └── group tests by directory → background subshells → wait → grep results file
6. Phase 4b: testopt_run_with_fast_fail(sequential_tests)
   ├── skipped entirely if Phase 4a failed AND fast_fail=true
   └── otherwise runs one-by-one, breaks on first failure
7. Record results → test-history.jsonl (JSONL append)
   Write evidence → $ARTIFACTS_DIR/test-optimizer-evidence.json
   Emit events → testopt.parallel_done, testopt.sequential_done, testopt.fail_fast
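
The scoring in step 3 can be sketched with awk and sort. The input format (one "<path> <fail_rate> <duration_s>" line per test) and the function name prioritize_by_score are assumptions for this example; how the real testopt_prioritize() derives fail_rate and duration from the history file is not shown on this page.

```bash
#!/usr/bin/env bash
# Sketch only: input format and function name are assumptions.

# prioritize_by_score: reads "<path> <fail_rate> <duration_s>" lines on stdin,
# prints paths ordered by score = (fail_rate * 10000) - duration_s, descending.
prioritize_by_score() {
  LC_ALL=C awk '{ printf "%.3f\t%s\n", ($2 * 10000) - $3, $1 }' |
    LC_ALL=C sort -rn |
    cut -f2
}

# Example: two flaky tests and two stable ones.
printf '%s\n' \
  'a-test.sh 0.00 12' \
  'b-test.sh 0.50 30' \
  'c-test.sh 0.50 5' \
  'd-test.sh 0.00 3' | prioritize_by_score
```

The factor of 10000 makes the fail rate dominate: any difference in fail rate outweighs a duration difference of up to thousands of seconds, and duration only breaks ties between tests with similar failure history.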

Interface Contracts

# Main entry point: called by pipeline and loop
testopt_execute <project_root> <test_cmd> [options...]
  Options:
    --max-workers=N                   # int, override CPU-detected worker count (2-8)
    --fast-fail                       # bool, stop on first failure (default)
    --continue-on-fail                # bool, run all tests despite failures
    --mode=auto|parallel|sequential   # execution mode (default: auto)
  Returns: exit 0 (all pass) | exit 1 (any failure)
  Errors: falls back to raw bash -c on init failure

# CPU detection: platform-aware core counting
testopt_detect_cores() -> stdout: integer (2-8)
  Errors: returns 4 (safe default) on detection failure

# Shared-state classification: static analysis of test files
testopt_partition_shared_state(file...) -> stdout: "SHARED:<path>" | "INDEPENDENT:<path>"
  Errors: non-existent files classified as INDEPENDENT

# Affected test selection: git-diff-driven test filtering
testopt_select_affected(changed_files...) -> sets AFFECTED_TESTS global array
  Errors: empty changed_files → returns all discovered tests

# Priority ordering: fail-rate weighted sort
testopt_prioritize(tests...) -> stdout: sorted test file paths (one per line)
  Errors: missing history → all tests score equally

# Parallel runner: background subshell execution
testopt_run_parallel(--max-workers=N, tests...) -> exit 0|1
  Errors: writes FAIL lines to temp file, checked via grep

# Sequential runner: fast-fail execution
testopt_run_with_fast_fail([--continue-on-fail], tests...) -> exit 0|1
  Errors: non-existent test files silently skipped
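
To make the option contract concrete, here is one way the documented flags could be parsed. The function name parse_testopt_options, the OPT_* variable names, the clamping of out-of-range worker counts, and the rejection of unknown flags are all assumptions; the page only specifies the flag names, their defaults, and the 2-8 range.

```bash
#!/usr/bin/env bash
# Sketch only: names and error handling are assumptions, not the real parser.

# parse_testopt_options [options...]
# Sets OPT_MAX_WORKERS (empty = auto-detect), OPT_FAST_FAIL, OPT_MODE.
parse_testopt_options() {
  OPT_MAX_WORKERS=""
  OPT_FAST_FAIL=true               # --fast-fail is the documented default
  OPT_MODE=auto
  local arg
  for arg in "$@"; do
    case "$arg" in
      --max-workers=*)    OPT_MAX_WORKERS="${arg#--max-workers=}" ;;
      --fast-fail)        OPT_FAST_FAIL=true ;;
      --continue-on-fail) OPT_FAST_FAIL=false ;;
      --mode=auto|--mode=parallel|--mode=sequential)
                          OPT_MODE="${arg#--mode=}" ;;
      *) echo "unknown option: $arg" >&2; return 2 ;;
    esac
  done
  if [[ -n "$OPT_MAX_WORKERS" ]]; then
    if ! [[ "$OPT_MAX_WORKERS" =~ ^[1-9][0-9]*$ ]]; then return 2; fi
    # Assumed behavior: clamp overrides into the documented 2-8 range.
    if (( OPT_MAX_WORKERS < 2 )); then OPT_MAX_WORKERS=2; fi
    if (( OPT_MAX_WORKERS > 8 )); then OPT_MAX_WORKERS=8; fi
  fi
  return 0
}
```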

Error Boundaries

| Component | Handles | Propagation | Fallback |
|---|---|---|---|
| testopt_execute() | Init failures, <3 tests | Returns raw bash -c exit code | Transparent fallthrough to original behavior |
| testopt_init() | Missing project root, no git, no history | Logs warning, continues with empty arrays | All tests treated as affected, no prioritization |
| testopt_run_parallel() | Subshell crashes, missing test files | Writes FAIL to results temp file | grep -q ' FAIL ' on results file catches all |
| testopt_run_with_fast_fail() | Test exit code != 0 | Breaks loop (fast-fail) or continues (flag) | Returns 1 with failed test name on stdout |
| testopt_record_history() | Write failures, missing directory | Suppressed via 2>/dev/null \|\| true | Missing history = no prioritization (graceful) |
| testopt_partition_shared_state() | Non-existent files | Classified as INDEPENDENT | Conservative: false positives go sequential |

Alternatives Considered

1. Vitest Native Parallelism (Node-Level)

  • Pros: Vitest already supports --pool threads/forks, parallel file execution, and --reporter for structured output. Would work natively with npm test.
  • Cons: This project's test suite is 102+ bash test scripts, not Vitest test files. The npm test command dispatches to these bash scripts. Vitest parallelism would only help if tests were .ts/.js files. Doesn't address affected-first or shared-state detection for bash scripts.
  • Why rejected: Wrong layer. The optimization target is bash-level test file execution, not Node-level test runner internals.

2. GNU Parallel / xargs -P

  • Pros: Battle-tested parallel execution. Simple: find *-test.sh | parallel -j$(nproc) bash {}.
  • Cons: No shared-state detection, so it would run all tests in parallel, including those that fight over ports/files. No affected-first prioritization. No fast-fail cascade between parallel and sequential phases. Adds a dependency (parallel not installed by default on macOS).
  • Why rejected: Lacks the intelligence layer (prioritization, partitioning, history). Would need the same detection code wrapped around it anyway.

3. Makefile-Based Dependency Graph

  • Pros: Explicit dependency declaration between test files. make -j handles parallelism natively.
  • Cons: Requires maintaining a Makefile with test dependencies, which is a high maintenance burden. Every new test file needs a rule. No automatic shared-state detection. Foreign to the existing bash-centric architecture.
  • Why rejected: Maintenance cost too high for 102+ test files. Static declaration gets stale.

4. Container-Based Isolation (Docker per test)

  • Pros: Perfect isolation, with no shared-state concerns. Every test gets a clean filesystem.
  • Cons: Container startup overhead (~2-5s per test) would negate parallelism gains. 102 containers would require significant memory. Not available in all CI environments. Massive complexity increase.
  • Why rejected: Overhead exceeds the time savings from parallelism.

Implementation Plan

Files Created

| File | Purpose |
|---|---|
| scripts/lib/test-optimizer.sh (741 lines) | Core library: detection, partitioning, prioritization, execution, history, reporting |
| scripts/sw-test-optimizer-integration-test.sh (438 lines) | 25 integration tests covering all new functions |

Files Modified

| File | Change |
|---|---|
| scripts/lib/pipeline-stages-build.sh:569-584 | stage_test() reads optimization from pipeline config, calls testopt_execute() when not off |
| scripts/sw-loop.sh:997-1000 | run_test_gate() checks SW_TEST_OPTIMIZER, calls testopt_execute() when not false |
| templates/pipelines/*.json (9 files) | Added "optimization": "auto", "fast_fail": true to test stage config |

Dependencies

  • None new. Uses only: bash, jq, grep, find, sort, awk, mktemp, sysctl/nproc, git

Risk Areas

  1. Shared-state false negatives: Pattern 6 (config sourcing) is broad but not exhaustive. A test could share state via an unconventional mechanism (e.g., writing to a well-known path without matching any of the 6 patterns). Mitigation: --mode=sequential override and SW_TEST_OPTIMIZER=false kill switch.
  2. Concurrent JSONL writes: Multiple parallel pipelines (worktrees) may append to ~/.shipwright/optimization/test-history.jsonl simultaneously. Small single-line appends are atomic on Linux/macOS for typical filesystem block sizes, but this is not guaranteed. Mitigation: the JSONL format is self-healing, since corrupt lines are skipped on read.
  3. History file unbounded growth: test-history.jsonl grows indefinitely. 102 tests * 10 runs/day * 365 days = ~372K lines per year. Mitigation: not yet implemented. Future work: add rotation or tail-N windowing.
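
The append-only, self-healing history described in risk 2 might look like the sketch below. It requires jq (already a listed dependency). The function names and the record fields (test, status, duration_s) are hypothetical; the real schema of test-history.jsonl is not shown on this page.

```bash
#!/usr/bin/env bash
# Sketch only: function names and record fields are hypothetical.

# append_history_record <history_file> <test> <PASS|FAIL> <duration_s>
append_history_record() {
  local history_file="$1"
  mkdir -p "$(dirname "$history_file")" 2>/dev/null || true
  # Failures are suppressed: a missing history only disables prioritization.
  jq -c -n --arg test "$2" --arg status "$3" --argjson duration "$4" \
    '{test: $test, status: $status, duration_s: $duration}' \
    >> "$history_file" 2>/dev/null || true
}

# read_history_records <history_file>
# Prints every line that parses as a JSON object; corrupt lines are dropped.
read_history_records() {
  local history_file="$1"
  [[ -r "$history_file" ]] || return 0
  # -R reads each line as a raw string; "fromjson?" yields nothing for
  # lines that fail to parse instead of aborting the whole read.
  jq -c -R 'fromjson? | select(type == "object")' "$history_file"
}
```

A torn or interleaved append then costs at most the affected records, not the whole history.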

Validation Criteria

  • All 9 pipeline templates contain "optimization": "auto" and "fast_fail": true in test stage config
  • stage_test() calls testopt_execute() when optimization != "off" (at pipeline-stages-build.sh:576-581)
  • run_test_gate() calls testopt_execute() when SW_TEST_OPTIMIZER != "false" (at sw-loop.sh:997-1000)
  • Backwards compatible: SW_TEST_OPTIMIZER=false bypasses optimizer in loop
  • Backwards compatible: optimization: "off" bypasses optimizer in pipeline
  • Fallback on <3 test files (at test-optimizer.sh:621)
  • Fallback on init failure (at test-optimizer.sh:614-618)
  • CPU detection clamped to [2, 8] (at test-optimizer.sh:517-518)
  • 6 shared-state patterns implemented (at test-optimizer.sh:537-567)
  • Parallel runner uses temp-file + grep for result propagation (at test-optimizer.sh:382-413)
  • Evidence JSON written to $ARTIFACTS_DIR/test-optimizer-evidence.json
  • Events emitted: testopt.parallel_done, testopt.sequential_done, testopt.fail_fast, testopt.recorded
  • All 25 integration tests pass
  • All 20 existing unit tests pass
  • Pipeline template config discoverable via shipwright templates list

Test Pyramid Breakdown

| Layer | Count | Coverage Target | What's Tested |
|---|---|---|---|
| Unit | 20 (sw-test-optimizer-test.sh) | Core library functions | Discovery, history load/query, affected selection, prioritization sort, fast-fail, parallel execution |
| Integration | 25 (sw-test-optimizer-integration-test.sh) | New functions + wiring | testopt_detect_cores, testopt_partition_shared_state, testopt_execute orchestrator, stage_test() wiring, run_test_gate() wiring |
| E2E | 0 (covered by existing npm test) | Full pipeline flow | Optimizer activates during normal npm test runs: 102+ suites exercise the real path |

Coverage targets: 100% of public functions have direct tests. Error paths (missing root, <3 tests, empty history) explicitly tested. Edge cases (all-shared, all-independent, single file) covered.

Critical Paths Tested

Happy path: 5 independent + 2 shared-state tests → parallel phase runs first with N workers → sequential phase runs second → all pass → exit 0, evidence JSON written.

Error cases:

  1. Failing test in parallel bucket + fast-fail → sequential bucket skipped entirely → exit 1 with testopt.fail_fast event emitted
  2. Failing test with --continue-on-fail → all tests run regardless → exit 1 with full results
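
The difference between the two error cases comes down to whether the loop breaks on the first non-zero exit. A minimal sketch, with the hypothetical name run_sequential standing in for the sequential runner; skipping non-existent files and printing the failed test name on stdout follow the contracts documented above.

```bash
#!/usr/bin/env bash
# Sketch only: run_sequential is a hypothetical stand-in for the real runner.

# run_sequential [--continue-on-fail] <test_file>...
run_sequential() {
  local fast_fail=true rc=0 test_file
  if [[ "${1:-}" == "--continue-on-fail" ]]; then
    fast_fail=false
    shift
  fi
  for test_file in "$@"; do
    [[ -f "$test_file" ]] || continue     # non-existent files are skipped
    if ! bash "$test_file" >/dev/null 2>&1; then
      echo "$test_file"                   # failed test name on stdout
      rc=1
      if [[ "$fast_fail" == true ]]; then
        break                             # remaining tests never start
      fi
    fi
  done
  return "$rc"
}
```

Both modes return 1 when anything fails; --continue-on-fail only changes how many tests get a chance to run before that.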

Edge cases:

  1. Zero test files discovered → falls through to bash -c "$test_cmd" (no optimizer overhead)
  2. All tests classified as SHARED → parallel bucket empty, sequential bucket gets everything → behaves like original sequential execution

Baseline Metrics

| Metric | Current Value | Source |
|---|---|---|
| Full test suite wall-clock | ~1365s | memory/metrics.json (2026-04-04 baseline) |
| Execution mode | Sequential only | Single bash -c "$test_cmd" |
| Time to first failure | Up to ~1365s (worst case) | No early termination |

Optimization Targets

| Metric | Target | Rationale |
|---|---|---|
| Full suite wall-clock | <900s (34% reduction) | Parallelism across ~75% of tests classified as independent |
| Time to first failure | <200s (70%+ reduction) | Affected-first prioritization + fast-fail stops early |
| Test correctness | Zero regressions | Shared-state partitioning prevents parallel flakes |

Profiling Strategy

  • Wall-clock per phase: testopt.parallel_done and testopt.sequential_done events capture duration per phase
  • Evidence JSON: test-optimizer-evidence.json records total/parallel/sequential counts, workers, mode, exit code per run
  • Historical trending: test-history.jsonl accumulates per-test duration and pass/fail data across runs
  • Dashboard integration: /api/metrics/stage-performance surfaces test stage duration trends from event data

Benchmark Plan

| Step | Method | Success Criteria |
|---|---|---|
| Before | time npm test on main branch | Record wall-clock baseline (~1365s) |
| After | time npm test on feature branch | Wall-clock < 900s |
| Verify | Read test-optimizer-evidence.json | parallel_tests > 0, workers > 1, exit_code: 0 |
| Regression check | Compare test pass counts | Same number of PASS/FAIL as main branch |
