Pipeline Design 172
sw-pipeline.sh was a 3,171-line monolith handling CLI dispatch, state management, stage execution, orchestration, self-healing loops, and GitHub integration. This made it difficult to test individual components, led to implicit coupling between state mutations and execution logic, and created a maintenance burden where any change risked cascading breakage.
Constraints:
- Bash 3.2 compatibility (no associative arrays, no readarray)
- set -euo pipefail across all scripts
- Existing test framework (lib/test-helpers.sh) with PASS/FAIL counters; no bats-core
- State persisted as YAML frontmatter in .claude/pipeline-state.md
- Stage status tracked via newline-delimited key:value strings in shell variables (not JSON)
- 17 pipeline library modules already exist under scripts/lib/
Current state: The extraction is complete (commit 1d85a6a). sw-pipeline.sh is 708 lines. The two target modules exist and are functional. What remains is closing test coverage gaps.
┌─────────────────────────────────────────────────────────┐
│  sw-pipeline.sh (708 lines)                             │
│  Responsibility: CLI dispatch, signal setup, sourcing   │
│                                                         │
│  ┌─────────────────┐  ┌────────────────────────────┐    │
│  │ pipeline-       │  │ pipeline-orchestration.sh  │    │
│  │ commands.sh     │→ │ (stage sequencing, gates)  │    │
│  │ (CLI verbs)     │  └────────────┬───────────────┘    │
│  └─────────────────┘               │                    │
│                                    ▼                    │
│                 ┌─────────────────────────────────────┐ │
│                 │ pipeline-stage-executor.sh (645 ln) │ │
│                 │ Error classify, retry, self-healing │ │
│                 └──────────────────┬──────────────────┘ │
│                                    │                    │
│                                    ▼                    │
│                 ┌─────────────────────────────────────┐ │
│                 │ pipeline-state.sh (612 ln)          │ │
│                 │ State CRUD, persistence, artifacts  │ │
│                 └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Dependency direction: orchestration → executor → state (inward only, no cycles)
Both modules use include guards to prevent double-sourcing and ensure they can be loaded independently without side effects:
[[ -n "${_PIPELINE_STATE_LOADED:-}" ]] && return 0
_PIPELINE_STATE_LOADED=1
All required variables have safe defaults via ${VAR:-default} so modules can be sourced in test harnesses without the full sw-pipeline.sh environment.
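As a minimal sketch of how the guard and the safe defaults interact, the snippet below writes a stand-in module to a temp file and sources it twice. The module name, the `_DEMO_MODULE_LOADED` and `LOAD_COUNT` variables, and the default path are illustrative assumptions, not names taken from the real modules.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a hypothetical demo module to a temp file so it can be sourced twice.
demo_module="$(mktemp "${TMPDIR:-/tmp}/demo-module.XXXXXX")"
cat > "$demo_module" <<'EOF'
# Include guard: a second `source` returns before anything is redefined.
[[ -n "${_DEMO_MODULE_LOADED:-}" ]] && return 0
_DEMO_MODULE_LOADED=1

# Safe defaults: keep a caller-provided value, otherwise fall back.
# The fallback path here is an assumption for the demo.
STATE_FILE="${STATE_FILE:-/tmp/demo-pipeline-state.md}"
LOAD_COUNT=$(( ${LOAD_COUNT:-0} + 1 ))
EOF

source "$demo_module"
source "$demo_module"   # no-op: the guard returns early
rm -f "$demo_module"

echo "module body ran $LOAD_COUNT time(s); STATE_FILE=$STATE_FILE"
```

Because the guard sits in a `&&` list, a failed `[[ ... ]]` test does not trip `set -e` on the first load, so the pattern stays compatible with `set -euo pipefail`.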
State is not JSON — it uses newline-delimited key:value strings in shell variables (STAGE_STATUSES, STAGE_TIMINGS, LOG_ENTRIES). This avoids jq round-trips for hot-path operations (status lookups happen per-stage). Persistence to disk uses YAML frontmatter in STATE_FILE via write_state().
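The lookup and update pattern on such a string can be sketched in pure Bash 3.2, with no subprocess per call. The helper names `kv_get` and `kv_set` are hypothetical; the real accessors are get_stage_status and set_stage_status, whose internals are not shown here.

```bash
#!/usr/bin/env bash
set -euo pipefail

STAGE_STATUSES=""   # newline-delimited "stage_id:status" entries

# Hypothetical helper: print the value stored for a key, or "" if absent.
kv_get() {
  local key="$1" line
  while IFS= read -r line; do
    case "$line" in
      "$key":*) printf '%s\n' "${line#*:}"; return 0 ;;
    esac
  done <<< "$STAGE_STATUSES"
  printf '\n'
}

# Hypothetical helper: replace the entry for a key, or append a new one.
kv_set() {
  local key="$1" value="$2" line out=""
  while IFS= read -r line; do
    if [[ -z "$line" ]]; then continue; fi
    case "$line" in
      "$key":*) ;;                          # drop the old entry
      *) out="${out}${line}"$'\n' ;;        # keep every other entry
    esac
  done <<< "$STAGE_STATUSES"
  STAGE_STATUSES="${out}${key}:${value}"
}

kv_set intake complete
kv_set plan running
kv_set plan complete      # overwrites the earlier "plan:running"
echo "plan=$(kv_get plan) build=$(kv_get build)"
```

The sketch assumes keys never contain a colon; values may, since only the first colon is treated as the separator.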
Atomic writes: write_state() writes to a temp file then mv to prevent corruption on interrupt.
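A minimal sketch of the temp-file-then-mv pattern follows. The helper name `atomic_write` and the frontmatter field names are assumptions for illustration; the actual write_state() layout is not reproduced here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace the file at $1 with the content read from stdin.
atomic_write() {
  local target="$1" tmp
  # Create the temp file next to the target so the final mv is a rename
  # within one filesystem rather than a copy.
  tmp="$(mktemp "${target}.tmp.XXXXXX")"
  if cat > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo in an isolated temp directory (field names are illustrative).
STATE_FILE="$(mktemp -d "${TMPDIR:-/tmp}/state-demo.XXXXXX")/pipeline-state.md"
printf -- '---\nstatus: running\ncurrent_stage: plan\n---\n' | atomic_write "$STATE_FILE"
cat "$STATE_FILE"
```

If the process is interrupted before the mv, readers still see the previous complete file; at worst a stray `.tmp.*` file is left behind.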
classify_error() in the executor uses a three-tier approach:
- Cache lookup: check ~/.shipwright/optimization/error-classifications.json for previously seen error signatures (cksum-based)
- Pattern matching: grep the log tail for known patterns (infrastructure: timeout/OOM/network; configuration: MODULE_NOT_FOUND/ENOENT; logic: assertion/TypeError)
- Fallback: "unknown" classification
Classification determines retry behavior: infrastructure errors retry, configuration errors fail immediately, logic errors fail after repeated occurrence.
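The pattern-matching and fallback tiers can be sketched as below; the cache tier is left out. The function name `classify_log_tail`, the 50-line window, and the exact regular expressions are assumptions built from the pattern names listed above, not the real classify_error() body.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify a failure from the tail of a log file.
classify_log_tail() {
  local log_file="$1" tail_text
  tail_text="$(tail -n 50 "$log_file" 2>/dev/null || true)"

  # Here-strings avoid a pipe, so an early grep -q exit cannot trip pipefail.
  if grep -Eiq 'timeout|network' <<< "$tail_text" || grep -qw 'OOM' <<< "$tail_text"; then
    echo "infrastructure"
  elif grep -Eq 'MODULE_NOT_FOUND|ENOENT' <<< "$tail_text"; then
    echo "configuration"
  elif grep -Eiq 'assertion|TypeError' <<< "$tail_text"; then
    echo "logic"
  else
    echo "unknown"
  fi
}

# Demo with a throwaway log file.
demo_log="$(mktemp "${TMPDIR:-/tmp}/demo-log.XXXXXX")"
echo "Error: Cannot find module 'left-pad' (MODULE_NOT_FOUND)" > "$demo_log"
classify_log_tail "$demo_log"
```

Tier order matters in this sketch: a log containing both a timeout and a TypeError is treated as infrastructure, which is a design choice of the example rather than a documented rule.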
self_healing_build_test() (287 lines) implements convergence detection:
- Same error 3x → early exit (stuck)
- Same failure count 2x → early exit (plateau)
- Infrastructure error → immediate exit (not code-fixable)
This is the highest-complexity function and the primary test coverage gap.
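The three exit rules can be sketched as a small decision function plus a driver that replays simulated iterations. The names `convergence_verdict` and `simulate`, and the reading of "3x" and "2x" as consecutive occurrences, are assumptions; the real function's bookkeeping may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical decision helper. Prints "infra", "stuck", "plateau", or "continue".
# Args: error class, consecutive runs with the same error signature,
#       consecutive runs with the same failure count.
convergence_verdict() {
  local error_class="$1" same_error_runs="$2" same_count_runs="$3"
  if [[ "$error_class" == "infrastructure" ]]; then
    echo "infra"            # not fixable by changing code
  elif (( same_error_runs >= 3 )); then
    echo "stuck"
  elif (( same_count_runs >= 2 )); then
    echo "plateau"
  else
    echo "continue"
  fi
}

# Replay "signature failure_count" lines from stdin; stop at the first verdict.
simulate() {
  local last_sig="" last_count="" sig_runs=0 count_runs=0 sig count verdict
  while read -r sig count; do
    if [[ "$sig" == "$last_sig" ]]; then sig_runs=$((sig_runs + 1)); else sig_runs=1; fi
    if [[ "$count" == "$last_count" ]]; then count_runs=$((count_runs + 1)); else count_runs=1; fi
    last_sig="$sig"; last_count="$count"
    verdict="$(convergence_verdict logic "$sig_runs" "$count_runs")"
    if [[ "$verdict" != "continue" ]]; then echo "$verdict"; return 0; fi
  done
  echo "exhausted"
}

simulate <<< $'a 5\na 4\na 3'   # same signature three times in a row
simulate <<< $'a 5\nb 5'        # failure count unchanged twice in a row
```

Separating the verdict from the loop keeps the branching testable without running any build or test stage.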
pipeline-state.sh public API:
save_artifact(name, content) → void # writes to ARTIFACTS_DIR
get_stage_status(stage_id) → string # "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id, status) → void
record_stage_start(stage_id) → void
record_stage_end(stage_id) → void
get_stage_timing(stage_id) → string # "1m30s"
get_stage_timing_seconds(stage_id) → int # 0 if unknown
get_slowest_stage() → string # stage_id or ""
get_stage_description(stage_id) → string
build_stage_progress() → string # "intake:complete plan:running"
update_status(status, stage) → void # sets PIPELINE_STATUS + CURRENT_STAGE, calls write_state
mark_stage_complete(stage_id) → void # side effects: event, timing, GitHub, checkpoint
mark_stage_failed(stage_id) → void # side effects: event, error comment, GitHub
initialize_state() → void # resets all state
write_state() → void | return 1 # disk space check, YAML frontmatter to STATE_FILE
resume_state() → void | exit 1 # restores from STATE_FILE; exits on missing file/goal
verify_stage_artifacts(stage_id) → 0|1 # checks expected artifacts exist
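To illustrate the timing-related return shapes above ("1m30s", a stage_id or ""), here is a small sketch. It assumes timings are held as newline-delimited "stage:seconds" entries; the variable `TIMINGS` and both helper names are hypothetical, and the real STAGE_TIMINGS value format may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

TIMINGS=$'intake:12\nplan:90\nbuild:45'   # assumed "stage_id:seconds" format

# Hypothetical helper: render seconds as "1m30s" (or "45s" under a minute).
format_duration() {
  local total="$1"
  if (( total >= 60 )); then
    echo "$((total / 60))m$((total % 60))s"
  else
    echo "${total}s"
  fi
}

# Hypothetical helper: print the stage with the largest duration, or "".
slowest_stage() {
  local line secs best="" best_secs=-1
  while IFS= read -r line; do
    if [[ -z "$line" ]]; then continue; fi
    secs="${line#*:}"
    if (( secs > best_secs )); then
      best_secs="$secs"
      best="${line%%:*}"
    fi
  done <<< "$TIMINGS"
  echo "$best"
}

echo "slowest=$(slowest_stage) plan=$(format_duration 90)"
```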
pipeline-stage-executor.sh public API:
classify_error(stage_id) → "infrastructure"|"configuration"|"logic"|"unknown"
run_stage_with_retry(stage_id) → 0|1 # handles retry logic per classification
self_healing_build_test() → 0|1 # build+test loop with convergence detection
self_healing_review_build_test() → 0|1 # review+build+test loop
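A rough sketch of retry-by-classification, using stubs in place of the real stage function and classifier. The names `retry_by_class`, `stage_demo`, `classify_error_stub`, the `MAX_ATTEMPTS` knob, the "two logic failures" threshold, and the choice to retry "unknown" errors are all assumptions for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"   # hypothetical knob

# Stubs standing in for a real stage function and for classify_error.
ATTEMPTS=0; SUCCEED_ON=2; ERROR_CLASS="infrastructure"
stage_demo()          { ATTEMPTS=$((ATTEMPTS + 1)); [[ "$ATTEMPTS" -ge "$SUCCEED_ON" ]]; }
classify_error_stub() { echo "$ERROR_CLASS"; }

retry_by_class() {
  local stage_fn="$1" attempt=1 logic_failures=0 class
  while (( attempt <= MAX_ATTEMPTS )); do
    if "$stage_fn"; then return 0; fi
    class="$(classify_error_stub)"
    case "$class" in
      configuration)
        return 1 ;;                                   # retrying cannot help
      logic)
        logic_failures=$((logic_failures + 1))
        if (( logic_failures >= 2 )); then return 1; fi ;;
      *) ;;                                           # infrastructure/unknown: retry
    esac
    attempt=$((attempt + 1))
  done
  return 1
}

if retry_by_class stage_demo; then rc=0; else rc=1; fi
echo "infrastructure: rc=$rc attempts=$ATTEMPTS"

ATTEMPTS=0; ERROR_CLASS="configuration"
if retry_by_class stage_demo; then rc=0; else rc=1; fi
echo "configuration: rc=$rc attempts=$ATTEMPTS"
```

Calling the function inside `if` keeps a non-zero return from terminating the script under `set -e`, which is also how a test harness would need to invoke it.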
| Component | Handles locally | Propagates upward |
|---|---|---|
| state | Disk space (write_state), missing artifacts (verify), CI push failures (persist_artifacts) | Missing state file → exit 1 |
| executor | Error classification, convergence detection, retry decisions | Stage failure → return 1 |
| orchestration | Stage sequencing, gate enforcement, self-healing loop entry | Pipeline failure → writes final state, exits |
CLI args → parse_args → pipeline_start()
  → initialize_state()                     [state]
  → run_pipeline()                         [orchestration]
    → for each enabled stage:
      → record_stage_start(id)             [state]
      → run_stage_with_retry(id)           [executor]
        → stage_<id>()                     [stage modules]
        → classify_error(id) on failure    [executor]
      → mark_stage_complete(id)            [state → write_state → STATE_FILE]
        or mark_stage_failed(id)           [state → emit_event → events.jsonl]
                                           [state → gh_update_progress → GitHub]
- Further decomposition (split large functions into sub-modules). Pros: smaller units, easier to test in isolation. Cons: more files to source, higher bash overhead, more complex dependency graph. Rejected: current boundaries are clean and match single-responsibility. Adding tests for existing functions is lower-risk than restructuring.
- Rewrite tests using bats-core. Pros: proper TAP output, better assertion library, setup/teardown hooks. Cons: new dependency, requires rewriting 9 existing test files, team learning curve. Rejected: the existing test-helpers.sh framework works well, and consistency across the test suite matters more.
- Test only public API, skip internal functions. Pros: fewer tests to write, less maintenance. Cons: won't reach >80% coverage on complex functions like mark_stage_complete (94 lines with many side effects) and self_healing_build_test (287 lines). Rejected: these functions have complex branching that needs verification.
- Convert state to JSON throughout. Pros: structured data, easier validation. Cons: jq round-trips on every status lookup (hot path), Bash 3.2 compatibility concerns, large diff touching all consumers. Rejected: the newline-delimited format is fast and sufficient; the persistence layer already handles serialization.
- Files to create: None
- Files to modify:
  - scripts/sw-lib-pipeline-state-test.sh: add ~15 tests for write_state, resume_state, get_slowest_stage, build_stage_progress, mark_stage_complete, mark_stage_failed, update_status
  - scripts/sw-lib-pipeline-stage-executor-test.sh: add ~10 tests for self_healing_build_test (happy path, convergence stuck, plateau detection) and run_stage_with_retry edge cases (plan artifact skip, configuration error escalation)
- Dependencies: None new
- Risk areas:
  - mark_stage_complete/failed tests require many stubs (emit_event, gh_*, checkpoint); mitigated by reusing the stub pattern already in the test file
  - self_healing_build_test contains sleep calls; override sleep in tests and mock stage functions to fail/succeed deterministically
  - write_state test must not corrupt real state; tests already use TEST_TEMP_DIR isolation
  - resume_state needs a valid YAML state file; generate content in the test using the known-good format from write_state
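The sleep override and deterministic stage mock mentioned in the risk areas might look like the sketch below. The `assert_eq` helper only imitates a PASS/FAIL-counter style and is not the lib/test-helpers.sh API; `loop_under_test` is a stand-in for self_healing_build_test, and `stage_build` is a mock.

```bash
#!/usr/bin/env bash
set -euo pipefail

PASS=0; FAIL=0
# Hypothetical assertion helper in the spirit of PASS/FAIL counters.
assert_eq() {
  local label="$1" expected="$2" actual="$3"
  if [[ "$expected" == "$actual" ]]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    echo "FAIL: $label (expected '$expected', got '$actual')"
  fi
}

# Shadow the external sleep with a shell function so loops run instantly.
SLEEP_CALLS=0
sleep() { SLEEP_CALLS=$((SLEEP_CALLS + 1)); }

# Deterministic stage mock: fails twice, then succeeds.
BUILD_RUNS=0
stage_build() { BUILD_RUNS=$((BUILD_RUNS + 1)); [[ "$BUILD_RUNS" -ge 3 ]]; }

# Stand-in for the loop under test.
loop_under_test() {
  local i
  for i in 1 2 3 4 5; do
    if stage_build; then return 0; fi
    sleep 5
  done
  return 1
}

if loop_under_test; then rc=0; else rc=1; fi
assert_eq "loop succeeds" "0" "$rc"
assert_eq "build ran three times" "3" "$BUILD_RUNS"
assert_eq "sleep was stubbed" "2" "$SLEEP_CALLS"
echo "PASS=$PASS FAIL=$FAIL"
```

Shell functions take precedence over external commands of the same name, so defining `sleep` before sourcing or calling the code under test is enough; no PATH manipulation is needed.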
- sw-pipeline.sh remains < 1,500 lines (currently 708; already met)
- pipeline-state.sh exists as separate module with include guard (already met, 612 lines)
- pipeline-stage-executor.sh exists as separate module with include guard (already met, 645 lines)
- All existing tests pass (npm test green, no regressions)
- 20+ new unit tests added across both test files
- Function coverage >80% for pipeline-state.sh (from ~60% → ~95%)
- Line coverage >80% for pipeline-stage-executor.sh (from ~40% → ~80%)
- Each module can be sourced independently without side effects (include guards + safe defaults verified)
- No production code changes; test-only additions
Not applicable — these are shell modules with function-call interfaces, not HTTP APIs. The interface contracts above define the function signatures and error behavior.