Pipeline Design 172

ezigus edited this page Mar 17, 2026 · 1 revision

Design: sw-pipeline.sh modular extraction — stage executor and state manager

Context

sw-pipeline.sh was a 3,171-line monolith handling CLI dispatch, state management, stage execution, orchestration, self-healing loops, and GitHub integration. This made it difficult to test individual components, led to implicit coupling between state mutations and execution logic, and created a maintenance burden where any change risked cascading breakage.

Constraints:

- Bash 3.2 compatibility (no associative arrays, no readarray)
- set -euo pipefail across all scripts
- Existing test framework (lib/test-helpers.sh) with PASS/FAIL counters; no bats-core
- State persisted as YAML frontmatter in .claude/pipeline-state.md
- Stage status tracked via newline-delimited key:value strings in shell variables (not JSON)
- 17 pipeline library modules already exist under scripts/lib/

Current state: The extraction is complete (commit 1d85a6a). sw-pipeline.sh is 708 lines. The two target modules exist and are functional. What remains is closing test coverage gaps.

Decision

Module Boundaries (3 layers, strict dependency direction)

┌──────────────────────────────────────────────────────────┐
│ sw-pipeline.sh (708 lines)                               │
│ Responsibility: CLI dispatch, signal setup, sourcing     │
│                                                          │
│ ┌─────────────────┐   ┌────────────────────────────┐     │
│ │ pipeline-       │   │ pipeline-orchestration.sh  │     │
│ │ commands.sh     │ → │ (stage sequencing, gates)  │     │
│ │ (CLI verbs)     │   └────────────┬───────────────┘     │
│ └─────────────────┘                │                     │
│                                    ▼                     │
│                 ┌─────────────────────────────────────┐  │
│                 │ pipeline-stage-executor.sh (645 ln) │  │
│                 │ Error classify, retry, self-healing │  │
│                 └──────────────────┬──────────────────┘  │
│                                    │                     │
│                                    ▼                     │
│                 ┌─────────────────────────────────────┐  │
│                 │ pipeline-state.sh (612 ln)          │  │
│                 │ State CRUD, persistence, artifacts  │  │
│                 └─────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘

Dependency direction: orchestration → executor → state (inward only, no cycles)

Include Guards

Both modules use include guards to prevent double-sourcing and ensure they can be loaded independently without side effects:

[[ -n "${_PIPELINE_STATE_LOADED:-}" ]] && return 0
_PIPELINE_STATE_LOADED=1

All required variables have safe defaults via ${VAR:-default} so modules can be sourced in test harnesses without the full sw-pipeline.sh environment.
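
As a rough illustration of the guard and the safe defaults working together, the sketch below writes a tiny stand-in module to a temp directory and sources it twice. The file name demo-module.sh, the guard variable _DEMO_LOADED, the LOAD_COUNT counter, and the default path are all hypothetical and are not taken from the real modules.

```shell
# Hypothetical demo: write a tiny module to a temp dir, then source it twice.
tmp_dir=$(mktemp -d)
cat > "$tmp_dir/demo-module.sh" <<'EOF'
# Include guard: a second source returns before any state is touched.
[ -n "${_DEMO_LOADED:-}" ] && return 0
_DEMO_LOADED=1

# Safe default so the module loads without the full pipeline environment.
ARTIFACTS_DIR="${ARTIFACTS_DIR:-/tmp/demo-artifacts}"
LOAD_COUNT=$(( ${LOAD_COUNT:-0} + 1 ))
EOF

. "$tmp_dir/demo-module.sh"
. "$tmp_dir/demo-module.sh"   # no-op thanks to the guard

echo "loads=$LOAD_COUNT artifacts_dir=$ARTIFACTS_DIR"
rm -rf "$tmp_dir"
```

If the guard works, the module body runs once even though the file is sourced twice, so LOAD_COUNT stays at 1.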

State Representation

State is not JSON — it uses newline-delimited key:value strings in shell variables (STAGE_STATUSES, STAGE_TIMINGS, LOG_ENTRIES). This avoids jq round-trips for hot-path operations (status lookups happen per-stage). Persistence to disk uses YAML frontmatter in STATE_FILE via write_state().
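
One way such a newline-delimited key:value string could be read and updated without jq is sketched below. The helper names demo_set_status and demo_get_status are hypothetical stand-ins, not the module's actual get_stage_status/set_stage_status, and the sketch assumes stage ids contain no regex metacharacters or colons.

```shell
STAGE_STATUSES=""

# Hypothetical setter: drop any existing line for the stage, then append the new one.
demo_set_status() {
  local stage="$1" status="$2" kept
  kept=$(printf '%s\n' "$STAGE_STATUSES" | grep -v "^${stage}:" || true)
  if [ -n "$kept" ]; then
    STAGE_STATUSES="${kept}
${stage}:${status}"
  else
    STAGE_STATUSES="${stage}:${status}"
  fi
}

# Hypothetical getter: print the status for a stage, or nothing if unset.
demo_get_status() {
  local stage="$1" line
  line=$(printf '%s\n' "$STAGE_STATUSES" | grep "^${stage}:" | head -n 1 || true)
  printf '%s' "${line#*:}"
}

demo_set_status intake running
demo_set_status plan pending
demo_set_status intake complete
echo "intake=$(demo_get_status intake) plan=$(demo_get_status plan)"
```

Each lookup is a single grep over a short string held in memory, which is the kind of cost the design is trying to keep on the hot path.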

Atomic writes: write_state() writes to a temp file then mv to prevent corruption on interrupt.
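
The temp-file-then-mv pattern can be sketched as follows. The function demo_atomic_write and the frontmatter field names (status, current_stage) are illustrative assumptions; the real write_state() also performs a disk space check that is not shown here.

```shell
# Hypothetical atomic writer: the content goes to a temp file next to the
# target, then mv renames it into place in one step.
demo_atomic_write() {
  local target="$1" content="$2" tmp_file
  tmp_file=$(mktemp "${target}.tmp.XXXXXX") || return 1
  if printf '%s\n' "$content" > "$tmp_file" && mv "$tmp_file" "$target"; then
    return 0
  fi
  rm -f "$tmp_file"   # clean up the partial temp file on failure
  return 1
}

state_dir=$(mktemp -d)
STATE_FILE="$state_dir/pipeline-state.md"
demo_atomic_write "$STATE_FILE" "---
status: running
current_stage: build
---"
cat "$STATE_FILE"
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what lets mv replace the file in a single step rather than copying it.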

Error Classification Strategy

classify_error() in the executor uses a three-tier approach:

1. Cache lookup: check ~/.shipwright/optimization/error-classifications.json for previously seen error signatures (cksum-based)
2. Pattern matching: grep the log tail for known patterns (infrastructure: timeout/OOM/network; configuration: MODULE_NOT_FOUND/ENOENT; logic: assertion/TypeError)
3. Fallback: "unknown" classification

Classification determines retry behavior: infrastructure errors retry, configuration errors fail immediately, logic errors fail after repeated occurrence.
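
The pattern-matching tier and a cksum-based signature might look roughly like the sketch below. The function names, the 50-line tail window, and the choice to hash only the last log line are assumptions for illustration; the cache lookup tier is omitted.

```shell
# Hypothetical pattern tier: classify the tail of a log file.
demo_classify_log() {
  local log_file="$1" tail_text
  tail_text=$(tail -n 50 "$log_file" 2>/dev/null || true)
  if printf '%s\n' "$tail_text" | grep -Eq '[Tt]imeout|OOM|[Nn]etwork'; then
    echo "infrastructure"
  elif printf '%s\n' "$tail_text" | grep -Eq 'MODULE_NOT_FOUND|ENOENT'; then
    echo "configuration"
  elif printf '%s\n' "$tail_text" | grep -Eiq 'assertion|typeerror'; then
    echo "logic"
  else
    echo "unknown"
  fi
}

# Hypothetical signature for a cache key: CRC of the last log line via cksum.
demo_error_signature() {
  tail -n 1 "$1" | cksum | cut -d ' ' -f 1
}

log=$(mktemp)
printf 'npm ERR! network request failed\n' > "$log"
echo "class=$(demo_classify_log "$log")"
```

The tiers are checked in a fixed order, so a log that contains both a network error and an assertion failure is treated as infrastructure in this sketch.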

Self-Healing Loops

self_healing_build_test() (287 lines) implements convergence detection:

- Same error 3x → early exit (stuck)
- Same failure count 2x → early exit (plateau)
- Infrastructure error → immediate exit (not code-fixable)

This is the highest-complexity function and the primary test coverage gap.
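
The three exit rules can be modeled in isolation, which may also help when writing tests for them. The sketch below assumes the repeats are counted over consecutive iterations, which the rules above do not state explicitly; demo_record_failure, demo_reset_convergence, and DEMO_VERDICT are hypothetical names.

```shell
# Hypothetical convergence tracker. Each call records one failed iteration
# and sets DEMO_VERDICT to continue, stuck, plateau, or infrastructure.
demo_reset_convergence() {
  _last_sig=""; _same_sig_runs=0
  _last_fail_count=""; _same_count_runs=0
  DEMO_VERDICT="continue"
}

demo_record_failure() {
  local sig="$1" fail_count="$2" class="$3"

  # Infrastructure errors are not code-fixable: leave the loop at once.
  if [ "$class" = "infrastructure" ]; then
    DEMO_VERDICT="infrastructure"
    return 0
  fi

  # Count consecutive iterations with an identical error signature.
  if [ "$sig" = "$_last_sig" ]; then
    _same_sig_runs=$(( _same_sig_runs + 1 ))
  else
    _same_sig_runs=1
  fi
  _last_sig="$sig"

  # Count consecutive iterations with an identical number of failures.
  if [ "$fail_count" = "$_last_fail_count" ]; then
    _same_count_runs=$(( _same_count_runs + 1 ))
  else
    _same_count_runs=1
  fi
  _last_fail_count="$fail_count"

  if [ "$_same_sig_runs" -ge 3 ]; then
    DEMO_VERDICT="stuck"
  elif [ "$_same_count_runs" -ge 2 ]; then
    DEMO_VERDICT="plateau"
  else
    DEMO_VERDICT="continue"
  fi
}

demo_reset_convergence
demo_record_failure "sig-a" 5 logic; echo "$DEMO_VERDICT"   # continue
demo_record_failure "sig-a" 4 logic; echo "$DEMO_VERDICT"   # continue
demo_record_failure "sig-a" 3 logic; echo "$DEMO_VERDICT"   # stuck
```

The verdict is returned through a global variable rather than stdout so that the counters survive between calls; capturing the output with $(...) would run the function in a subshell and lose them.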

Interface Contracts

pipeline-state.sh public API:

save_artifact(name, content) → void # writes to ARTIFACTS_DIR
get_stage_status(stage_id) → string # "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id, status) → void
record_stage_start(stage_id) → void
record_stage_end(stage_id) → void
get_stage_timing(stage_id) → string # "1m30s"
get_stage_timing_seconds(stage_id) → int # 0 if unknown
get_slowest_stage() → string # stage_id or ""
get_stage_description(stage_id) → string
build_stage_progress() → string # "intake:complete plan:running"
update_status(status, stage) → void # sets PIPELINE_STATUS + CURRENT_STAGE, calls write_state
mark_stage_complete(stage_id) → void # side effects: event, timing, GitHub, checkpoint
mark_stage_failed(stage_id) → void # side effects: event, error comment, GitHub
initialize_state() → void # resets all state
write_state() → void | return 1 # disk space check, YAML frontmatter to STATE_FILE
resume_state() → void | exit 1 # restores from STATE_FILE; exits on missing file/goal
verify_stage_artifacts(stage_id) → 0|1 # checks expected artifacts exist
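
For reference, a "1m30s" style string like the one get_stage_timing returns could be derived from a seconds count as sketched below. The function demo_format_duration is hypothetical, and its handling of durations under a minute (seconds only) and over an hour (minutes keep growing) is an assumption, not documented behavior.

```shell
# Hypothetical formatter: whole seconds to a compact "XmYs" string.
demo_format_duration() {
  local total="${1:-0}"
  if [ "$total" -lt 60 ]; then
    printf '%ds' "$total"
  else
    printf '%dm%ds' $(( total / 60 )) $(( total % 60 ))
  fi
}

echo "$(demo_format_duration 90)"   # prints 1m30s
echo "$(demo_format_duration 45)"   # prints 45s
```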

pipeline-stage-executor.sh public API:

classify_error(stage_id) → "infrastructure"|"configuration"|"logic"|"unknown"
run_stage_with_retry(stage_id) → 0|1 # handles retry logic per classification
self_healing_build_test() → 0|1 # build+test loop with convergence detection
self_healing_review_build_test() → 0|1 # review+build+test loop

Error Boundaries

Component	Handles locally	Propagates upward
state	Disk space (write_state), missing artifacts (verify), CI push failures (persist_artifacts)	Missing state file → `exit 1`
executor	Error classification, convergence detection, retry decisions	Stage failure → `return 1`
orchestration	Stage sequencing, gate enforcement, self-healing loop entry	Pipeline failure → writes final state, exits

Data Flow

CLI args → parse_args → pipeline_start()
  → initialize_state()                     [state]
  → run_pipeline()                         [orchestration]
    → for each enabled stage:
      → record_stage_start(id)             [state]
      → run_stage_with_retry(id)           [executor]
        → stage_<id>()                     [stage modules]
        → classify_error(id) on failure    [executor]
      → mark_stage_complete(id)            [state → write_state → STATE_FILE]
        or mark_stage_failed(id)           [state → emit_event → events.jsonl]
                                           [state → gh_update_progress → GitHub]
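
The per-stage portion of this flow can be exercised with stand-in stubs, as in the sketch below. The stub bodies, the demo_run_pipeline driver, the TRACE variable, and the stage list are all illustrative; the real functions live in the state and executor modules and have many more side effects.

```shell
# Stand-in stubs that only record the call order.
TRACE=""
record_stage_start()   { TRACE="${TRACE}start:$1 "; }
mark_stage_complete()  { TRACE="${TRACE}complete:$1 "; }
mark_stage_failed()    { TRACE="${TRACE}failed:$1 "; }
run_stage_with_retry() { [ "$1" != "test" ]; }   # pretend the "test" stage fails

# Hypothetical orchestration loop over a space-separated stage list.
demo_run_pipeline() {
  local stage
  for stage in $1; do
    record_stage_start "$stage"
    if run_stage_with_retry "$stage"; then
      mark_stage_complete "$stage"
    else
      mark_stage_failed "$stage"
      return 1   # stage failure propagates upward as a non-zero return
    fi
  done
}

demo_run_pipeline "intake plan build test review" || echo "pipeline failed"
echo "$TRACE"
```

In this sketch the loop stops at the first failed stage, so the review stage never starts.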

Alternatives Considered

Further decomposition (split large functions into sub-modules) — Pros: smaller units, easier to test in isolation. Cons: more files to source, higher bash overhead, more complex dependency graph. Rejected: current boundaries are clean and match single-responsibility. Adding tests for existing functions is lower-risk than restructuring.
Rewrite tests using bats-core — Pros: proper TAP output, better assertion library, setup/teardown hooks. Cons: new dependency, requires rewriting 9 existing test files, team learning curve. Rejected: existing test-helpers.sh framework works well and consistency across the test suite matters more.
Test only public API, skip internal functions — Pros: fewer tests to write, less maintenance. Cons: won't reach >80% coverage on complex functions like mark_stage_complete (94 lines with many side effects) and self_healing_build_test (287 lines). Rejected: these functions have complex branching that needs verification.
Convert state to JSON throughout — Pros: structured data, easier validation. Cons: jq round-trips on every status lookup (hot path), Bash 3.2 compatibility concerns, large diff touching all consumers. Rejected: newline-delimited format is fast and sufficient; persistence layer already handles serialization.

Implementation Plan

Files to create: None
Files to modify:
- scripts/sw-lib-pipeline-state-test.sh — add ~15 tests for write_state, resume_state, get_slowest_stage, build_stage_progress, mark_stage_complete, mark_stage_failed, update_status
- scripts/sw-lib-pipeline-stage-executor-test.sh — add ~10 tests for self_healing_build_test (happy path, convergence stuck, plateau detection), run_stage_with_retry edge cases (plan artifact skip, configuration error escalation)
Dependencies: None new
Risk areas:
- mark_stage_complete/failed tests require many stubs (emit_event, gh_*, checkpoint) — mitigated by reusing existing stub pattern already in the test file
- self_healing_build_test contains sleep calls — override sleep in tests, mock stage functions to fail/succeed deterministically
- write_state test must not corrupt real state — tests already use TEST_TEMP_DIR isolation
- resume_state needs a valid YAML state file — generate content in test using known-good format from write_state
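
A minimal sketch of the stubbing approach named in the risk areas follows. It assumes a counter-based assertion helper in the spirit of the PASS/FAIL counters; demo_assert_eq and demo_retry_twice are hypothetical, and the actual helpers in lib/test-helpers.sh may have different names and signatures.

```shell
PASS=0; FAIL=0

# Hypothetical assertion helper built on PASS/FAIL counters.
demo_assert_eq() {
  if [ "$2" = "$3" ]; then
    PASS=$(( PASS + 1 ))
  else
    FAIL=$(( FAIL + 1 )); echo "FAIL: $1 (expected '$2', got '$3')"
  fi
}

# Stubs: a shell function named sleep shadows the external command, so the
# code under test never waits; emit_event only records that it was called.
SLEEP_CALLS=0; EVENTS=""
sleep()      { SLEEP_CALLS=$(( SLEEP_CALLS + 1 )); }
emit_event() { EVENTS="${EVENTS}$1;"; }

# Stand-in for a function under test that backs off between attempts.
demo_retry_twice() {
  emit_event "attempt_1"; sleep 30
  emit_event "attempt_2"; sleep 60
}

demo_retry_twice
demo_assert_eq "sleep is stubbed" "2" "$SLEEP_CALLS"
demo_assert_eq "events recorded" "attempt_1;attempt_2;" "$EVENTS"
echo "PASS=$PASS FAIL=$FAIL"
```

Because shell functions take precedence over external commands during lookup, defining sleep in the test file is enough to neutralize the delays without touching production code.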

Validation Criteria

- sw-pipeline.sh remains < 1,500 lines (currently 708, already met)
- pipeline-state.sh exists as separate module with include guard (already met, 612 lines)
- pipeline-stage-executor.sh exists as separate module with include guard (already met, 645 lines)
- All existing tests pass (npm test green, no regressions)
- 20+ new unit tests added across both test files
- Function coverage >80% for pipeline-state.sh (from ~60% to ~95%)
- Line coverage >80% for pipeline-stage-executor.sh (from ~40% to ~80%)
- Each module can be sourced independently without side effects (include guards + safe defaults verified)
- No production code changes; test-only additions

Endpoint Specification / Rate Limiting / Versioning

Not applicable — these are shell modules with function-call interfaces, not HTTP APIs. The interface contracts above define the function signatures and error behavior.
