Pipeline Design 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Summary of key decisions:

Composable middleware (Option B) over 14 per-stage wrappers — most stages share identical orchestration, only build/test are special
4 new functions in pipeline-stages.sh: check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage()
Self-healing build+test stays inline in run_pipeline() — counter coupling makes extraction risky for no testability gain
Error boundaries: cross-cutting concerns never cause stage failure (fail-open/fire-and-forget), only run_stage() propagates actual stage failures
~200 lines removed from run_pipeline(), ~180 added to pipeline-stages.sh as testable functions
1 new test file (sw-lib-pipeline-execution-test.sh) + ~20 new tests in existing sw-lib-pipeline-stages-test.sh

Functions share state via global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.)
Optional modules (audit_emit, gh_checks_stage_update, ucb1_select_model) may or may not be loaded — all calls must use type funcname >/dev/null 2>&1 guards
The self-healing build+test loop tightly couples two stages with counter management — it cannot be cleanly extracted without breaking the completed counter
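
The guard requirement for optional modules can be sketched as follows. This is a minimal illustration, not the project's code: call_if_loaded is a hypothetical helper name, and the audit_emit body is a stand-in so the example runs on its own.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: call an optional function only if it is defined.
# Always returns 0, so a missing or failing module never fails the stage.
call_if_loaded() {
    local fn="$1"
    shift
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@" || true
    fi
    return 0
}

# Stand-in for an optional module that happens to be loaded.
audit_emit() { echo "audit: $*"; }

call_if_loaded audit_emit "stage.start" "build"   # prints "audit: stage.start build"
call_if_loaded gh_checks_stage_update "build"     # not defined here: silently skipped
```

Whether the real code wraps the guard in a helper or repeats `type funcname >/dev/null 2>&1 &&` inline is not stated here; both forms give the same fail-open behavior.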

Decision

Extract 4 composable middleware functions into scripts/lib/pipeline-stages.sh, then simplify run_pipeline() to call them. This is Option B from the plan — composable functions rather than 14 per-stage wrappers.

New Functions

1. check_human_directives(stage_id) — returns 0 (proceed) or 1 (skipped)

Extracts lines 542-569 from run_pipeline(). Handles two file-based intervention mechanisms:

skip-stage.txt: grep for stage ID, remove from file if found, emit stage.skipped event
human-message.txt: display message, emit pipeline.human_message event, delete file

Fail-safe: all file reads guarded with 2>/dev/null || true. If files are missing or malformed, stage proceeds normally.
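
A minimal sketch of this behavior, under stated assumptions: the directive files are assumed to live under ARTIFACTS_DIR, skip-stage.txt is assumed to hold one stage ID per line, and emit_event is a stand-in for whatever emitter the pipeline actually uses. Whether a skipped stage should still display a pending human message is not specified, so this sketch checks the skip file first.

```shell
#!/usr/bin/env bash
set -u

# Stand-in event emitter; the real emitter is not shown in this ADR.
emit_event() { echo "event: $*"; }

check_human_directives() {
    local stage_id="$1"
    # Assumption: directive files live under ARTIFACTS_DIR.
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local skip_file="$dir/skip-stage.txt"
    local msg_file="$dir/human-message.txt"

    # skip-stage.txt: if the stage ID is listed, remove it and skip the stage.
    if [ -f "$skip_file" ] && grep -qxF "$stage_id" "$skip_file" 2>/dev/null; then
        grep -vxF "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
        mv "$skip_file.tmp" "$skip_file" 2>/dev/null || true
        emit_event "stage.skipped" "stage=$stage_id"
        return 1
    fi

    # human-message.txt: display the message, emit an event, delete the file.
    if [ -f "$msg_file" ]; then
        local msg
        msg="$(cat "$msg_file" 2>/dev/null || true)"
        echo "Human message: $msg"
        emit_event "pipeline.human_message" "stage=$stage_id"
        rm -f "$msg_file" 2>/dev/null || true
    fi
    return 0
}

# Demo in a throwaway directory.
ARTIFACTS_DIR="$(mktemp -d)"
printf 'design\ntest\n' > "$ARTIFACTS_DIR/skip-stage.txt"
check_human_directives design || echo "design skipped"
check_human_directives build && echo "build proceeds"
```

If either file is absent, both `[ -f ... ]` tests fail and the function returns 0, which is the fail-open behavior described above.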

2. select_stage_model(stage_id) — returns 0 always (best-effort)

Extracts lines 694-763 from run_pipeline(). Three-tier model selection:

UCB1 (when ucb1_select_model is available and has data): Direct model recommendation from multi-armed bandit
A/B testing (when intelligence_recommend_model is available): Randomized experiment/control split with configurable ratio from daemon-config.json
Graduated (when routing file shows >=50 samples): Bypass A/B, use recommended model directly

Side effect: exports CLAUDE_MODEL, emits intelligence.model_ucb1 or intelligence.model_ab event.
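
The tiering above might look roughly like the sketch below. Several things here are assumptions rather than facts from the extracted code: the call signatures of ucb1_select_model and intelligence_recommend_model (each assumed to print a model name), the ROUTING_SAMPLES and AB_RATIO_PCT variables (hypothetical stand-ins for the routing file and the ratio in daemon-config.json), the emit_event name, and the literal sonnet fallback standing in for `_smart_model default sonnet`.

```shell
#!/usr/bin/env bash
set -u

# Emit an event only if an emitter is loaded (emit_event is a stand-in name).
_emit() { if type emit_event >/dev/null 2>&1; then emit_event "$@" || true; fi; return 0; }

select_stage_model() {
    local stage_id="$1" model="" recommended=""

    # UCB1: use the bandit's recommendation when the module is loaded and has data.
    if type ucb1_select_model >/dev/null 2>&1; then
        model="$(ucb1_select_model "$stage_id" 2>/dev/null || true)"
        if [ -n "$model" ]; then
            export CLAUDE_MODEL="$model"
            _emit "intelligence.model_ucb1" "stage=$stage_id" "model=$model"
            return 0
        fi
    fi

    if type intelligence_recommend_model >/dev/null 2>&1; then
        recommended="$(intelligence_recommend_model "$stage_id" 2>/dev/null || true)"
        if [ -n "$recommended" ]; then
            # Hypothetical inputs: sample count and experiment percentage.
            local samples="${ROUTING_SAMPLES:-0}" ratio="${AB_RATIO_PCT:-20}"
            if [ "$samples" -ge 50 ]; then
                # Graduated: enough samples, so bypass the A/B split.
                export CLAUDE_MODEL="$recommended"
            else
                # A/B: only the experiment group gets the recommended model.
                local group="control"
                if [ $((RANDOM % 100)) -lt "$ratio" ]; then
                    group="experiment"
                    export CLAUDE_MODEL="$recommended"
                fi
                _emit "intelligence.model_ab" "stage=$stage_id" "group=$group"
            fi
        fi
    fi

    # Best-effort fallback; the function never fails the stage.
    export CLAUDE_MODEL="${CLAUDE_MODEL:-sonnet}"
    return 0
}

select_stage_model build
echo "model: $CLAUDE_MODEL"
```

Every branch ends in `return 0`, which matches the "returns 0 always" contract: a missing module or an empty recommendation only changes which model is exported.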

3. broadcast_stage_discovery(stage_id) — returns 0 always (fire-and-forget)

Extracts lines 809-821 from run_pipeline(). Maps stage ID to discovery category and file patterns:

plan -> *.md
design -> *.md,*.ts,*.tsx,*.js
build -> src/*,*.ts,*.tsx,*.js
test -> *.test.*,*_test.*
review -> *.md,*.ts,*.tsx
default -> *

Calls sw-discovery.sh broadcast as a subprocess. All errors suppressed.
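
The mapping lends itself to a `case` statement. The sketch below is illustrative only: stage_discovery_patterns is a hypothetical helper name, DISCOVERY_CMD is a hypothetical variable pointing at sw-discovery.sh, and the arguments passed after `broadcast` are placeholders because the script's real argument order is not given here. The discovery category is omitted for the same reason.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: map a stage ID to its file patterns (table from this ADR).
stage_discovery_patterns() {
    case "$1" in
        plan)   echo '*.md' ;;
        design) echo '*.md,*.ts,*.tsx,*.js' ;;
        build)  echo 'src/*,*.ts,*.tsx,*.js' ;;
        test)   echo '*.test.*,*_test.*' ;;
        review) echo '*.md,*.ts,*.tsx' ;;
        *)      echo '*' ;;
    esac
}

broadcast_stage_discovery() {
    local stage_id="$1" patterns
    patterns="$(stage_discovery_patterns "$stage_id")"
    # DISCOVERY_CMD is hypothetical; the arguments after "broadcast" are placeholders.
    local cmd="${DISCOVERY_CMD:-}"
    if [ -n "$cmd" ] && [ -x "$cmd" ]; then
        "$cmd" broadcast "$stage_id" "$patterns" >/dev/null 2>&1 || true
    fi
    return 0   # fire-and-forget: never fails the stage
}

stage_discovery_patterns test    # prints "*.test.*,*_test.*"
broadcast_stage_discovery test   # no-op here because DISCOVERY_CMD is unset
```

Quoting the patterns keeps the shell from expanding them against the working directory before they reach the subprocess.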

4. run_stage(stage_id, enabled_count, completed_count) — returns 0 (success) or 1 (failure)

Wraps lines 766-852 from run_pipeline() into a composite function that orchestrates:

Progress display (Stage: id [n/total])
Status update to running
Start time recording + event emission
GitHub Check Run in_progress update
Audit trail stage.start emission
Delegate to run_stage_with_retry(stage_id)
On success: mark complete, capture patterns (intake), timing, events, audit, vitals, UCB1 outcome, discovery broadcast, model routing log
On failure: mark failed, error events, audit, vitals, UCB1 outcome, cancel remaining check runs

Sets LAST_STAGE_ERROR and LAST_STAGE_ERROR_CLASS on failure for caller consumption.
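
A heavily reduced skeleton of this control flow is sketched below, keeping only progress display, two guarded hooks, delegation, and error propagation. The run_stage_with_retry body is a stand-in, _optional is a hypothetical helper, the arguments passed to gh_checks_stage_update and audit_emit are assumed, the audit event names other than stage.start are placeholders, and the n in [n/total] is assumed to be completed_count + 1.

```shell
#!/usr/bin/env bash
set -u

# Stand-in for the real retry wrapper; fails for the stage named in FAIL_STAGE.
run_stage_with_retry() { [ "$1" != "${FAIL_STAGE:-}" ]; }

# Hypothetical helper: invoke an optional hook without letting it fail the stage.
_optional() { if type "$1" >/dev/null 2>&1; then "$@" || true; fi; return 0; }

run_stage() {
    local stage_id="$1" enabled_count="$2" completed_count="$3"
    LAST_STAGE_ERROR=""
    LAST_STAGE_ERROR_CLASS=""

    echo "Stage: $stage_id [$((completed_count + 1))/$enabled_count]"
    _optional gh_checks_stage_update "$stage_id" "in_progress"
    _optional audit_emit "stage.start" "$stage_id"

    if run_stage_with_retry "$stage_id"; then
        _optional audit_emit "stage.complete" "$stage_id"   # placeholder event name
        return 0
    fi

    # Only a real stage failure propagates to the caller.
    LAST_STAGE_ERROR="stage $stage_id failed"
    LAST_STAGE_ERROR_CLASS="unknown"
    _optional audit_emit "stage.failed" "$stage_id"         # placeholder event name
    return 1
}

run_stage build 3 0 && echo "build ok"
FAIL_STAGE=test
run_stage test 3 1 || echo "failed: $LAST_STAGE_ERROR"
```

The caller reads LAST_STAGE_ERROR after a non-zero return, which only works because run_stage is invoked directly rather than through a pipe or `$()`.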

What stays inline in `run_pipeline()`

Self-healing build+test loop (lines 612-648): Tightly couples two stages with counter management (completed += 2). Extracting this would require passing mutable counter state through function boundaries, adding complexity for no testability gain.
Gate checks (lines 664-679): Interactive read prompt that controls pipeline pause/resume flow. It must stay in the loop because pausing means returning 0 from run_pipeline() itself, which a helper function cannot do on the caller's behalf.
Budget enforcement (lines 681-692): Similar flow control — needs to pause the pipeline, not just skip a stage.
Intelligence skip evaluation (lines 577-586): Already a clean single function call; wrapping it adds nothing.
CI resume logic (lines 596-609): Artifact verification that may fall through to stage execution.

Error Handling Strategy

Function	Error Boundary	Failure Mode
`check_human_directives`	Fail-open	File errors suppressed, stage proceeds
`select_stage_model`	Fail-open	All paths guarded, falls back to `_smart_model default sonnet`
`broadcast_stage_discovery`	Fire-and-forget	Subprocess errors suppressed with `2>/dev/null`
`run_stage`	Fail-propagate	Returns 1 on stage failure, caller handles pipeline-level response

Variable Scope Contract

All new functions operate on globals already set by sw-pipeline.sh and pipeline-execution.sh. Each function uses ${VAR:-default} for every global reference, matching the existing convention in pipeline-stages.sh (lines 31-48). No new globals introduced.
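
The following sketch shows why the `${VAR:-default}` convention matters under `set -u`. The function name describe_run and the specific default values are illustrative assumptions; only the variable names come from this document.

```shell
#!/usr/bin/env bash
set -u

# Every global is read through ${VAR:-default}, so an unset variable
# yields the default instead of aborting with "unbound variable".
describe_run() {
    local artifacts="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local issue="${ISSUE_NUMBER:-0}"
    local model="${CLAUDE_MODEL:-sonnet}"
    echo "artifacts=$artifacts issue=$issue model=$model"
}

unset ARTIFACTS_DIR ISSUE_NUMBER CLAUDE_MODEL
describe_run   # prints "artifacts=.claude/pipeline-artifacts issue=0 model=sonnet"
```

A bare `$ISSUE_NUMBER` in the same function would terminate the script in a test harness that never sets it.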

Alternatives Considered

Per-Stage Wrappers (run_intake_stage(), etc.) — Pros: Matches issue acceptance criteria literally; each stage wrapper independently testable. Cons: 14 wrapper functions where 12 are identical boilerplate (only build/test have special logic); violates DRY; ~300 lines of duplicated orchestration. Rejected because the orchestration is stage-agnostic.
Do Nothing — Pros: Zero regression risk; stages are already in separate files. Cons: run_pipeline() remains 390 lines mixing concerns; cross-cutting logic untestable in isolation. Rejected because the opportunity to improve testability is worth the moderate risk.
Event-Driven Stage Lifecycle — Pros: Maximum decoupling via event emitters/listeners. Cons: Bash has no native event system; implementing one adds significant complexity for only 4 cross-cutting concerns. Rejected as overkill.

Implementation Plan

Files to create

scripts/sw-lib-pipeline-execution-test.sh — Unit tests for run_stage_with_retry, self_healing_build_test, and orchestration integration

Files to modify

scripts/lib/pipeline-stages.sh — Add 4 new functions (~180 lines)
scripts/lib/pipeline-execution.sh — Simplify run_pipeline() (~200 lines removed, ~40 added)
scripts/sw-lib-pipeline-stages-test.sh — Add ~20 unit tests (~250 lines)
package.json — Register sw-lib-pipeline-execution-test.sh

Dependencies

No new external dependencies
New functions depend on existing loaded modules: pipeline-state.sh, helpers.sh, compat.sh
All dependencies already sourced before pipeline-stages.sh in the load chain

Risk areas

1. Variable Scope Breakage (HIGH likelihood, MEDIUM impact) Extracted functions reference 15+ globals. If unset in test contexts, functions fail under set -u. Mitigation: Every global uses ${VAR:-default}. Test setup mirrors existing pipeline-stages-test.sh.

2. Self-Healing Counter Coupling (MEDIUM likelihood, HIGH impact) The build+test self-healing loop increments completed by 2. If run_stage() is accidentally called during self-healing, counts break. Mitigation: Self-healing block stays inline. The existing continue on line 648 prevents run_stage() from being reached.

3. Return vs Exit in Subshells (LOW likelihood, HIGH impact) If a new function is called in a pipeline (|) or inside $(), it runs in a subshell: return only leaves the function within that subshell, and any globals it sets never reach the caller. Mitigation: All new functions are called directly (no pipes). Code review enforces this.

4. Module Load Order (LOW likelihood, MEDIUM impact) New functions may call audit_emit, gh_checks_stage_update which load conditionally. Mitigation: All optional calls use type funcname >/dev/null 2>&1 && guards.
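
Risk 3 can be reproduced in a few lines. The function set_error below is a hypothetical stand-in for any extracted function that reports failure through a global and a return code; the example assumes `pipefail` is off, which is Bash's default.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical function that signals failure via a global and a return code.
set_error() {
    LAST_STAGE_ERROR="boom"
    return 1
}

# Called in a pipeline: runs in a subshell, so both signals are lost.
LAST_STAGE_ERROR=""
set_error | cat
piped_status=$?
echo "piped:  status=$piped_status error='$LAST_STAGE_ERROR'"    # status=0 error=''

# Called directly: the caller sees the return code and the global.
LAST_STAGE_ERROR=""
direct_status=0
set_error || direct_status=$?
echo "direct: status=$direct_status error='$LAST_STAGE_ERROR'"   # status=1 error='boom'
```

In the piped case the reported status is that of `cat`, the last command in the pipeline, which is why the failure disappears entirely.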

Validation Criteria

check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage() exist in scripts/lib/pipeline-stages.sh
run_pipeline() reduced by ~200 lines (from ~390 to ~190 in the stage loop)
run_pipeline() calls all 4 new functions (verified by grep)
All functions use ${VAR:-default} for every global variable reference
All optional module calls use type funcname >/dev/null 2>&1 guards
Self-healing build+test loop remains inline (not extracted)
No new subshells introduced for extracted functions
Unit tests cover happy path, error path, and missing dependencies for each function
sw-lib-pipeline-stages-test.sh passes with ~20 new tests
sw-lib-pipeline-execution-test.sh registered in package.json
sw-pipeline-test.sh (58 tests) passes without modification
sw-e2e-smoke-test.sh (19 tests) passes without modification
No Bash 3.2 incompatibilities introduced
CLAUDE.md Shared Libraries table updated
