Pipeline Design 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Summary of key decisions:

  1. Composable middleware (Option B) over 14 per-stage wrappers — most stages share identical orchestration, only build/test are special
  2. 4 new functions in pipeline-stages.sh: check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage()
  3. Self-healing build+test stays inline in run_pipeline() — counter coupling makes extraction risky for no testability gain
  4. Error boundaries: cross-cutting concerns never cause stage failure (fail-open/fire-and-forget), only run_stage() propagates actual stage failures
  5. ~200 lines removed from run_pipeline(), ~180 added to pipeline-stages.sh as testable functions
  6. 1 new test file (sw-lib-pipeline-execution-test.sh) + ~20 new tests in existing sw-lib-pipeline-stages-test.sh

  • Functions share state via global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.)
  • Optional modules (audit_emit, gh_checks_stage_update, ucb1_select_model) may or may not be loaded — all calls must use type funcname >/dev/null 2>&1 guards
  • The self-healing build+test loop tightly couples two stages with counter management — it cannot be cleanly extracted without breaking the completed counter
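
The guard requirement for optional modules can be sketched as below. audit_emit is named in this page, but the stub definition, the wrapper name emit_audit_if_loaded, and the arguments passed are assumptions made for illustration, not the real signature.

```bash
#!/usr/bin/env bash
# Sketch: call an optional function only when it is actually defined.
# emit_audit_if_loaded and the audit_emit arguments are hypothetical.

emit_audit_if_loaded() {
  # `type` succeeds only if the name resolves to a function, builtin, or command.
  if type audit_emit >/dev/null 2>&1; then
    audit_emit "$@" || true   # fail-open: an audit error never propagates
  fi
  return 0
}

# Module not loaded yet: the call is a silent no-op that still returns 0.
emit_audit_if_loaded "stage.start" "build"
before=$?

# Simulate the optional module being sourced later (stub for the demo).
AUDIT_LOG=""
audit_emit() { AUDIT_LOG="${AUDIT_LOG}$1:$2;"; }

emit_audit_if_loaded "stage.start" "build"
echo "status_before=$before log=$AUDIT_LOG"
```

The same pattern would apply to gh_checks_stage_update and ucb1_select_model: the caller never needs to know whether the module was sourced.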

Decision

Extract 4 composable middleware functions into scripts/lib/pipeline-stages.sh, then simplify run_pipeline() to call them. This is Option B from the plan — composable functions rather than 14 per-stage wrappers.

New Functions

1. check_human_directives(stage_id) — returns 0 (proceed) or 1 (skipped)

Extracts lines 542-569 from run_pipeline(). Handles two file-based intervention mechanisms:

  • skip-stage.txt: grep for stage ID, remove from file if found, emit stage.skipped event
  • human-message.txt: display message, emit pipeline.human_message event, delete file

Fail-safe: all file reads guarded with 2>/dev/null || true. If files are missing or malformed, stage proceeds normally.
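
A minimal sketch of the two mechanisms follows. It assumes both files live under ARTIFACTS_DIR, that skip-stage.txt holds one stage ID per line, and that both directives are processed before returning; emit_event is a hypothetical stand-in for the real event emitter, and the function is suffixed _sketch to make clear it is not the extracted code.

```bash
#!/usr/bin/env bash
# Sketch only: file locations, the one-ID-per-line format, and emit_event
# are assumptions made for illustration.

emit_event() { EVENTS="${EVENTS:-}$1 "; }   # hypothetical stand-in

check_human_directives_sketch() {
  local stage_id="$1"
  local dir="${ARTIFACTS_DIR:-.}"
  local skip_file="$dir/skip-stage.txt"
  local msg_file="$dir/human-message.txt"
  local skipped=0

  # skip-stage.txt: exact-line match on the stage ID; missing file means no match.
  if grep -qxF "$stage_id" "$skip_file" 2>/dev/null; then
    # Drop only the matching line so other pending skips survive.
    grep -vxF "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
    mv "$skip_file.tmp" "$skip_file" 2>/dev/null || true
    emit_event "stage.skipped"
    skipped=1
  fi

  # human-message.txt: show once, record the event, then delete.
  if [ -f "$msg_file" ]; then
    cat "$msg_file" 2>/dev/null || true
    emit_event "pipeline.human_message"
    rm -f "$msg_file" 2>/dev/null || true
  fi

  return "$skipped"   # 0 = proceed, 1 = skipped
}

# Demo in a throwaway directory.
ARTIFACTS_DIR="$(mktemp -d)"
printf 'design\ntest\n' > "$ARTIFACTS_DIR/skip-stage.txt"
check_human_directives_sketch design; design_rc=$?
check_human_directives_sketch build;  build_rc=$?
remaining="$(cat "$ARTIFACTS_DIR/skip-stage.txt")"
echo "design=$design_rc build=$build_rc remaining=$remaining"
```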

2. select_stage_model(stage_id) — returns 0 always (best-effort)

Extracts lines 694-763 from run_pipeline(). Three-tier model selection:

  1. UCB1 (when ucb1_select_model is available and has data): Direct model recommendation from multi-armed bandit
  2. A/B testing (when intelligence_recommend_model is available): Randomized experiment/control split with configurable ratio from daemon-config.json
  3. Graduated (when routing file shows >=50 samples): Bypass A/B, use recommended model directly

Side effect: exports CLAUDE_MODEL, emits intelligence.model_ucb1 or intelligence.model_ab event.
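
One way to read the precedence between the three tiers is sketched below. Several things here are assumptions: that ucb1_select_model and intelligence_recommend_model print a model name on stdout, the ROUTING_SAMPLES, AB_RATIO and AB_ROLL variables (the page only says the ratio comes from daemon-config.json and the sample count from a routing file), and the placeholder model name candidate-model. Event emission is omitted to keep the sketch short.

```bash
#!/usr/bin/env bash
# Sketch only: helper calling conventions and the ROUTING_SAMPLES, AB_RATIO,
# and AB_ROLL variables are hypothetical.

select_stage_model_sketch() {
  local stage_id="$1" model="" rec=""
  local fallback="${CLAUDE_MODEL:-sonnet}"

  # Tier 1: UCB1 recommendation, when the module is loaded and returns data.
  if type ucb1_select_model >/dev/null 2>&1; then
    model="$(ucb1_select_model "$stage_id" 2>/dev/null || true)"
  fi

  if [ -z "$model" ] && type intelligence_recommend_model >/dev/null 2>&1; then
    rec="$(intelligence_recommend_model "$stage_id" 2>/dev/null || true)"
    if [ -n "$rec" ]; then
      if [ "${ROUTING_SAMPLES:-0}" -ge 50 ]; then
        model="$rec"   # graduated: enough samples, bypass the A/B split
      else
        # A/B split: the experiment arm gets the recommendation, control keeps the fallback.
        local roll="${AB_ROLL:-$(( ${RANDOM:-0} % 100 ))}"
        if [ "$roll" -lt "${AB_RATIO:-20}" ]; then model="$rec"; fi
      fi
    fi
  fi

  CLAUDE_MODEL="${model:-$fallback}"
  export CLAUDE_MODEL
  return 0   # best-effort: never fails the stage
}

# Demo with no optional modules loaded.
unset CLAUDE_MODEL
select_stage_model_sketch build; no_modules="$CLAUDE_MODEL"

# Demo with a stubbed recommender; AB_ROLL pins the random roll.
intelligence_recommend_model() { echo "candidate-model"; }
CLAUDE_MODEL=sonnet ROUTING_SAMPLES=10 AB_ROLL=99
select_stage_model_sketch build; control="$CLAUDE_MODEL"
CLAUDE_MODEL=sonnet ROUTING_SAMPLES=50
select_stage_model_sketch build; graduated="$CLAUDE_MODEL"
echo "$no_modules $control $graduated"
```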

3. broadcast_stage_discovery(stage_id) — returns 0 always (fire-and-forget)

Extracts lines 809-821 from run_pipeline(). Maps stage ID to discovery category and file patterns:

  • plan -> *.md
  • design -> *.md,*.ts,*.tsx,*.js
  • build -> src/*,*.ts,*.tsx,*.js
  • test -> *.test.*,*_test.*
  • review -> *.md,*.ts,*.tsx
  • default -> *

Calls sw-discovery.sh broadcast as a subprocess. All errors suppressed.
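
The pattern mapping above fits naturally into a case statement, sketched here. The helper name discovery_patterns_for_stage, the SCRIPT_DIR variable, and the argument order passed to sw-discovery.sh broadcast are assumptions; this page does not show the real invocation.

```bash
#!/usr/bin/env bash
# Sketch: map a stage ID to the file patterns listed in this section.

discovery_patterns_for_stage() {
  case "${1:-}" in
    plan)   echo '*.md' ;;
    design) echo '*.md,*.ts,*.tsx,*.js' ;;
    build)  echo 'src/*,*.ts,*.tsx,*.js' ;;
    test)   echo '*.test.*,*_test.*' ;;
    review) echo '*.md,*.ts,*.tsx' ;;
    *)      echo '*' ;;
  esac
}

broadcast_stage_discovery_sketch() {
  local patterns
  patterns="$(discovery_patterns_for_stage "$1")"
  # Hypothetical invocation: the real arguments are not documented here.
  if [ -x "${SCRIPT_DIR:-.}/sw-discovery.sh" ]; then
    "${SCRIPT_DIR:-.}/sw-discovery.sh" broadcast "$1" "$patterns" >/dev/null 2>&1 || true
  fi
  return 0   # fire-and-forget: always succeeds
}

build_patterns="$(discovery_patterns_for_stage build)"
other_patterns="$(discovery_patterns_for_stage intake)"
broadcast_stage_discovery_sketch test; broadcast_rc=$?
echo "build=$build_patterns other=$other_patterns rc=$broadcast_rc"
```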

4. run_stage(stage_id, enabled_count, completed_count) — returns 0 (success) or 1 (failure)

Wraps lines 766-852 from run_pipeline() into a composite function that orchestrates:

  1. Progress display (Stage: id [n/total])
  2. Status update to running
  3. Start time recording + event emission
  4. GitHub Check Run in_progress update
  5. Audit trail stage.start emission
  6. Delegate to run_stage_with_retry(stage_id)
  7. On success: mark complete, capture patterns (intake), timing, events, audit, vitals, UCB1 outcome, discovery broadcast, model routing log
  8. On failure: mark failed, error events, audit, vitals, UCB1 outcome, cancel remaining check runs

Sets LAST_STAGE_ERROR and LAST_STAGE_ERROR_CLASS on failure for caller consumption.
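
The shape of that orchestration can be sketched as follows, heavily reduced. The stubs for run_stage_with_retry and emit_event, the event names, the error strings, and the reading of n in the progress display as completed_count + 1 are all assumptions; only the control flow (guarded best-effort calls around one fail-propagating call) reflects the description above.

```bash
#!/usr/bin/env bash
# Sketch only: stubs replace the real pipeline helpers.

run_stage_with_retry() { [ "$1" != "failing-stage" ]; }   # stub: fails for one ID
emit_event() { :; }                                       # stub: discard events

run_stage_sketch() {
  local stage_id="$1" enabled_count="$2" completed_count="$3"
  local start

  echo "Stage: $stage_id [$((completed_count + 1))/$enabled_count]"
  start="$(date +%s)"
  emit_event "stage.started" "$stage_id"

  # Cross-cutting concerns are guarded and never fail the stage.
  if type gh_checks_stage_update >/dev/null 2>&1; then
    gh_checks_stage_update "$stage_id" "in_progress" || true
  fi

  # The only call whose failure propagates to the caller.
  if run_stage_with_retry "$stage_id"; then
    emit_event "stage.completed" "$stage_id" "$(( $(date +%s) - start ))"
    return 0
  fi

  LAST_STAGE_ERROR="stage $stage_id failed"   # illustrative message
  LAST_STAGE_ERROR_CLASS="unknown"            # illustrative class
  emit_event "stage.failed" "$stage_id"
  return 1
}

run_stage_sketch plan 3 0 >/dev/null; ok_rc=$?
run_stage_sketch failing-stage 3 1 >/dev/null; fail_rc=$?
echo "ok=$ok_rc fail=$fail_rc error=$LAST_STAGE_ERROR"
```

Because the function is called directly rather than inside $( ) or a pipeline, the caller can read LAST_STAGE_ERROR after a non-zero return.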

What stays inline in run_pipeline()

  • Self-healing build+test loop (lines 612-648): Tightly couples two stages with counter management (completed += 2). Extracting this would require passing mutable counter state through function boundaries, adding complexity for no testability gain.
  • Gate checks (lines 664-679): Interactive read prompt that controls pipeline pause/resume flow. It must stay in the loop so that it can return 0 from run_pipeline() directly.
  • Budget enforcement (lines 681-692): Similar flow control — needs to pause the pipeline, not just skip a stage.
  • Intelligence skip evaluation (lines 577-586): Already a clean single function call; wrapping it adds nothing.
  • CI resume logic (lines 596-609): Artifact verification that may fall through to stage execution.
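
A reduced sketch of why the self-healing block is simplest left in the loop: it advances the counter by two and uses continue to bypass the generic path. The stage list, the stubs, and the way the test stage is skipped after build are assumptions made for illustration; self_healing_build_test is named in this page, but its real behavior is not shown.

```bash
#!/usr/bin/env bash
# Sketch only: stubs and the stage list are hypothetical.

self_healing_build_test() { return 0; }        # stub: build+test both pass
run_stage() { ran="${ran}$1 "; return 0; }     # stub: record generic-path stages

completed=0
ran=""
for id in plan build test review; do
  if [ "$id" = "build" ]; then
    self_healing_build_test || break
    completed=$((completed + 2))   # build and test are counted together
    continue                       # never reaches run_stage for build
  fi
  if [ "$id" = "test" ]; then
    continue                       # already covered by the self-healing block
  fi
  run_stage "$id" 4 "$completed" || break
  completed=$((completed + 1))
done
echo "completed=$completed ran=$ran"
```

Moving the inline branch into a function would mean either returning the increment through a global or echoing it from a subshell, which is the added complexity the decision avoids.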

Error Handling Strategy

| Function | Error Boundary | Failure Mode |
| --- | --- | --- |
| check_human_directives | Fail-open | File errors suppressed, stage proceeds |
| select_stage_model | Fail-open | All paths guarded; falls back to the _smart_model default (sonnet) |
| broadcast_stage_discovery | Fire-and-forget | Subprocess errors suppressed with 2>/dev/null |
| run_stage | Fail-propagate | Returns 1 on stage failure, caller handles pipeline-level response |

Variable Scope Contract

All new functions operate on globals already set by sw-pipeline.sh and pipeline-execution.sh. Each function uses ${VAR:-default} for every global reference, matching the existing convention in pipeline-stages.sh (lines 31-48). No new globals introduced.
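
The effect of the ${VAR:-default} convention under set -u can be seen in a small sketch. The function name describe_context and the default values chosen here are assumptions; the variable names come from this page.

```bash
#!/usr/bin/env bash
# Sketch: defaulted expansions keep a function safe under `set -u`
# even when the caller (for example a unit test) never set the globals.

set -u
unset ISSUE_NUMBER ARTIFACTS_DIR 2>/dev/null || true

describe_context() {
  # A bare "$ISSUE_NUMBER" would abort here with "unbound variable".
  local issue="${ISSUE_NUMBER:-unknown}"
  local artifacts="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
  echo "issue=$issue artifacts=$artifacts"
}

unset_result="$(describe_context)"
ISSUE_NUMBER=189
set_result="$(describe_context)"

echo "$unset_result"
echo "$set_result"
set +u
```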

Alternatives Considered

  1. Per-Stage Wrappers (run_intake_stage(), etc.) — Pros: Matches issue acceptance criteria literally; each stage wrapper independently testable. Cons: 14 wrapper functions where 12 are identical boilerplate (only build/test have special logic); violates DRY; ~300 lines of duplicated orchestration. Rejected because the orchestration is stage-agnostic.

  2. Do Nothing — Pros: Zero regression risk; stages are already in separate files. Cons: run_pipeline() remains 390 lines mixing concerns; cross-cutting logic untestable in isolation. Rejected because the opportunity to improve testability is worth the moderate risk.

  3. Event-Driven Stage Lifecycle — Pros: Maximum decoupling via event emitters/listeners. Cons: Bash has no native event system; implementing one adds significant complexity for only 4 cross-cutting concerns. Rejected as overkill.

Implementation Plan

Files to create

  • scripts/sw-lib-pipeline-execution-test.sh — Unit tests for run_stage_with_retry, self_healing_build_test, and orchestration integration

Files to modify

  • scripts/lib/pipeline-stages.sh — Add 4 new functions (~180 lines)
  • scripts/lib/pipeline-execution.sh — Simplify run_pipeline() (~200 lines removed, ~40 added)
  • scripts/sw-lib-pipeline-stages-test.sh — Add ~20 unit tests (~250 lines)
  • package.json — Register sw-lib-pipeline-execution-test.sh

Dependencies

  • No new external dependencies
  • New functions depend on existing loaded modules: pipeline-state.sh, helpers.sh, compat.sh
  • All dependencies already sourced before pipeline-stages.sh in the load chain

Risk areas

1. Variable Scope Breakage (HIGH likelihood, MEDIUM impact): Extracted functions reference 15+ globals. If unset in test contexts, functions fail under set -u. Mitigation: Every global uses ${VAR:-default}. Test setup mirrors existing pipeline-stages-test.sh.

2. Self-Healing Counter Coupling (MEDIUM likelihood, HIGH impact): The build+test self-healing loop increments completed by 2. If run_stage() is accidentally called during self-healing, counts break. Mitigation: Self-healing block stays inline. The existing continue on line 648 prevents run_stage() from being reached.

3. Return vs Exit in Subshells (LOW likelihood, HIGH impact): If a new function is called in a pipeline (|) or $(), return exits the subshell, not the caller. Mitigation: All new functions called directly (no pipes). Code review enforces this.

4. Module Load Order (LOW likelihood, MEDIUM impact): New functions may call audit_emit, gh_checks_stage_update which load conditionally. Mitigation: All optional calls use type funcname >/dev/null 2>&1 && guards.
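
Risk 3 is easy to reproduce in isolation. In the sketch below, failing_stage is a hypothetical stand-in for any function that sets a global and returns non-zero; the piped behavior described in the comments assumes pipefail is not enabled.

```bash
#!/usr/bin/env bash
# Sketch: a function called in a pipeline runs in a subshell, so its
# global assignments do not reach the caller.

LAST_STAGE_ERROR=""
failing_stage() {
  LAST_STAGE_ERROR="build failed"
  return 1
}

# Direct call: both the status and the global are visible to the caller.
failing_stage; direct_rc=$?
direct_err="$LAST_STAGE_ERROR"

# Piped call: the assignment happens in a subshell and is lost, and
# (without pipefail) the pipeline status is that of cat, not the function.
LAST_STAGE_ERROR=""
failing_stage | cat; piped_rc=$?
piped_err="$LAST_STAGE_ERROR"

echo "direct: rc=$direct_rc err='$direct_err'"
echo "piped:  rc=$piped_rc err='$piped_err'"
```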

Validation Criteria

  • check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage() exist in scripts/lib/pipeline-stages.sh
  • run_pipeline() reduced by ~200 lines (from ~390 to ~190 in the stage loop)
  • run_pipeline() calls all 4 new functions (verified by grep)
  • All functions use ${VAR:-default} for every global variable reference
  • All optional module calls use type funcname >/dev/null 2>&1 guards
  • Self-healing build+test loop remains inline (not extracted)
  • No new subshells introduced for extracted functions
  • Unit tests cover happy path, error path, and missing dependencies for each function
  • sw-lib-pipeline-stages-test.sh passes with ~20 new tests
  • sw-lib-pipeline-execution-test.sh registered in package.json
  • sw-pipeline-test.sh (58 tests) passes without modification
  • sw-e2e-smoke-test.sh (19 tests) passes without modification
  • No Bash 3.2 incompatibilities introduced
  • CLAUDE.md Shared Libraries table updated
