Pipeline Plan 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

Summary of the plan:

The issue assumed sw-pipeline.sh was ~2839 lines, but prior work already decomposed it to 279 lines with stages in lib/pipeline-stages-*.sh. The remaining work is extracting cross-cutting concerns from run_pipeline() in pipeline-execution.sh (~390 lines of mixed orchestration):

4 new functions in pipeline-stages.sh:

  1. check_human_directives() — skip-stage/human-message file handling
  2. select_stage_model() — UCB1/A/B intelligence model routing
  3. broadcast_stage_discovery() — post-success cross-pipeline learning
  4. run_stage() — unified stage runner (timing, audit, checks, retry, events)

Key design decision: Extract composable middleware functions (Option B) rather than 14 per-stage wrappers (Option A), since most stages share identical orchestration — only build/test have special self-healing logic.

10 implementation tasks, 2 new/modified test files, ~20 new unit tests, ~200 lines removed from run_pipeline().

| … | … | …mpound_quality(), stage_audit() |
| lib/pipeline-stages-delivery.sh | 965 | stage_pr(), stage_merge(), stage_deploy() |
| lib/pipeline-stages-monitor.sh | 407 | stage_validate(), stage_monitor() |

Stage implementations are already in domain-specific lib files. The acceptance criteria "New lib/pipeline-stages.sh with functions: run_intake_stage, run_plan_stage, etc." and "sw-pipeline.sh reduced to <1500 lines" are already satisfied in spirit.

Remaining Work

What's NOT yet decomposed is in pipeline-execution.sh — the run_pipeline() function (lines 507-897, ~390 lines) mixes generic orchestration with stage-specific logic:

  1. Build-specific self-healing (lines 612-648): Self-healing build+test loop with TDD integration
  2. Stage-specific test skip (lines 656-662): Skip test stage when handled by self-healing
  3. Model routing (lines 694-763): ~70 lines of intelligence model routing per stage
  4. Post-success callbacks (lines 789-821): Stage-specific discovery broadcasting, memory capture
  5. Human intervention (lines 542-569): Skip-stage directives, human messages

These can be extracted into composable functions in lib/pipeline-stages.sh, which addresses the intent behind the run_<stage>_stage() wrappers the issue requests.

Design Alternatives Considered

Option A: Create run_<stage>_stage() wrappers per stage (14 functions)

  • Pros: Matches issue acceptance criteria exactly; each stage independently testable
  • Cons: Lots of boilerplate — most wrappers would just call stage_<id>() since only build/test have special logic
  • Rejected: Too much duplication for minimal benefit

Option B: Extract cross-cutting concerns into composable middleware functions

  • Pros: DRY; each concern (model routing, human intervention, GitHub checks, discovery) becomes a testable function; run_pipeline() becomes a clean loop
  • Cons: Slightly more complex function signatures
  • Chosen: This approach

Option C: Do nothing — decomposition is already done

  • Pros: No risk of regression
  • Cons: Doesn't satisfy issue acceptance criteria; misses opportunity to simplify run_pipeline()
  • Rejected: Issue requests specific deliverables

Component Diagram

sw-pipeline.sh (CLI router, 279 lines)
 |
 +-- lib/pipeline-cli.sh (argument parsing)
 +-- lib/pipeline-commands.sh (start/resume/status/abort)
 | |
 | +-- lib/pipeline-execution.sh (orchestration loop)
 | |
 | +-- lib/pipeline-stages.sh (stage runner + helpers) <-- MODIFY
 | | |
 | | +-- run_stage() <-- NEW: unified stage runner
 | | +-- select_stage_model() <-- NEW: model routing extract
 | | +-- check_human_directives() <-- NEW: human intervention
 | | +-- broadcast_stage_discovery() <-- NEW: post-success hook
 | | |
 | | +-- lib/pipeline-stages-intake.sh (stage_intake, stage_plan, ...)
 | | +-- lib/pipeline-stages-build.sh (stage_build, stage_test, ...)
 | | +-- lib/pipeline-stages-review.sh (stage_review, ...)
 | | +-- lib/pipeline-stages-delivery.sh (stage_pr, stage_merge, ...)
 | | +-- lib/pipeline-stages-monitor.sh (stage_validate, stage_monitor)
 | |
 | +-- lib/pipeline-state.sh (state persistence)
 |
 +-- (intelligence, GitHub, memory modules)

Interface Contracts

# NEW: Unified stage runner; encapsulates timing, checks, audit, discovery
# Called by run_pipeline() for each stage. Returns 0 on success, 1 on failure.
# Side effects: updates stage status, emits events, writes model-routing.log
run_stage() {
  local stage_id="$1"          # e.g. "intake", "build", "plan"
  local enabled_count="$2"     # total enabled stages (for progress display)
  local completed_count="$3"   # stages completed so far
  # Returns: 0=success, 1=failure
  # Sets: LAST_STAGE_ERROR, LAST_STAGE_ERROR_CLASS (on failure)
}

# NEW: Intelligence model routing for a stage
# Extracts the ~70 lines of UCB1/A/B model routing from run_pipeline()
select_stage_model() {
  local stage_id="$1"
  # Side effect: exports CLAUDE_MODEL, emits intelligence.model_* events
  # Returns: 0 always (model selection is best-effort)
}

# NEW: Check for human intervention directives (skip-stage, human-message)
# Returns 0 if stage should proceed, 1 if skipped
check_human_directives() {
  local stage_id="$1"
  # Returns: 0=proceed, 1=skipped (emits stage.skipped event)
}

# NEW: Post-success discovery broadcast
broadcast_stage_discovery() {
  local stage_id="$1"
  # Side effect: calls sw-discovery.sh broadcast with stage-appropriate patterns
}

Data Flow

run_pipeline() iterates stages from PIPELINE_CONFIG
 |
 +-- For each stage:
 | +-- check_human_directives(stage_id) -> skip or proceed
 | +-- check intelligence skip (existing pipeline_should_skip_stage)
 | +-- check already complete (get_stage_status)
 | +-- Handle self-healing build+test special case
 | +-- Gate check (approve/auto)
 | +-- Budget check
 | +-- select_stage_model(stage_id) -> sets CLAUDE_MODEL
 | +-- run_stage(stage_id, counts) -> timing, retry, events, audit, discovery
 |
 +-- Pipeline summary on completion
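
The flow above can be sketched as a plain loop. This is only an illustration under stated assumptions: the three helpers are replaced by hypothetical one-line stubs, `demo_run_pipeline` is an invented name, and the intelligence skip, self-healing, gate, and budget steps are left out.

```bash
#!/usr/bin/env bash
# Hypothetical stubs standing in for the real helpers.
check_human_directives() { [ "$1" != "review" ]; }   # pretend a human asked to skip "review"
select_stage_model()     { CLAUDE_MODEL="sonnet"; return 0; }
run_stage()              { echo "ran $1 ($3/$2 done) with $CLAUDE_MODEL"; [ "$1" != "test" ]; }

# Illustrative loop: directives first, then model selection, then the stage itself.
demo_run_pipeline() {
  local stages=("$@") completed=0 stage
  for stage in "${stages[@]}"; do
    if ! check_human_directives "$stage"; then
      echo "skipped $stage"
      continue
    fi
    select_stage_model "$stage"
    if ! run_stage "$stage" "${#stages[@]}" "$completed"; then
      echo "pipeline failed at $stage"
      return 1
    fi
    completed=$((completed + 1))
  done
  echo "pipeline complete"
}
```

Calling `demo_run_pipeline intake review build` skips the review stage and finishes, while any stage list containing `test` stops at that stage with a non-zero status, because the stub treats `test` as a failing stage.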

Error Boundaries

  • run_stage() catches stage failures via run_stage_with_retry(), records error class/message, emits events. Returns 1 on failure — run_pipeline() handles pipeline-level failure response.
  • select_stage_model() is best-effort — all failures suppressed with || true. Defaults to _smart_model default sonnet.
  • check_human_directives() is fail-safe — file read errors suppressed. Worst case: stage proceeds normally.
  • broadcast_stage_discovery() is fire-and-forget — all calls guarded with 2>/dev/null || true.
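
A minimal sketch of the best-effort boundary described for select_stage_model(), assuming a hypothetical `route_model` function in place of the real UCB1/A/B routing. The names `route_model` and `pick_model_best_effort` are invented for this example; only the `|| true` guard and the sonnet fallback come from the plan.

```bash
# Hypothetical router that always fails, to exercise the fallback path.
route_model() { return 1; }

pick_model_best_effort() {
  local stage_id="$1" model=""
  # Any routing failure is swallowed; the assignment is kept separate from
  # `local` so the guard applies to the command substitution itself.
  model="$(route_model "$stage_id" 2>/dev/null)" || true
  CLAUDE_MODEL="${model:-sonnet}"   # fall back to the default named in the plan
  return 0                          # model selection never fails the stage
}
```

With the failing stub, `pick_model_best_effort build` leaves `CLAUDE_MODEL` set to `sonnet`; if `route_model` is redefined to print a model name, that name is used instead.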

Files to Modify

| File | Action | Lines Changed (est.) |
| --- | --- | --- |
| scripts/lib/pipeline-stages.sh | Add run_stage(), select_stage_model(), check_human_directives(), broadcast_stage_discovery() | +180 |
| scripts/lib/pipeline-execution.sh | Simplify run_pipeline() to use new functions | -200, +40 |
| scripts/sw-lib-pipeline-stages-test.sh | Add unit tests for all new functions | +250 |
| scripts/sw-lib-pipeline-execution-test.sh | New test file for execution orchestration | +200 (new file) |

Implementation Steps

Step 1: Extract check_human_directives() into pipeline-stages.sh

Move lines 542-569 from run_pipeline() into a standalone function. Lowest-risk extraction — pure file reads with no complex state.
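
One way the extracted function could look is sketched below. The plan does not show the real file names or formats, so `skip-stage.txt`, `human-message.txt`, the one-stage-per-line layout, and the `demo_` name are all assumptions made for illustration.

```bash
demo_check_human_directives() {
  local stage_id="$1"
  local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
  local skip_file="$dir/skip-stage.txt"      # assumed file name
  local msg_file="$dir/human-message.txt"    # assumed file name

  # Surface a pending human message once, then clear it.
  if [ -s "$msg_file" ]; then
    echo "human message: $(cat "$msg_file" 2>/dev/null)"
    rm -f "$msg_file"
  fi

  # Skip the stage if it is listed, and consume the directive.
  if [ -f "$skip_file" ] && grep -qxF "$stage_id" "$skip_file" 2>/dev/null; then
    # grep -v exits 1 when nothing is left, so the failure is ignored.
    grep -vxF "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
    mv "$skip_file.tmp" "$skip_file"
    return 1
  fi
  return 0
}
```

Read errors are suppressed so that, in the worst case, the stage simply proceeds, which matches the fail-safe behavior the plan asks for.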

Step 2: Extract select_stage_model() into pipeline-stages.sh

Move lines 694-763 from run_pipeline(). Most complex extraction — UCB1 + A/B testing logic.
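
For readers unfamiliar with UCB1, the textbook selection rule is sketched below. It is not the project's routing code, which this plan does not reproduce; the input format (one line per model with name, successes, and pulls) and the `ucb1_pick` name are assumptions. Each model is scored as its mean reward plus an exploration bonus of sqrt(2 ln N / n), where N is the total number of pulls and n is that model's pulls.

```bash
# Input on stdin: one line per model, "name successes pulls".
ucb1_pick() {
  awk '
    { name[NR] = $1; wins[NR] = $2; pulls[NR] = $3; total += $3 }
    END {
      best = ""; best_score = -1
      for (i = 1; i <= NR; i++) {
        if (pulls[i] == 0) { print name[i]; exit }   # try untested models first
        score = wins[i] / pulls[i] + sqrt(2 * log(total) / pulls[i])
        if (score > best_score) { best_score = score; best = name[i] }
      }
      print best
    }'
}
```

For example, with `haiku 8 10`, `sonnet 9 10`, and `opus 1 2` as input, the rule picks `opus`: its mean reward is lower, but with only two pulls its exploration bonus outweighs the others.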

Step 3: Extract broadcast_stage_discovery() into pipeline-stages.sh

Move lines 809-821 from run_pipeline(). Simple stage-to-pattern mapping + subprocess call.
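
The stage-to-pattern mapping could be sketched as a `case` statement. The patterns here are invented, and `DISCOVERY_BROADCAST` is a hypothetical hook used so the example runs without sw-discovery.sh, whose command-line arguments are not shown in this plan.

```bash
# Map a stage id to file patterns worth sharing (patterns are illustrative only).
stage_discovery_patterns() {
  case "$1" in
    plan)  echo "*.md" ;;
    build) echo "src/*,*.json" ;;
    test)  echo "*test*" ;;
    *)     echo "*" ;;
  esac
}

demo_broadcast_stage_discovery() {
  local stage_id="$1" patterns
  patterns="$(stage_discovery_patterns "$stage_id")"
  # Fire-and-forget: errors from the broadcaster never reach the caller.
  "${DISCOVERY_BROADCAST:-true}" "$stage_id" "$patterns" 2>/dev/null || true
}
```

Setting `DISCOVERY_BROADCAST=echo` for a single call makes the mapping visible, and setting it to `false` shows that a failing broadcaster still leaves the function returning 0.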

Step 4: Create run_stage() in pipeline-stages.sh

Composite function that wraps: timing, status updates, GitHub check runs, audit, retry, model outcome recording, vitals, and discovery broadcast. This is lines 766-852 of run_pipeline() refactored into a self-contained function.
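
A reduced sketch of the composite runner follows, covering only timing, dispatch, and error recording. It assumes stages are plain `stage_<id>` functions and that `STAGE_TIMINGS` is a space-separated string; the real function would go through run_stage_with_retry() and also handle status updates, check runs, audit, vitals, and discovery.

```bash
demo_run_stage() {
  local stage_id="$1" enabled_count="$2" completed_count="$3"
  local start end rc=0
  start="$(date +%s)"
  echo "[$((completed_count + 1))/$enabled_count] $stage_id"

  # Dispatch to stage_<id>(); capture the exit code instead of aborting.
  "stage_${stage_id}" || rc=$?

  end="$(date +%s)"
  STAGE_TIMINGS="${STAGE_TIMINGS:-} ${stage_id}:$((end - start))s"   # assumed format
  if [ "$rc" -ne 0 ]; then
    LAST_STAGE_ERROR="stage $stage_id exited with $rc"
    return 1
  fi
  return 0
}
```

Because the function is called directly rather than in a pipe or command substitution, `STAGE_TIMINGS` and `LAST_STAGE_ERROR` remain visible to the caller after it returns.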

Step 5: Simplify run_pipeline() in pipeline-execution.sh

Replace inline logic with calls to new functions. Main loop calls check_human_directives, select_stage_model, and run_stage instead of having all that logic inline.

Step 6: Write unit tests for new functions in sw-lib-pipeline-stages-test.sh

  • check_human_directives: skip-stage file, human-message file, no files, malformed files
  • select_stage_model: UCB1 path, A/B test path, no intelligence functions, graduated path
  • broadcast_stage_discovery: each stage type maps to correct patterns
  • run_stage: success path, failure path, timing recorded, events emitted
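
The mocking style these tests rely on can be sketched as follows. The `assert_eq` helper, the `pick_model` stand-in, and the `intelligence_route` function are hypothetical names chosen for this example; the existing test files may use different helpers.

```bash
PASS=0; FAIL=0
assert_eq() {   # assert_eq <description> <expected> <actual>
  if [ "$2" = "$3" ]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    echo "FAIL: $1 (expected '$2', got '$3')"
  fi
}

# Stand-in for the function under test: falls back to sonnet when no router exists.
pick_model() {
  if type intelligence_route >/dev/null 2>&1; then
    intelligence_route "$1"
  else
    echo "sonnet"
  fi
}

# Test 1: no intelligence functions available.
unset -f intelligence_route 2>/dev/null || true
assert_eq "fallback model" "sonnet" "$(pick_model build)"

# Test 2: mock the router by defining a function with the same name.
intelligence_route() { echo "opus"; }
assert_eq "routed model" "opus" "$(pick_model build)"

echo "passed=$PASS failed=$FAIL"
```

Defining or unsetting a shell function of the same name is enough to mock a dependency, which is why the `type` guards in the new functions also make them easy to test in isolation.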

Step 7: Create sw-lib-pipeline-execution-test.sh for orchestration tests

New test file covering:

  • run_stage_with_retry: plan artifact skip, error classification, retry backoff
  • self_healing_build_test: convergence detection, cycle limits

Step 8: Run existing tests to verify no regression

./scripts/sw-lib-pipeline-stages-test.sh
./scripts/sw-pipeline-test.sh
./scripts/sw-e2e-smoke-test.sh

Step 9: Register new test in package.json

Add sw-lib-pipeline-execution-test.sh to the test scripts in package.json.

Step 10: Update CLAUDE.md documentation

Update the Architecture section to reflect the new function boundaries and the Shared Libraries table.

Task Checklist

  • Task 1: Extract check_human_directives() from run_pipeline() into pipeline-stages.sh
  • Task 2: Extract select_stage_model() from run_pipeline() into pipeline-stages.sh
  • Task 3: Extract broadcast_stage_discovery() from run_pipeline() into pipeline-stages.sh
  • Task 4: Create run_stage() composite function in pipeline-stages.sh
  • Task 5: Refactor run_pipeline() in pipeline-execution.sh to use new functions
  • Task 6: Add unit tests for new functions to sw-lib-pipeline-stages-test.sh
  • Task 7: Create sw-lib-pipeline-execution-test.sh with orchestration tests
  • Task 8: Run existing test suites to verify no regression
  • Task 9: Register new test file in package.json
  • Task 10: Update CLAUDE.md documentation with new architecture

Testing Approach

Test Pyramid Breakdown

  • Unit tests (~20 tests): Each new function tested in isolation with mocked dependencies
    • check_human_directives: 4 tests (skip file, human message, no files, cleanup)
    • select_stage_model: 5 tests (UCB1, A/B experiment, A/B control, graduated, no intelligence)
    • broadcast_stage_discovery: 4 tests (plan, build, test, unknown stage)
    • run_stage: 5 tests (success, failure, timing, audit, GitHub checks)
  • Integration tests (~5 tests): run_pipeline() with mocked stage functions to verify sequencing
  • E2E tests (existing): sw-pipeline-test.sh and sw-e2e-smoke-test.sh verify full pipeline flow

Coverage Targets

  • 100% of new function code paths
  • Existing test suites pass without modification (critical requirement)

Critical Paths to Test

  • Happy path: Stage runs, succeeds, timing/events recorded, discovery broadcast
  • Error case 1: Stage fails, error classified, pipeline halts with correct status
  • Error case 2: Human skip directive removes stage from skip file
  • Edge case 1: select_stage_model() with no intelligence functions available
  • Edge case 2: run_stage() when audit/vitals/checks functions don't exist (optional modules)

Definition of Done

  • pipeline-stages.sh exports: check_human_directives, select_stage_model, broadcast_stage_discovery, run_stage
  • run_pipeline() in pipeline-execution.sh uses new functions (reduced by ~200 lines)
  • All 102+ existing test suites pass without modification
  • New unit tests pass for each extracted function
  • New sw-lib-pipeline-execution-test.sh registered in package.json
  • No performance regression: function call overhead is negligible for shell (no new subshells)
  • CLAUDE.md updated with new architecture

Failure Mode Analysis

1. Variable Scope Breakage (CRITICAL)

What could break: Extracted functions reference global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.) that are set in sw-pipeline.sh. If a function is sourced in a different context (e.g., test env), variables may be unset.

Mitigation: Each new function will use ${VAR:-default} patterns for all global variable references, matching existing lib file conventions. Test environment sets all required globals (already shown in existing test setup).
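
The defaulting pattern can be illustrated with a small function. The default values and the `describe_context` name are assumptions for this sketch, not values taken from the codebase.

```bash
# Under `set -u`, a bare $ISSUE_NUMBER aborts when the variable is unset;
# the ${VAR:-default} form substitutes a fallback instead.
describe_context() {
  local artifacts="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
  local issue="${ISSUE_NUMBER:-unknown}"
  echo "artifacts=$artifacts issue=$issue"
}

unset ARTIFACTS_DIR ISSUE_NUMBER
( set -u; describe_context )        # safe even with nounset enabled
ISSUE_NUMBER=42 describe_context    # a caller-provided value wins over the default
```

The first call reports `issue=unknown` and the second reports `issue=42`, so the same function works in both the pipeline and a bare test environment.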

2. Self-Healing Build+Test Coupling

What could break: The build+test self-healing loop (lines 612-648) tightly couples build and test stages with counter management. Extracting this incorrectly could break the completed counter.

Mitigation: Leave the self-healing block inline in run_pipeline(). It's already a special case that interleaves two stages. Only extract the per-stage concerns (model routing, timing, discovery).

3. return vs exit in Sourced Functions

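What could break: A function invoked in a pipeline (|) or subshell runs in a child shell, so its return status can be masked by the rest of the pipeline and any globals it sets (such as LAST_STAGE_ERROR) never reach the caller. Conversely, exit inside a directly called sourced function would terminate the whole pipeline process rather than just the function.

Mitigation: Maintain direct function calls (no pipes). All new functions use return 0/return 1 consistently. Test both success and failure paths.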
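
The difference between a piped call and a direct call can be demonstrated in a few lines. `record_failure` is a hypothetical function, and the sketch assumes bash with `pipefail` off, which is the default.

```bash
record_failure() {
  LAST_STAGE_ERROR="boom"
  return 1
}

# Piped: the function runs in a subshell, so the variable is lost and the
# pipeline reports the status of `cat`, not of record_failure.
LAST_STAGE_ERROR=""
record_failure | cat
piped_status=$?
piped_error="$LAST_STAGE_ERROR"

# Direct call: status and variable both reach the caller.
LAST_STAGE_ERROR=""
direct_status=0
record_failure || direct_status=$?
direct_error="$LAST_STAGE_ERROR"

echo "piped:  status=$piped_status error='$piped_error'"
echo "direct: status=$direct_status error='$direct_error'"
```

The piped call reports an empty error string, while the direct call reports status 1 and the recorded error, which is why the plan insists on direct calls for functions that set LAST_STAGE_ERROR.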

4. Module Load Order

What could break: New functions in pipeline-stages.sh reference functions from pipeline-state.sh, pipeline-intelligence-skip.sh, and helper modules. If sourced before those modules, functions will be undefined.

Mitigation: pipeline-stages.sh is already sourced AFTER all dependency modules in sw-pipeline.sh (line 52, after lines 37-50). New functions use type funcname >/dev/null 2>&1 guards before calling optional functions, matching existing patterns.

Baseline Metrics

Performance profiling is not applicable for this refactoring because:

  • Shell function calls have negligible overhead (~0.1ms per call)
  • No new subshells, subprocesses, or file I/O are introduced
  • The refactoring is a code reorganization, not an algorithm change
  • Pipeline stage latency is dominated by Claude API calls (30s-300s per stage), not shell orchestration

The existing STAGE_TIMINGS mechanism already captures per-stage timing. We'll verify no regression by comparing before/after timings in the existing test suite.

Profiling Strategy

Not applicable — see Baseline Metrics above. The refactoring moves code between files without changing execution paths.

Optimization Targets

Not applicable — this is a refactoring for testability and maintainability, not performance.

Benchmark Plan

Run sw-pipeline-test.sh (58 tests) and sw-e2e-smoke-test.sh (19 tests) before and after. Both suites complete in <60s and exercise the full pipeline orchestration path. Any timing regression >5% would indicate an issue.
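
The 5% threshold can be checked mechanically once the before and after wall-clock times are known. The helper names below are invented, and the sketch assumes the timings are collected separately (for example with `time`); it does not guard against a zero baseline.

```bash
# regression_pct <before_seconds> <after_seconds> -> percentage change, one decimal
regression_pct() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# exceeds_threshold <before> <after> [threshold_pct] -> exit 0 if the slowdown is above the threshold
exceeds_threshold() {
  awk -v b="$1" -v a="$2" -v t="${3:-5}" 'BEGIN { exit !((a - b) / b * 100 > t) }'
}
```

For a suite that took 40s before and 43s after, `regression_pct 40 43` prints `7.5` and `exceeds_threshold 40 43` succeeds, flagging the run for a closer look; 40s to 41s (2.5%) stays under the threshold.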

Root Cause Hypothesis

  1. Stale issue metadata (Likelihood: HIGH) — The issue was created when sw-pipeline.sh was monolithic. Prior work already extracted stages. Evidence: sw-pipeline.sh is 279 lines, stage files exist.
  2. Partial extraction (Likelihood: HIGH) — Stages were extracted but orchestration concerns weren't. Evidence: run_pipeline() still has ~390 lines of mixed concerns.
  3. Scope creep in orchestrator (Likelihood: MEDIUM) — New features (model routing, discovery, human intervention) were added to run_pipeline() after the initial extraction, re-bloating it.

Evidence Gathered

  • sw-pipeline.sh: 279 lines (confirmed via wc -l)
  • lib/pipeline-execution.sh: 897 lines, with run_pipeline() spanning lines 507-897
  • Stage functions (stage_intake, stage_plan, etc.) already exist in lib/pipeline-stages-*.sh
  • Existing sw-lib-pipeline-stages-test.sh tests helper functions but not orchestration
  • No existing sw-lib-pipeline-execution-test.sh

Fix Strategy

Complete the decomposition by extracting cross-cutting concerns from run_pipeline() into testable functions in pipeline-stages.sh. This differs from the original issue's assumption (extract stages from sw-pipeline.sh) because stages are already extracted — the remaining work is orchestration cleanup.

Verification Plan

  1. Run existing tests BEFORE changes to establish baseline
  2. Extract one function at a time, running tests after each
  3. Run full test suite after all extractions
  4. Verify run_pipeline() line count decreased by ~200 lines
  5. Verify new functions are independently testable
