Pipeline Plan 172

ezigus edited this page Mar 17, 2026 · 2 revisions

Implementation Plan: sw-pipeline.sh Modular Extraction

Alternatives Considered

Approach A: Extract into 5 focused modules by responsibility (CHOSEN)

Move run_stage_with_retry(), classify_error(), self_healing_build_test(), self_healing_review_build_test() into scripts/lib/pipeline-stage-executor.sh
Move utility functions (format_duration, parse_coverage_from_output, rotate_event_log_if_needed, estimate_pipeline_cost, parse_claude_tokens, notify, _pipeline_compact_goal, load_composed_pipeline) into scripts/lib/pipeline-utils.sh
Move orchestration (run_pipeline, run_dry_run, auto_rebase, pipeline_post_completion_cleanup, pipeline_cancel_check_runs, generate_reasoning_trace) into scripts/lib/pipeline-orchestration.sh
Move worktree functions into scripts/lib/pipeline-worktree.sh
Move CLI commands (pipeline_start, pipeline_resume, pipeline_status, pipeline_abort, pipeline_list, pipeline_show) into scripts/lib/pipeline-commands.sh
Trade-offs: Clean separation by responsibility; sw-pipeline.sh becomes a thin shell (~400 lines), which maximizes testability. The blast radius is moderate, but incremental extraction is safe because the modules are sourced into the same shell rather than run as subprocesses.

Approach B: Extract only stage executor, leave everything else

Minimal change: just create sw-pipeline-stage-executor.sh with retry/error logic
Trade-offs: Doesn't meet the <1500 line target. Less disruption but doesn't solve the core problem.

Approach C: Full rewrite with new architecture

Rewrite pipeline as a proper state machine with JSON-based transitions
Trade-offs: Maximum disruption, high risk of regression, not justified given existing working code.

Decision: Approach A. The existing pipeline-state.sh (612 lines, 21 functions) already proves this extraction pattern works. We extend it to the remaining concerns.

Component Diagram

┌──────────────────────────────────────────────────────────────┐
│                        sw-pipeline.sh                        │
│ (~400 lines: shebang, source libs, parse_args, show_help,    │
│ setup_dirs, find/load_pipeline_config, preflight_checks,     │
│ heartbeat, ci helpers, cleanup_on_exit, main dispatch)       │
└──────────────────────────┬───────────────────────────────────┘
                           │ sources
        ┌──────────────────┼────────────────────────┐
        ▼                  ▼                        ▼
┌───────────────┐ ┌────────────────────┐ ┌──────────────────────┐
│ pipeline-     │ │ pipeline-          │ │ pipeline-            │
│ commands.sh   │ │ orchestration.sh   │ │ stage-executor.sh    │
│ (~650 lines)  │ │ (~550 lines)       │ │ (~450 lines)         │
│ start/resume/ │ │ run_pipeline       │ │ run_stage_with_retry │
│ status/abort/ │ │ run_dry_run        │ │ classify_error       │
│ list/show     │ │ auto_rebase        │ │ self_healing_*       │
└───────┬───────┘ │ post_cleanup       │ └──────────┬───────────┘
        │         │ cancel_check_runs  │            │
        │         │ reasoning_trace    │            │
        │         └────────┬───────────┘            │
        │                  │                        │
        ▼                  ▼                        ▼
┌───────────────┐ ┌────────────────────┐ ┌──────────────────────┐
│ pipeline-     │ │ pipeline-state.sh  │ │ pipeline-utils.sh    │
│ worktree.sh   │ │ (EXISTING 612 ln)  │ │ (~250 lines)         │
│ (~80 lines)   │ │ 21 functions       │ │ format_duration      │
│ setup/cleanup │ │ read/write/validate│ │ parse_coverage       │
└───────────────┘ └────────────────────┘ │ estimate_cost        │
                                         │ notify, etc.         │
                                         └──────────────────────┘
Dependency direction: top → bottom only (no cycles)

Interface Contracts

pipeline-stage-executor.sh

# Execute a single stage with retry logic and error classification
# Input: stage_id (string), uses globals PIPELINE_CONFIG, ARTIFACTS_DIR
# Output: return 0 on success, 1 on failure
# Side effects: sets LAST_STAGE_ERROR_CLASS, LAST_STAGE_ERROR, emits events
# Error contract: returns 1 on config errors (no retry), retries infra errors
run_stage_with_retry(stage_id: string) -> exit_code

# Classify error type from stage logs
# Input: stage_id (string)
# Output: echoes "infrastructure" | "configuration" | "logic" | "unknown"
classify_error(stage_id: string) -> string

# Self-healing build→test loop with convergence detection
# Input: none (uses globals BUILD_TEST_RETRIES, ARTIFACTS_DIR, STATE_FILE)
# Output: return 0 if tests pass, 1 if exhausted
self_healing_build_test() -> exit_code

# Self-healing review→build→test loop
# Input: none (uses globals)
# Output: return 0 if review fixes succeed, 1 if exhausted
self_healing_review_build_test() -> exit_code
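
The error contract for run_stage_with_retry (retry infrastructure errors, return 1 immediately on configuration errors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in sw-pipeline.sh: MAX_RETRIES, FAKE_ERROR_CLASS, the stage_flaky function and the classify_error stub are all hypothetical, and the real classifier inspects stage logs instead of reading a variable.

```shell
#!/usr/bin/env bash
# Sketch of the retry contract. All names except run_stage_with_retry,
# classify_error and LAST_STAGE_ERROR_CLASS are made up for illustration.
set -uo pipefail

MAX_RETRIES=3
LAST_STAGE_ERROR_CLASS=""

# Stub classifier: echoes whatever class the demo has configured.
classify_error() { echo "${FAKE_ERROR_CLASS:-unknown}"; }

run_stage_with_retry() {
  local stage_id="$1" attempt=1
  while (( attempt <= MAX_RETRIES )); do
    # Stage functions follow the stage_<id> naming used in the data flow.
    if "stage_${stage_id}"; then
      return 0
    fi
    LAST_STAGE_ERROR_CLASS="$(classify_error "$stage_id")"
    # A configuration error will not fix itself, so give up without retrying.
    [[ "$LAST_STAGE_ERROR_CLASS" == "configuration" ]] && return 1
    attempt=$(( attempt + 1 ))
  done
  return 1   # retries exhausted
}

# Demo stage: fails twice, succeeds on the third call.
CALLS=0
stage_flaky() { CALLS=$(( CALLS + 1 )); (( CALLS >= 3 )); }

FAKE_ERROR_CLASS="infrastructure"
if run_stage_with_retry flaky; then
  echo "flaky succeeded after ${CALLS} attempts"
fi
```

Because the stage function is called directly rather than in a subshell, the counter it updates stays visible to the caller; the same property is what lets the real function set LAST_STAGE_ERROR_CLASS for the orchestration layer.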

pipeline-orchestration.sh

# Main pipeline execution loop — iterates enabled stages
# Input: none (uses globals PIPELINE_CONFIG, STATE_FILE, etc.)
# Output: return 0 on complete, 1 on failure
run_pipeline() -> exit_code
# Dry-run validation mode — prints stage table without executing
run_dry_run() -> exit_code
# Auto-rebase current branch onto base branch
auto_rebase() -> exit_code
# Post-completion artifact cleanup
pipeline_post_completion_cleanup() -> void
# Cancel lingering GitHub check runs on abort
pipeline_cancel_check_runs() -> void
# Multi-step reasoning trace generation
generate_reasoning_trace() -> void

pipeline-commands.sh

pipeline_start() -> exit_code # Start new pipeline (main CLI entry)
pipeline_resume() -> exit_code # Resume from last completed stage
pipeline_status() -> void # Show current pipeline status
pipeline_abort() -> void # Abort running pipeline
pipeline_list() -> void # List saved pipelines
pipeline_show() -> void # Show detailed pipeline info

pipeline-utils.sh

format_duration(seconds: int) -> string
parse_coverage_from_output(log_file: string) -> string
rotate_event_log_if_needed() -> void
estimate_pipeline_cost(stages_json: string) -> json_string
parse_claude_tokens(output: string) -> json_string
notify(channel: string, message: string) -> void
_pipeline_compact_goal(goal: string) -> string
load_composed_pipeline() -> void

pipeline-worktree.sh

pipeline_setup_worktree() -> void
pipeline_cleanup_worktree() -> void

Data Flow

CLI args → parse_args() → dispatch (start|resume|status|abort|list|show)
  → pipeline_start()
    → preflight_checks() → load_pipeline_config()
    → run_pipeline()
      → for each stage in PIPELINE_CONFIG:
        → check skip/gate/budget/model-routing
        → run_stage_with_retry(stage_id)
          → stage_${id}() (from pipeline-stages-*.sh)
          → on failure: classify_error() → retry or fail
        → mark_stage_complete() / mark_stage_failed() (pipeline-state.sh)
        → write_state() (pipeline-state.sh)
      → pipeline_post_completion_cleanup()

Error Boundaries

| Component | Handles | Propagates |
| --- | --- | --- |
| pipeline-stage-executor.sh | Infrastructure retries, config abort, logic analysis | Returns 1 to orchestration on exhaustion |
| pipeline-orchestration.sh | Stage sequencing, self-healing coordination | Returns 1 to pipeline_start on failure |
| pipeline-commands.sh | CLI validation, lock acquisition, state conflicts | exit 1 for user errors |
| pipeline-state.sh | File I/O, atomic writes | Returns silently on non-critical failures |
| pipeline-utils.sh | Pure functions, no error propagation | Returns defaults on parse failures |

Files to Create

| File | Lines (est.) | Functions |
| --- | --- | --- |
| `scripts/lib/pipeline-utils.sh` | ~250 | `format_duration`, `parse_coverage_from_output`, `rotate_event_log_if_needed`, `_pipeline_compact_goal`, `load_composed_pipeline`, `parse_claude_tokens`, `estimate_pipeline_cost`, `notify` |
| `scripts/lib/pipeline-worktree.sh` | ~80 | `pipeline_setup_worktree`, `pipeline_cleanup_worktree` |
| `scripts/lib/pipeline-stage-executor.sh` | ~450 | `run_stage_with_retry`, `classify_error`, `self_healing_build_test`, `self_healing_review_build_test` |
| `scripts/lib/pipeline-orchestration.sh` | ~550 | `run_pipeline`, `run_dry_run`, `auto_rebase`, `pipeline_post_completion_cleanup`, `pipeline_cancel_check_runs`, `generate_reasoning_trace` |
| `scripts/lib/pipeline-commands.sh` | ~650 | `pipeline_start`, `pipeline_resume`, `pipeline_status`, `pipeline_abort`, `pipeline_list`, `pipeline_show` |

Files to Modify

| File | Change |
| --- | --- |
| `scripts/sw-pipeline.sh` | Remove extracted functions, add source statements for 5 new modules. Keep: shebang/header, source libs, show_help, parse_args, setup_dirs, find/load_pipeline_config, preflight_checks, start/stop_heartbeat, ci_push_partial_work, ci_post_stage_event, cleanup_on_exit, main dispatch. Target: ~400 lines |

Test Files to Create

| File | Coverage |
| --- | --- |
| `scripts/sw-lib-pipeline-stage-executor-test.sh` | classify_error patterns, run_stage_with_retry scenarios, self-healing convergence |
| `scripts/sw-lib-pipeline-utils-test.sh` | format_duration edge cases, parse_coverage patterns, estimate_pipeline_cost |
| `scripts/sw-lib-pipeline-orchestration-test.sh` | run_pipeline with mock stages, dry-run validation |

Implementation Steps

Step 1: Create pipeline-utils.sh (standalone, no internal deps)

Extract from sw-pipeline.sh:

parse_coverage_from_output() (L139-158)
format_duration() (L160-170)
rotate_event_log_if_needed() (L172-185)
_pipeline_compact_goal() (L187-212)
load_composed_pipeline() (L214-234)
parse_claude_tokens() (L236-246)
estimate_pipeline_cost() (L248-338)
notify() (L817-853)

Add include guard: [[ -n "${_PIPELINE_UTILS_LOADED:-}" ]] && return 0
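
The guard line only tests the flag; for re-sourcing to become a no-op, the module also has to set the flag once the check has passed. A minimal sketch of the full pattern, assuming the flag name shown above (the temporary module file and LOAD_COUNT are hypothetical, added only so the effect is observable):

```shell
#!/usr/bin/env bash
# Sketch: an include guard makes sourcing a module idempotent.
# The module body below is a stand-in, not the real pipeline-utils.sh.
set -euo pipefail

module="$(mktemp)"
cat > "$module" <<'EOF'
# Return early if this module was already sourced in the current shell.
[[ -n "${_PIPELINE_UTILS_LOADED:-}" ]] && return 0
_PIPELINE_UTILS_LOADED=1

# Counts how many times the body below the guard actually runs.
LOAD_COUNT=$(( ${LOAD_COUNT:-0} + 1 ))
EOF

source "$module"
source "$module"   # second source hits the guard and returns immediately
rm -f "$module"

echo "module body ran ${LOAD_COUNT} time(s)"
```

The `${VAR:-}` expansion keeps the test safe under `set -u`, where reading an unset flag would otherwise abort the script.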

Step 2: Create pipeline-worktree.sh (standalone)

Extract from sw-pipeline.sh:

pipeline_setup_worktree() (L2005-2043)
pipeline_cleanup_worktree() (L2046-2079)

Add include guard.

Step 3: Create pipeline-stage-executor.sh (depends on state, utils)

Extract from sw-pipeline.sh:

classify_error() (L855-950)
run_stage_with_retry() (L954-1111)
self_healing_build_test() (L1117-1409)
self_healing_review_build_test() (L1411-1482)

Add include guard. These functions reference globals (PIPELINE_CONFIG, ARTIFACTS_DIR, STATE_FILE, BUILD_TEST_RETRIES, ISSUE_NUMBER, LAST_STAGE_ERROR_CLASS, LAST_STAGE_ERROR), which remain in the shared shell scope because the module is sourced rather than executed.
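
To illustrate why sourcing keeps the globals shared: a function defined in a sourced file runs in the caller's shell, so it can read and write the caller's variables. The sketch below uses a made-up module with a hypothetical record_failure function and a placeholder ARTIFACTS_DIR value; only the variable names come from the plan.

```shell
#!/usr/bin/env bash
# Sketch: a sourced function sees and modifies the sourcing shell's globals.
# record_failure and the module content are illustrative stand-ins.
set -euo pipefail

ARTIFACTS_DIR="/tmp/artifacts-demo"
LAST_STAGE_ERROR_CLASS=""

module="$(mktemp)"
cat > "$module" <<'EOF'
record_failure() {
  # Reads ARTIFACTS_DIR and writes LAST_STAGE_ERROR_CLASS in the caller's shell.
  LAST_STAGE_ERROR_CLASS="infrastructure"
  echo "log would go to ${ARTIFACTS_DIR}/build.log"
}
EOF

source "$module"
rm -f "$module"

record_failure
echo "class seen by caller: ${LAST_STAGE_ERROR_CLASS}"
```

Had the module been run as `bash "$module"` instead, the function and any assignments would have lived in a child process and vanished when it exited.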

Step 4: Create pipeline-orchestration.sh (depends on executor, state)

Extract from sw-pipeline.sh:

auto_rebase() (L1484-1517)
run_pipeline() (L1519-1921)
pipeline_post_completion_cleanup() (L1926-1980)
pipeline_cancel_check_runs() (L1983-2000)
run_dry_run() (L2084-2251)
generate_reasoning_trace() (L2256-2345)

Add include guard.

Step 5: Create pipeline-commands.sh (depends on orchestration, state, worktree)

Extract from sw-pipeline.sh:

pipeline_start() (L2349-2947)
pipeline_resume() (L2948-2954)
pipeline_status() (L2956-3048)
pipeline_abort() (L3050-3079)
pipeline_list() (L3081-3115)
pipeline_show() (L3117-end)

Add include guard.

Step 6: Reduce sw-pipeline.sh to orchestration shell

Remove all extracted function bodies. Add source lines for new modules after existing sources (maintaining correct dependency order):

# New modular extractions
[[ -f "$SCRIPT_DIR/lib/pipeline-utils.sh" ]] && source "$SCRIPT_DIR/lib/pipeline-utils.sh"
[[ -f "$SCRIPT_DIR/lib/pipeline-worktree.sh" ]] && source "$SCRIPT_DIR/lib/pipeline-worktree.sh"
[[ -f "$SCRIPT_DIR/lib/pipeline-stage-executor.sh" ]] && source "$SCRIPT_DIR/lib/pipeline-stage-executor.sh"
[[ -f "$SCRIPT_DIR/lib/pipeline-orchestration.sh" ]] && source "$SCRIPT_DIR/lib/pipeline-orchestration.sh"
[[ -f "$SCRIPT_DIR/lib/pipeline-commands.sh" ]] && source "$SCRIPT_DIR/lib/pipeline-commands.sh"

Verify final line count is <1500 (target ~400).
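
One way to make the line-count check repeatable is a small helper built on `wc -l`. This is a sketch only: check_line_budget is a hypothetical name, not an existing function in the repository, and the demo runs against a generated file so that it works outside the repo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_line_budget FILE MAX: succeed only if FILE has fewer than MAX lines.
# Hypothetical helper for illustration.
check_line_budget() {
  local file="$1" max="$2" count
  count="$(wc -l < "$file")"
  count=$(( count ))   # normalise: some wc implementations pad with spaces
  if (( count < max )); then
    echo "OK: ${count} lines (< ${max})"
  else
    echo "FAIL: ${count} lines (limit ${max})" >&2
    return 1
  fi
}

# Demo against a generated 400-line file.
demo="$(mktemp)"
for (( i = 0; i < 400; i++ )); do echo "line $i"; done > "$demo"
check_line_budget "$demo" 1500
```

In the repository the call would presumably be `check_line_budget scripts/sw-pipeline.sh 1500`.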

Step 7: Write tests for new modules

Create test files following existing patterns in sw-lib-pipeline-state-test.sh.
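
As a rough idea of the shape such a test file could take, here is a self-contained skeleton. The conventions in sw-lib-pipeline-state-test.sh may well differ: assert_eq, PASS and FAIL are hypothetical names, and the classify_error stub takes the log text directly (the real function takes a stage_id and reads the stage logs) so that the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Sketch of a test-file skeleton; helper names are assumptions.
set -uo pipefail

PASS=0
FAIL=0

assert_eq() {
  local description="$1" expected="$2" actual="$3"
  if [[ "$expected" == "$actual" ]]; then
    PASS=$(( PASS + 1 ))
  else
    FAIL=$(( FAIL + 1 ))
    echo "FAIL: ${description}: expected '${expected}', got '${actual}'" >&2
  fi
}

# Stub under test. A real test file would instead source the module, e.g.
#   source "$SCRIPT_DIR/lib/pipeline-stage-executor.sh"
classify_error() {
  case "$1" in
    *timeout*)          echo "infrastructure" ;;
    *"missing config"*) echo "configuration" ;;
    *)                  echo "unknown" ;;
  esac
}

assert_eq "timeout is infrastructure" "infrastructure" "$(classify_error "connection timeout")"
assert_eq "missing config is configuration" "configuration" "$(classify_error "missing config key")"
assert_eq "anything else is unknown" "unknown" "$(classify_error "segfault")"

echo "passed=${PASS} failed=${FAIL}"
[[ "$FAIL" -eq 0 ]] || exit 1
```

Counting failures instead of exiting on the first one lets a single run report every broken pattern.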

Step 8: Verify all existing tests pass

Run npm test and ./scripts/sw-pipeline-test.sh.

Task Checklist

Task 1: Create scripts/lib/pipeline-utils.sh — extract utility functions
Task 2: Create scripts/lib/pipeline-worktree.sh — extract worktree setup/cleanup
Task 3: Create scripts/lib/pipeline-stage-executor.sh — extract error classification, retry logic, self-healing
Task 4: Create scripts/lib/pipeline-orchestration.sh — extract run_pipeline, run_dry_run, auto_rebase, cleanup, reasoning trace
Task 5: Create scripts/lib/pipeline-commands.sh — extract CLI subcommands
Task 6: Reduce scripts/sw-pipeline.sh — remove extracted functions, add source lines
Task 7: Write scripts/sw-lib-pipeline-stage-executor-test.sh
Task 8: Write scripts/sw-lib-pipeline-utils-test.sh
Task 9: Write scripts/sw-lib-pipeline-orchestration-test.sh
Task 10: Run full test suite — verify zero regressions
Task 11: Verify sw-pipeline.sh is <1500 lines

Testing Approach

Test Pyramid Breakdown

Unit tests (~40 tests): pipeline-utils (format_duration: 5, parse_coverage: 4, estimate_cost: 3, rotate_event_log: 2), pipeline-stage-executor (classify_error: 8 patterns, run_stage_with_retry: 6 scenarios, self_healing: 4), pipeline-orchestration (run_pipeline: 5 scenarios)
Integration tests (~15 tests): Existing sw-pipeline-test.sh (full pipeline flows, self-healing, dry-run, resume)
E2E tests (~3 tests): Existing pipeline E2E in sw-pipeline-test.sh

Coverage Targets

pipeline-stage-executor.sh: >80% (critical path)
pipeline-utils.sh: >80% (pure functions, easy to test)
pipeline-orchestration.sh: >60% (complex integration, tested via existing E2E)
pipeline-commands.sh: >50% (CLI glue, covered by existing sw-pipeline-test.sh)

Critical Paths to Test

Happy path: pipeline starts → stages execute in order → state updated → pipeline completes
Error: infrastructure failure: stage fails → classified as infra → retried → succeeds
Error: config failure: stage fails → classified as config → no retry → pipeline fails
Error: self-healing: build passes → test fails → re-enters build with error context → passes
Edge: plan artifact exists: plan stage fails but artifact >10 lines → skip retry, advance
Edge: format_duration: 0s, 59s, 60s, 3600s, 3661s boundary values
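
The format_duration boundary values lend themselves to a table-driven check. The sketch below uses a stand-in implementation with an assumed "Xh Ym Zs" output format; the real function in sw-pipeline.sh may format durations differently, in which case only the expected column changes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in implementation; the output format is an assumption.
format_duration() {
  local secs="$1"
  local h=$(( secs / 3600 )) m=$(( (secs % 3600) / 60 )) s=$(( secs % 60 ))
  if (( h > 0 )); then
    echo "${h}h ${m}m ${s}s"
  elif (( m > 0 )); then
    echo "${m}m ${s}s"
  else
    echo "${s}s"
  fi
}

# Each row is "input expected"; read puts the rest of the line in expected.
failures=0
while read -r input expected; do
  actual="$(format_duration "$input")"
  if [[ "$actual" != "$expected" ]]; then
    echo "FAIL: format_duration $input -> '$actual' (expected '$expected')"
    failures=$(( failures + 1 ))
  fi
done <<'EOF'
0 0s
59 59s
60 1m 0s
3600 1h 0m 0s
3661 1h 1m 1s
EOF

echo "failures: ${failures}"
```

The values 59/60 and 3600/3661 sit on either side of the minute and hour roll-overs, which is where off-by-one mistakes in the modulo arithmetic tend to show up.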

Risk Analysis

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Shared globals break after extraction | High | All globals remain in sw-pipeline.sh scope via sourcing. Test immediately after each step. |
| Source order dependency | Medium | Source utils → worktree → executor → orchestration → commands. Check syntax with `bash -n`; since that only parses the files, also run the test suite to confirm the source order works. |
| Existing tests break from missing copied files | High | sw-pipeline-test.sh copies scripts to a temp dir; update its setup to also copy the new lib/ files. |
| Shell variable scoping in subshells | Medium | self_healing_build_test uses subshells; verify STAGE_STATUSES survives. |
| `trap` handlers reference moved functions | Medium | cleanup_on_exit stays in sw-pipeline.sh; it references globals, not extracted functions. |
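
The subshell scoping risk is easy to reproduce in isolation. In the sketch below STAGE_STATUSES is treated as a plain string purely for illustration (its real type in the pipeline is not shown here), and update_status is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: assignments made in a subshell or pipeline stage do not reach the parent.
set -uo pipefail

STAGE_STATUSES="build:pending"

update_status() { STAGE_STATUSES="build:complete"; }

# 1. Explicit subshell: the assignment happens in a child process and is lost.
( update_status )
after_subshell="$STAGE_STATUSES"
echo "after subshell:    ${after_subshell}"

# 2. Pipeline: by default bash runs each pipeline stage in a subshell as well.
echo "build:complete" | while read -r value; do STAGE_STATUSES="$value"; done
after_pipeline="$STAGE_STATUSES"
echo "after pipeline:    ${after_pipeline}"

# 3. Direct call in the current shell: the assignment survives.
update_status
echo "after direct call: ${STAGE_STATUSES}"
```

The first two lines print `build:pending` and only the last prints `build:complete`, so any extracted function that must update shared state has to be called directly, or has to pass its result back through stdout or a file.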

Definition of Done

sw-pipeline.sh is <1500 lines (target ~400)
scripts/lib/pipeline-stage-executor.sh exists with run_stage_with_retry, classify_error, self_healing_*
scripts/lib/pipeline-orchestration.sh exists with run_pipeline, run_dry_run, auto_rebase
scripts/lib/pipeline-commands.sh exists with pipeline_start/resume/status/abort/list/show
scripts/lib/pipeline-utils.sh exists with format_duration, estimate_pipeline_cost, etc.
scripts/lib/pipeline-worktree.sh exists with worktree setup/cleanup
All modules have include guards (idempotent sourcing)
All modules can be sourced independently without side effects
npm test passes with zero regressions
./scripts/sw-pipeline-test.sh passes with zero regressions
New test files exist for stage-executor, utils, orchestration
New tests achieve >80% coverage on stage-executor and utils modules
No circular dependencies between modules
