Pipeline Plan 172

ezigus edited this page Mar 17, 2026 · 2 revisions

Implementation Plan: sw-pipeline.sh Modular Extraction — Stage Executor and State Manager

Status Assessment

The core extraction work is already complete (commit 1d85a6a). sw-pipeline.sh was reduced from 3,171 to 708 lines (a 78% reduction). Both target modules exist:

scripts/lib/pipeline-state.sh (612 lines) — state read/write/validate
scripts/lib/pipeline-stage-executor.sh (645 lines) — stage execution with hooks

Remaining work: Test coverage gaps need closing to meet the >80% criterion.

Component Diagram

┌─────────────────────────────────────────────────────┐
│  sw-pipeline.sh (708 lines)                         │
│  CLI dispatch, sourcing, signal setup               │
│                                                     │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────┐   │
│  │  commands   │  │ orchestration│  │ stages-*  │   │
│  │  (836 ln)   │→ │  (813 ln)    │→ │ (various) │   │
│  └─────────────┘  └──────────────┘  └───────────┘   │
│         │                │                          │
│         ▼                ▼                          │
│  ┌──────────────────────────────┐                   │
│  │  pipeline-stage-executor.sh  │ ← TARGET MODULE   │
│  │  (645 lines)                 │                   │
│  │  • classify_error()          │                   │
│  │  • run_stage_with_retry()    │                   │
│  │  • self_healing_build_test() │                   │
│  │  • self_healing_review_*()   │                   │
│  └──────────────┬───────────────┘                   │
│                 ▼                                   │
│  ┌──────────────────────────────┐                   │
│  │  pipeline-state.sh           │ ← TARGET MODULE   │
│  │  (612 lines)                 │                   │
│  │  • save_artifact()           │                   │
│  │  • get/set_stage_status()    │                   │
│  │  • mark_stage_complete()     │                   │
│  │  • mark_stage_failed()       │                   │
│  │  • write_state()             │                   │
│  │  • resume_state()            │                   │
│  │  • initialize_state()        │                   │
│  └──────────────────────────────┘                   │
└─────────────────────────────────────────────────────┘

Dependency direction: orchestration → executor → state (inward only, no cycles).

Interface Contracts

pipeline-state.sh Public API

// Artifact persistence
save_artifact(name: string, content: string): void // writes to ARTIFACTS_DIR
// Stage status CRUD
get_stage_status(stage_id: string): string // "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id: string, status: string): void
// Timing
record_stage_start(stage_id: string): void
record_stage_end(stage_id: string): void
get_stage_timing(stage_id: string): string // formatted "1m30s"
get_stage_timing_seconds(stage_id: string): number // raw seconds, 0 if unknown
get_slowest_stage(): string // stage_id or ""
// Descriptions & progress
get_stage_description(stage_id: string): string
build_stage_progress(): string // "intake:complete plan:running test:pending"
// State transitions (side effects: writes state, emits events, updates GitHub)
update_status(status: string, stage: string): void
mark_stage_complete(stage_id: string): void // Error: event emit failure (non-fatal)
mark_stage_failed(stage_id: string): void // Error: event emit failure (non-fatal)
// Persistence
initialize_state(): void // Resets all state, calls write_state()
write_state(): void // Error: disk space check failure → return 1
resume_state(): void // Error: missing state file → exit 1, missing goal → exit 1
// Validation
verify_stage_artifacts(stage_id: string): boolean // 0=ok, 1=missing artifacts
persist_artifacts(stage: string, ...files: string[]): void // CI-only, non-fatal
// Meta-cognition
record_stage_effectiveness(stage_id: string, outcome: string): void
get_stage_self_awareness_hint(stage_id: string): string
// Logging
log_stage(stage_id: string, message: string): void

pipeline-stage-executor.sh Public API

// Error classification
classify_error(stage_id: string): "infrastructure"|"configuration"|"logic"|"unknown"
// Execution
run_stage_with_retry(stage_id: string): boolean // 0=success, 1=failure
 // Error: configuration errors → immediate failure (no retry)
 // Error: repeated logic errors → immediate failure
// Self-healing loops
self_healing_build_test(): boolean // 0=tests pass, 1=exhausted
 // Error: infrastructure error (simulator not found) → immediate exit
 // Error: convergence stuck (same error 3x) → early exit
 // Error: plateau (no progress 2x) → early exit
self_healing_review_build_test(): boolean // 0=review passes, 1=exhausted
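The classification and retry contract above can be illustrated with a minimal sketch. It is not the module's implementation: the function names, the log-file argument, and most patterns are assumptions made for illustration. Only MODULE_NOT_FOUND (a configuration error) and "simulator not found" (an infrastructure error) are taken from this plan, and the real classify_error() takes a stage_id rather than a log path.

```shell
# Hypothetical sketch: classify a failure by scanning a log file.
# Patterns other than MODULE_NOT_FOUND and "simulator not found" are assumptions.
classify_error_sketch() {
  local log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "unknown"
    return 0
  fi
  if grep -qiE 'simulator not found|no space left on device' "$log_file"; then
    echo "infrastructure"
  elif grep -qE 'MODULE_NOT_FOUND|command not found' "$log_file"; then
    echo "configuration"
  elif grep -qiE 'assertion failed|test failed' "$log_file"; then
    echo "logic"
  else
    echo "unknown"
  fi
}

# Retry policy from the contract: configuration errors and repeated
# logic errors fail immediately; everything else may be retried.
# should_retry_sketch <category> [previous_category]
should_retry_sketch() {
  local category="$1" previous="${2:-}"
  if [ "$category" = "configuration" ]; then
    return 1
  fi
  if [ "$category" = "logic" ] && [ "$previous" = "logic" ]; then
    return 1
  fi
  return 0
}
```

A caller would classify after each failed attempt and pass the previous category along, so that a second consecutive logic error stops the retry loop.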

Data Flow

CLI args → parse_args → pipeline_start()
  → initialize_state() [state module]
  → run_pipeline() [orchestration]
    → for each stage:
      → record_stage_start() [state]
      → run_stage_with_retry() [executor]
        → stage_<id>() [stage modules]
        → classify_error() on failure [executor]
      → mark_stage_complete/failed() [state]
        → write_state() → STATE_FILE
        → emit_event() → events.jsonl
        → gh_update_progress() → GitHub

Error Boundaries

Component	Handles	Propagates
state	Disk space (write_state), missing artifacts (verify), CI push failures (persist_artifacts)	Missing state file → exit 1
executor	Infrastructure/config/logic classification, convergence detection, retry decisions	Stage failure → return 1 to orchestration
orchestration	Stage sequencing, gate enforcement, self-healing loop control	Pipeline failure → write final state, exit
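Because the state module propagates a missing state file as exit 1 rather than return 1, a test that calls such a function directly would terminate the test runner. One common way around this is to run the call in a subshell and inspect the exit status. The sketch below uses a hypothetical stand-in (resume_state_sketch) and a hypothetical helper (exit_code_of); neither name comes from the codebase.

```shell
# Hypothetical stand-in for a function that exits on a missing state file.
resume_state_sketch() {
  local state_file="$1"
  if [ ! -f "$state_file" ]; then
    echo "state file not found: $state_file" >&2
    exit 1
  fi
  echo "resumed"
}

# Run a command in a subshell and print its exit status.
# The subshell absorbs any `exit`, so the calling test keeps running.
exit_code_of() {
  local rc=0
  ( "$@" ) >/dev/null 2>&1 || rc=$?
  echo "$rc"
}
```

For example, exit_code_of resume_state_sketch /no/such/file prints 1 while the test script itself continues.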

Test Coverage Gap Analysis

pipeline-state.sh — Current: ~60% function coverage

Function	Tested	Lines	Priority
get_slowest_stage	No	15	Medium
build_stage_progress	No	20	Medium
update_status	No	5	Low (simple)
write_state	No	68	High (core persistence)
resume_state	No	42	High (crash recovery)
mark_stage_complete	No	94	High (many side effects)
mark_stage_failed	No	68	High (many side effects)

pipeline-stage-executor.sh — Current: ~40% line coverage

Function	Tested	Lines	Priority
self_healing_build_test	No	287	High (core loop)
self_healing_review_build_test	No	69	Medium
run_stage_with_retry (edge cases)	Partial	157	Medium

Alternatives Considered

Alternative 1: Further decomposition (split large functions)

Pros: Smaller units, easier to test
Cons: More files, more source calls, higher complexity for bash
Decision: Rejected. The current module boundaries are clean. Adding tests for existing functions is simpler and lower-risk than restructuring.

Alternative 2: Rewrite tests using bats-core

Pros: Better bash testing framework, TAP output
Cons: New dependency, rewrite all existing tests, learning curve
Decision: Rejected. Existing test-helpers.sh framework works well and is already used by 9 test files. Consistency matters more.

Alternative 3: Test only public API, skip internal functions

Pros: Fewer tests to write
Cons: Won't reach >80% coverage target on complex functions like mark_stage_complete
Decision: Rejected. Key functions have complex side-effect chains that need verification.

Risk Analysis

Risk	Impact	Mitigation
Tests for mark_stage_complete/failed need many stubs	Medium — fragile test setup	Use existing stub pattern from state test file; stub only external calls
self_healing_build_test test could be slow (sleep calls)	Low — test isolation	Override sleep in test, mock stage functions to fail/succeed deterministically
write_state test could corrupt real state	Low	Tests already use TEST_TEMP_DIR isolation
resume_state test needs valid state file format	Low	Generate state file content in test using known-good format
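The sleep override and deterministic mocks named in the mitigations might look like the sketch below. The retry loop is a simplified stand-in for the code under test, not the real executor, and FAIL_TIMES, TEST_CALLS, SLEEP_CALLS, and retry_sketch are hypothetical names.

```shell
# Shadow the sleep binary with a function so back-off costs no wall time.
SLEEP_CALLS=0
sleep() { SLEEP_CALLS=$((SLEEP_CALLS + 1)); }

# Deterministic mock: fail the first FAIL_TIMES calls, then succeed.
FAIL_TIMES=2
TEST_CALLS=0
stage_test() {
  TEST_CALLS=$((TEST_CALLS + 1))
  [ "$TEST_CALLS" -gt "$FAIL_TIMES" ]
}

# Simplified retry loop standing in for the code under test.
retry_sketch() {
  local max="$1" attempt=1
  while [ "$attempt" -le "$max" ]; do
    stage_test && return 0
    sleep 5
    attempt=$((attempt + 1))
  done
  return 1
}
```

With FAIL_TIMES=2, retry_sketch 5 succeeds on the third attempt after two instant "sleeps". The counters only update when the functions run in the current shell, so the mock should not be called inside a command substitution.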

Files to Modify

File	Action
`scripts/sw-lib-pipeline-state-test.sh`	Modify — add tests for write_state, resume_state, get_slowest_stage, build_stage_progress, mark_stage_complete, mark_stage_failed
`scripts/sw-lib-pipeline-stage-executor-test.sh`	Modify — add tests for self_healing_build_test convergence detection, run_stage_with_retry edge cases

No new files needed. No production code changes required.

Implementation Steps

Step 1: Add state module tests (pipeline-state.sh)

Add to sw-lib-pipeline-state-test.sh:

get_slowest_stage: Set up STAGE_TIMINGS with multiple stages, verify correct stage returned. Test empty case returns "".
build_stage_progress: Create minimal PIPELINE_CONFIG JSON, set various stage statuses, verify progress string format.
update_status: Call update_status, verify PIPELINE_STATUS and CURRENT_STAGE are set. Verify write_state was called.
write_state: Set up all state variables, call write_state, read STATE_FILE and verify YAML frontmatter structure contains all expected fields (pipeline, goal, status, issue, stages).
resume_state: Write a known state file, call resume_state (with stubs for gh_init, load_pipeline_config, git), verify all variables are restored correctly. Test error cases: missing file (exit 1), missing goal (exit 1), already-complete pipeline (exit 0).
mark_stage_complete: Stub all external calls (emit_event, gh_*, checkpoint, etc.). Call mark_stage_complete, verify: stage status set to "complete", timing recorded, log entry added, write_state called.
mark_stage_failed: Same pattern as mark_stage_complete but verify "failed" status and error comment format.
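Several of these tests need to verify that write_state or emit_event was called. A call-recording stub is one way to do that. The sketch below is an assumption about how such stubs could be written, not a copy of the existing pattern in the state test file; the "stage.complete" event name, the STAGE_STATUS variable, and mark_stage_complete_sketch are hypothetical.

```shell
# Record every stubbed call in a log file so assertions can inspect it later.
CALL_LOG="$(mktemp)"
record_call() { echo "$*" >> "$CALL_LOG"; }

# Test doubles for external side effects (function names follow the plan).
emit_event()         { record_call "emit_event $*"; }
gh_update_progress() { record_call "gh_update_progress $*"; }
write_state()        { record_call "write_state"; }

# Hypothetical stand-in for the function under test.
mark_stage_complete_sketch() {
  local stage_id="$1"
  STAGE_STATUS="complete"
  emit_event "stage.complete" "stage=$stage_id"
  gh_update_progress
  write_state
}

# Succeeds if some recorded call starts with the given text.
assert_called() { grep -q "^$1" "$CALL_LOG"; }
```

A test would call the function once, then combine assert_called checks with direct checks on state variables.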

Step 2: Add executor module tests (pipeline-stage-executor.sh)

Add to sw-lib-pipeline-stage-executor-test.sh:

run_stage_with_retry — plan artifact skip: Create a plan.md with >10 lines, make stage_plan fail. Verify it returns 0 (skip retry because artifact exists).
run_stage_with_retry — configuration error escalation: Create a log file with "MODULE_NOT_FOUND" error, make stage fail. Verify it returns 1 immediately (no retry).
self_healing_build_test — happy path: Mock stage_build and stage_test to succeed on first try. Verify returns 0.
self_healing_build_test — convergence stuck: Mock stage_test to fail with same error 3 times. Verify early exit with return 1.
self_healing_build_test — plateau detection: Mock stage_test to fail with same failure count for 2 iterations. Verify early exit.
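To make the two early-exit conditions concrete, here is a small tracker sketch. Only the thresholds (same error 3 times, no progress 2 times) come from this plan. The variable names, the error-signature argument, and the reading of "plateau" as two consecutive iterations whose failure count did not drop are assumptions; the real self_healing_build_test may measure these differently.

```shell
# Hypothetical convergence tracker; only the thresholds come from the plan.
LAST_SIG=""
SAME_SIG_RUN=0
LAST_COUNT=""
NO_PROGRESS_RUN=0
VERDICT="continue"

reset_tracker() {
  LAST_SIG=""
  SAME_SIG_RUN=0
  LAST_COUNT=""
  NO_PROGRESS_RUN=0
  VERDICT="continue"
}

# observe_iteration <error_signature> <failure_count>
# Sets VERDICT to "stuck", "plateau", or "continue".
# Call it directly, not inside $(...), so the counters persist.
observe_iteration() {
  local sig="$1" count="$2"
  if [ "$sig" = "$LAST_SIG" ]; then
    SAME_SIG_RUN=$((SAME_SIG_RUN + 1))
  else
    SAME_SIG_RUN=1
  fi
  if [ -n "$LAST_COUNT" ] && [ "$count" -ge "$LAST_COUNT" ]; then
    NO_PROGRESS_RUN=$((NO_PROGRESS_RUN + 1))
  else
    NO_PROGRESS_RUN=0
  fi
  LAST_SIG="$sig"
  LAST_COUNT="$count"
  if [ "$SAME_SIG_RUN" -ge 3 ]; then
    VERDICT="stuck"
  elif [ "$NO_PROGRESS_RUN" -ge 2 ]; then
    VERDICT="plateau"
  else
    VERDICT="continue"
  fi
}
```

A mocked stage_test that always reports the same error drives this tracker to "stuck" on the third iteration, which is the behaviour the convergence test is meant to pin down.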

Step 3: Verify all tests pass

Run npm test to verify no regressions.
Run individual test files to verify new tests pass in isolation.

Task Checklist

Task 1: Add get_slowest_stage and build_stage_progress tests to state test file
Task 2: Add write_state test — verify YAML output format and all fields
Task 3: Add resume_state tests — happy path and error cases (missing file, missing goal, complete pipeline)
Task 4: Add mark_stage_complete test with stubbed externals
Task 5: Add mark_stage_failed test with stubbed externals
Task 6: Add update_status test
Task 7: Add run_stage_with_retry edge case tests (plan artifact skip, config error escalation)
Task 8: Add self_healing_build_test happy path test
Task 9: Add self_healing_build_test convergence detection tests (stuck + plateau)
Task 10: Run full test suite — verify all tests pass with no regressions

Testing Approach

Test Pyramid Breakdown

Unit tests (target: ~25 new tests): Test each function in isolation with mocked dependencies
- State module: ~15 new tests (write_state, resume_state, mark_stage_complete/failed, get_slowest_stage, build_stage_progress, update_status)
- Executor module: ~10 new tests (self_healing_build_test paths, run_stage_with_retry edge cases)
Integration tests (existing): The 12 tests in sw-pipeline-test.sh cover end-to-end pipeline flows
E2E tests: Not applicable (shell scripts, no deployment)

Coverage Targets

pipeline-state.sh: 80%+ function coverage (currently ~60%, expected ~95% after the new tests)
pipeline-stage-executor.sh: 80%+ line coverage (currently ~40%, expected ~80% after the new tests)
Overall pipeline module coverage: >80%
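The plan does not say how function coverage is measured. One rough way to track it, offered here purely as an assumption, is a name-mention heuristic: list the functions a module defines and count how many are referenced in its test file. It does not trace execution, so it overstates coverage whenever a function is mentioned but never exercised.

```shell
# function_coverage <module.sh> <test.sh>
# Prints "<covered>/<total>", where "covered" means the function name
# appears as a whole word in the test file. Heuristic only.
function_coverage() {
  local module="$1" tests="$2" total=0 covered=0 fn
  # Match definitions of the form: name() {
  for fn in $(sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*()[[:space:]]*{.*/\1/p' "$module"); do
    total=$((total + 1))
    if grep -qw "$fn" "$tests"; then
      covered=$((covered + 1))
    fi
  done
  echo "$covered/$total"
}
```

Running it as function_coverage scripts/lib/pipeline-state.sh scripts/sw-lib-pipeline-state-test.sh before and after adding tests would give a quick, approximate progress check.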

Critical Paths to Test

Happy path: write_state produces valid YAML → resume_state restores it correctly (round-trip)
Error case 1: resume_state with missing/corrupt state file → exits with error
Error case 2: self_healing_build_test stuck on same error → early convergence exit
Edge case 1: get_slowest_stage with no timing data → returns ""
Edge case 2: mark_stage_complete with all optional integrations absent → still succeeds
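Edge case 2 depends on optional integrations being skipped cleanly when they are not loaded. A minimal sketch of that guard is shown below. call_if_defined, mark_complete_sketch, STATUS, and the "stage.complete" event name are hypothetical; the real module may guard its integrations differently.

```shell
# Call a function only if it is defined; absence or failure is non-fatal.
call_if_defined() {
  local fn="$1"
  shift
  if command -v "$fn" >/dev/null 2>&1; then
    "$fn" "$@" || true
  fi
  return 0
}

# Hypothetical stand-in: the status update always happens,
# the integrations run only when present.
mark_complete_sketch() {
  STATUS="complete"
  call_if_defined emit_event "stage.complete"
  call_if_defined gh_update_progress
}
```

A test for this edge case would call the function with none of the integrations defined and assert that it returns 0 with the status set.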

Definition of Done

sw-pipeline.sh < 1500 lines (already 708)
pipeline-state.sh exists as separate module (already exists, 612 lines)
pipeline-stage-executor.sh exists as separate module (already exists, 645 lines)
All existing tests pass (npm test green)
New tests added: 20+ additional unit tests across both modules
Coverage >80% for both modules (function coverage for pipeline-state.sh, line coverage for pipeline-stage-executor.sh)
No production code changes (test-only additions)
Each module can be sourced independently without side effects (include guards verified)
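The include-guard check in the last item could be verified along the following lines. The guard variable name and the module body are hypothetical, and LOAD_COUNT is test instrumentation that a real module would not contain.

```shell
# Write a throwaway module with an include guard, then source it twice.
module="$(mktemp)"
cat > "$module" <<'EOF'
# Guard: a second source returns before redefining anything.
if [ -n "${_SKETCH_MODULE_LOADED:-}" ]; then
  return 0
fi
_SKETCH_MODULE_LOADED=1
LOAD_COUNT=$(( ${LOAD_COUNT:-0} + 1 ))
sketch_fn() { echo "ok"; }
EOF

. "$module"
. "$module"
echo "loads: $LOAD_COUNT"   # prints "loads: 1"
rm -f "$module"
```

The same approach applies to the real modules: source each one twice in a clean shell and assert that nothing is executed or redefined the second time.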

