Pipeline Plan 172

ezigus edited this page Mar 17, 2026 · 2 revisions

Implementation Plan: sw-pipeline.sh Modular Extraction — Stage Executor and State Manager

Status Assessment

The core extraction work is already complete (commit 1d85a6a). sw-pipeline.sh was reduced from 3,171 → 708 lines (78% reduction). Both target modules exist:

  • scripts/lib/pipeline-state.sh (612 lines) — state read/write/validate
  • scripts/lib/pipeline-stage-executor.sh (645 lines) — stage execution with hooks

Remaining work: close the test coverage gaps so that both modules meet the >80% coverage criterion.


Component Diagram

┌─────────────────────────────────────────────────────┐
│  sw-pipeline.sh (708 lines)                         │
│  CLI dispatch, sourcing, signal setup               │
│                                                     │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────┐   │
│  │ commands    │  │ orchestration│  │ stages-*  │   │
│  │ (836 ln)    │→ │ (813 ln)     │→ │ (various) │   │
│  └─────────────┘  └──────────────┘  └───────────┘   │
│         │                │                          │
│         ▼                ▼                          │
│  ┌──────────────────────────────┐                   │
│  │ pipeline-stage-executor.sh   │ ← TARGET MODULE   │
│  │ (645 lines)                  │                   │
│  │ • classify_error()           │                   │
│  │ • run_stage_with_retry()     │                   │
│  │ • self_healing_build_test()  │                   │
│  │ • self_healing_review_*()    │                   │
│  └───────────────┬──────────────┘                   │
│                  ▼                                  │
│  ┌──────────────────────────────┐                   │
│  │ pipeline-state.sh            │ ← TARGET MODULE   │
│  │ (612 lines)                  │                   │
│  │ • save_artifact()            │                   │
│  │ • get/set_stage_status()     │                   │
│  │ • mark_stage_complete()      │                   │
│  │ • mark_stage_failed()        │                   │
│  │ • write_state()              │                   │
│  │ • resume_state()             │                   │
│  │ • initialize_state()         │                   │
│  └──────────────────────────────┘                   │
└─────────────────────────────────────────────────────┘

Dependency direction: orchestration → executor → state (inward only, no cycles).


Interface Contracts

pipeline-state.sh Public API

// Artifact persistence
save_artifact(name: string, content: string): void // writes to ARTIFACTS_DIR
// Stage status CRUD
get_stage_status(stage_id: string): string // "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id: string, status: string): void
// Timing
record_stage_start(stage_id: string): void
record_stage_end(stage_id: string): void
get_stage_timing(stage_id: string): string // formatted "1m30s"
get_stage_timing_seconds(stage_id: string): number // raw seconds, 0 if unknown
get_slowest_stage(): string // stage_id or ""
// Descriptions & progress
get_stage_description(stage_id: string): string
build_stage_progress(): string // "intake:complete plan:running test:pending"
// State transitions (side effects: writes state, emits events, updates GitHub)
update_status(status: string, stage: string): void
mark_stage_complete(stage_id: string): void // Error: event emit failure (non-fatal)
mark_stage_failed(stage_id: string): void // Error: event emit failure (non-fatal)
// Persistence
initialize_state(): void // Resets all state, calls write_state()
write_state(): void // Error: disk space check failure → return 1
resume_state(): void // Error: missing state file → exit 1, missing goal → exit 1
// Validation
verify_stage_artifacts(stage_id: string): boolean // 0=ok, 1=missing artifacts
persist_artifacts(stage: string, ...files: string[]): void // CI-only, non-fatal
// Meta-cognition
record_stage_effectiveness(stage_id: string, outcome: string): void
get_stage_self_awareness_hint(stage_id: string): string
// Logging
log_stage(stage_id: string, message: string): void

pipeline-stage-executor.sh Public API

// Error classification
classify_error(stage_id: string): "infrastructure"|"configuration"|"logic"|"unknown"
// Execution
run_stage_with_retry(stage_id: string): boolean // 0=success, 1=failure
 // Error: configuration errors → immediate failure (no retry)
 // Error: repeated logic errors → immediate failure
// Self-healing loops
self_healing_build_test(): boolean // 0=tests pass, 1=exhausted
 // Error: infrastructure error (simulator not found) → immediate exit
 // Error: convergence stuck (same error 3x) → early exit
 // Error: plateau (no progress 2x) → early exit
self_healing_review_build_test(): boolean // 0=review passes, 1=exhausted
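
To make the four error categories concrete, here is a minimal sketch of a log-based classifier. It is not the real classify_error(): the real function takes a stage id, while this hypothetical classify_error_sketch takes a log file path directly, and every pattern other than "MODULE_NOT_FOUND" and "simulator not found" (both mentioned elsewhere in this plan) is an assumption made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only: the real classify_error() takes a stage id and
# its matching rules are not shown in this plan. Patterns are assumptions.
classify_error_sketch() {
    local log_file="$1"
    if [[ ! -f "$log_file" ]]; then
        echo "unknown"          # no log to inspect
        return 0
    fi
    if grep -qiE 'simulator not found|connection refused|timed out' "$log_file"; then
        echo "infrastructure"   # environment problem, not the code
    elif grep -qE 'MODULE_NOT_FOUND|command not found' "$log_file"; then
        echo "configuration"    # retrying will not help
    elif grep -qiE 'assertion|test failed|syntax error' "$log_file"; then
        echo "logic"            # the code itself is wrong
    else
        echo "unknown"
    fi
}

# Usage: classify a log that contains a missing-module error.
log="$(mktemp)"
echo "Error: MODULE_NOT_FOUND: cannot find 'left-pad'" > "$log"
classify_error_sketch "$log"    # prints "configuration"
rm -f "$log"
```

Checking infrastructure patterns first reflects the contract above, where infrastructure errors cause an immediate exit from the self-healing loop; the ordering in the real module may differ.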

Data Flow

CLI args → parse_args → pipeline_start()
  → initialize_state() [state module]
  → run_pipeline() [orchestration]
    → for each stage:
      → record_stage_start() [state]
      → run_stage_with_retry() [executor]
        → stage_<id>() [stage modules]
        → classify_error() on failure [executor]
      → mark_stage_complete/failed() [state]
        → write_state() → STATE_FILE
        → emit_event() → events.jsonl
        → gh_update_progress() → GitHub
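
The per-stage part of this flow can be sketched as a small loop. This is an illustration under assumptions, not the real run_pipeline(): the function run_pipeline_sketch and the EVENT_LOG variable are hypothetical, the module functions are replaced by one-line stand-ins, and the sketch assumes the orchestration loop is what calls mark_stage_complete/failed.

```bash
#!/usr/bin/env bash
# Stand-ins for the real module functions; the bodies are placeholders
# that only record the order in which they were called.
EVENT_LOG=""
record_stage_start()   { EVENT_LOG+="start:$1 "; }
mark_stage_complete()  { EVENT_LOG+="complete:$1 "; }
mark_stage_failed()    { EVENT_LOG+="failed:$1 "; }
run_stage_with_retry() { [[ "$1" != "test" ]]; }   # pretend "test" fails

# Hypothetical orchestration loop: stop at the first failed stage.
run_pipeline_sketch() {
    local stage
    for stage in "$@"; do
        record_stage_start "$stage"
        if run_stage_with_retry "$stage"; then
            mark_stage_complete "$stage"
        else
            mark_stage_failed "$stage"
            return 1
        fi
    done
}

run_pipeline_sketch intake plan test review || echo "pipeline stopped"
echo "$EVENT_LOG"
```

With these stand-ins the "review" stage never starts, because the loop returns as soon as "test" is marked failed.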

Error Boundaries

| Component | Handles | Propagates |
| --- | --- | --- |
| state | Disk space (write_state), missing artifacts (verify), CI push failures (persist_artifacts) | Missing state file → exit 1 |
| executor | Infrastructure/config/logic classification, convergence detection, retry decisions | Stage failure → return 1 to orchestration |
| orchestration | Stage sequencing, gate enforcement, self-healing loop control | Pipeline failure → write final state, exit |
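
The difference between "exit 1" and "return 1" matters for testing: a function that calls exit terminates the shell that invoked it, so a test has to run it in a subshell. The sketch below shows that pattern with hypothetical stand-ins (resume_state_sketch and stage_sketch); neither is the real implementation.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins that mimic the two propagation styles above.
resume_state_sketch() {
    local state_file="$1"
    if [[ ! -f "$state_file" ]]; then
        echo "no state file: $state_file" >&2
        exit 1                 # propagates: ends the calling shell
    fi
}
stage_sketch() { return 1; }   # handled: the caller decides what to do

# Run the exiting function in a subshell so this script survives.
exit_code=0
( resume_state_sketch "/nonexistent/state.md" ) 2>/dev/null || exit_code=$?
echo "resume_state_sketch exit code: $exit_code"

# A returning function can be called directly.
return_code=0
stage_sketch || return_code=$?
echo "stage_sketch return code: $return_code"
```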

Test Coverage Gap Analysis

pipeline-state.sh — Current: ~60% function coverage

| Function | Tested | Lines | Priority |
| --- | --- | --- | --- |
| get_slowest_stage | No | 15 | Medium |
| build_stage_progress | No | 20 | Medium |
| update_status | No | 5 | Low (simple) |
| write_state | No | 68 | High (core persistence) |
| resume_state | No | 42 | High (crash recovery) |
| mark_stage_complete | No | 94 | High (many side effects) |
| mark_stage_failed | No | 68 | High (many side effects) |

pipeline-stage-executor.sh — Current: ~40% line coverage

| Function | Tested | Lines | Priority |
| --- | --- | --- | --- |
| self_healing_build_test | No | 287 | High (core loop) |
| self_healing_review_build_test | No | 69 | Medium |
| run_stage_with_retry (edge cases) | Partial | 157 | Medium |

Alternatives Considered

Alternative 1: Further decomposition (split large functions)

  • Pros: Smaller units, easier to test
  • Cons: More files, more source calls, higher complexity for bash
  • Decision: Rejected. The current module boundaries are clean. Adding tests for existing functions is simpler and lower-risk than restructuring.

Alternative 2: Rewrite tests using bats-core

  • Pros: Better bash testing framework, TAP output
  • Cons: New dependency, rewrite all existing tests, learning curve
  • Decision: Rejected. Existing test-helpers.sh framework works well and is already used by 9 test files. Consistency matters more.

Alternative 3: Test only public API, skip internal functions

  • Pros: Fewer tests to write
  • Cons: Won't reach >80% coverage target on complex functions like mark_stage_complete
  • Decision: Rejected. Key functions have complex side-effect chains that need verification.

Risk Analysis

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Tests for mark_stage_complete/failed need many stubs | Medium (fragile test setup) | Use existing stub pattern from state test file; stub only external calls |
| self_healing_build_test test could be slow (sleep calls) | Low (test isolation) | Override sleep in test, mock stage functions to fail/succeed deterministically |
| write_state test could corrupt real state | Low | Tests already use TEST_TEMP_DIR isolation |
| resume_state test needs valid state file format | Low | Generate state file content in test using known-good format |
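
The sleep override and the "stub only external calls" mitigation both rely on the fact that a shell function shadows an external command of the same name. A minimal sketch follows; retry_with_backoff_sketch is a hypothetical function under test, and the argument passed to emit_event is an assumed signature, not taken from the real module.

```bash
#!/usr/bin/env bash
# A function named "sleep" shadows /bin/sleep, so the test never waits.
SLEEP_CALLS=0
sleep() { SLEEP_CALLS=$((SLEEP_CALLS + 1)); }

# Stubs record external calls instead of performing them.
CALLS=""
emit_event()         { CALLS+="emit_event:$1;"; }
gh_update_progress() { CALLS+="gh_update_progress;"; }

# Hypothetical function under test: backs off three times, then reports.
retry_with_backoff_sketch() {
    local attempt
    for attempt in 1 2 3; do
        sleep $((attempt * 30))
    done
    emit_event "retry.exhausted"
    gh_update_progress
}

retry_with_backoff_sketch
echo "sleep calls: $SLEEP_CALLS"
echo "external calls: $CALLS"
```

Recording calls in a variable lets a test assert both that an external call happened and in which order, without touching the network or the clock.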

Files to Modify

| File | Action |
| --- | --- |
| scripts/sw-lib-pipeline-state-test.sh | Modify: add tests for write_state, resume_state, update_status, get_slowest_stage, build_stage_progress, mark_stage_complete, mark_stage_failed |
| scripts/sw-lib-pipeline-stage-executor-test.sh | Modify: add tests for self_healing_build_test (happy path and convergence detection), run_stage_with_retry edge cases |

No new files needed. No production code changes required.


Implementation Steps

Step 1: Add state module tests (pipeline-state.sh)

Add to sw-lib-pipeline-state-test.sh:

  1. get_slowest_stage: Set up STAGE_TIMINGS with multiple stages, verify correct stage returned. Test empty case returns "".

  2. build_stage_progress: Create minimal PIPELINE_CONFIG JSON, set various stage statuses, verify progress string format.

  3. update_status: Call update_status, verify PIPELINE_STATUS and CURRENT_STAGE are set. Verify write_state was called.

  4. write_state: Set up all state variables, call write_state, read STATE_FILE and verify YAML frontmatter structure contains all expected fields (pipeline, goal, status, issue, stages).

  5. resume_state: Write a known state file, call resume_state (with stubs for gh_init, load_pipeline_config, git), verify all variables are restored correctly. Test error cases: missing file (exit 1), missing goal (exit 1), already-complete pipeline (exit 0).

  6. mark_stage_complete: Stub all external calls (emit_event, gh_*, checkpoint, etc.). Call mark_stage_complete, verify: stage status set to "complete", timing recorded, log entry added, write_state called.

  7. mark_stage_failed: Same pattern as mark_stage_complete but verify "failed" status and error comment format.
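
As an illustration of the first item, a get_slowest_stage test could look roughly like the sketch below. Everything here is a stand-in: get_slowest_stage_sketch is not the real function, the space-separated "stage:seconds" format for STAGE_TIMINGS is an assumption (the real format is not shown in this plan), and assert_eq is a hypothetical helper rather than the one provided by test-helpers.sh.

```bash
#!/usr/bin/env bash
# Stand-in for get_slowest_stage. Assumes STAGE_TIMINGS holds
# space-separated "stage:seconds" pairs; the real format may differ.
STAGE_TIMINGS=""
get_slowest_stage_sketch() {
    local pair stage secs best="" best_secs=-1
    for pair in $STAGE_TIMINGS; do
        stage="${pair%%:*}"
        secs="${pair##*:}"
        if (( secs > best_secs )); then
            best="$stage"
            best_secs="$secs"
        fi
    done
    echo "$best"
}

# Tiny assertion helper (hypothetical; test-helpers.sh has its own).
assert_eq() {
    if [[ "$1" == "$2" ]]; then
        echo "PASS: $3"
    else
        echo "FAIL: $3 (expected '$1', got '$2')"
        return 1
    fi
}

STAGE_TIMINGS="intake:12 build:340 test:95"
assert_eq "build" "$(get_slowest_stage_sketch)" "slowest stage is build"

STAGE_TIMINGS=""
assert_eq "" "$(get_slowest_stage_sketch)" "empty timings return empty string"
```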

Step 2: Add executor module tests (pipeline-stage-executor.sh)

Add to sw-lib-pipeline-stage-executor-test.sh:

  1. run_stage_with_retry — plan artifact skip: Create a plan.md with >10 lines, make stage_plan fail. Verify it returns 0 (skip retry because artifact exists).

  2. run_stage_with_retry — configuration error escalation: Create a log file with "MODULE_NOT_FOUND" error, make stage fail. Verify it returns 1 immediately (no retry).

  3. self_healing_build_test — happy path: Mock stage_build and stage_test to succeed on first try. Verify returns 0.

  4. self_healing_build_test — convergence stuck: Mock stage_test to fail with same error 3 times. Verify early exit with return 1.

  5. self_healing_build_test — plateau detection: Mock stage_test to fail with same failure count for 2 iterations. Verify early exit.
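
The "convergence stuck" case can be sketched as follows. This is a simplified model, not the real self_healing_build_test: self_healing_sketch, stage_test_mock, MAX_ITERATIONS and STUCK_LIMIT are hypothetical names, and comparing error messages as whole strings is an assumption about how "same error" is detected.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "same error 3x" detection. Names and the
# string comparison are assumptions, not the real loop.
MAX_ITERATIONS=5
STUCK_LIMIT=3

self_healing_sketch() {
    local iteration=0 last_error="" same_count=0 error
    while (( iteration < MAX_ITERATIONS )); do
        iteration=$((iteration + 1))
        if error="$(stage_test_mock "$iteration")"; then
            return 0                      # tests pass
        fi
        if [[ "$error" == "$last_error" ]]; then
            same_count=$((same_count + 1))
        else
            same_count=1
            last_error="$error"
        fi
        if (( same_count >= STUCK_LIMIT )); then
            echo "stuck after $iteration iterations" >&2
            return 1                      # early exit: convergence stuck
        fi
    done
    return 1                              # iterations exhausted
}

# Mock that always fails with the same message.
stage_test_mock() { echo "TypeError: x is undefined"; return 1; }

self_healing_sketch 2>/dev/null || echo "gave up early"
```

Because the mock is deterministic, a test can assert that the loop stops at the third iteration rather than running all five.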

Step 3: Verify all tests pass

  1. Run npm test to verify no regressions.
  2. Run individual test files to verify new tests pass in isolation.

Task Checklist

  • Task 1: Add get_slowest_stage and build_stage_progress tests to state test file
  • Task 2: Add write_state test — verify YAML output format and all fields
  • Task 3: Add resume_state tests — happy path and error cases (missing file, missing goal, complete pipeline)
  • Task 4: Add mark_stage_complete test with stubbed externals
  • Task 5: Add mark_stage_failed test with stubbed externals
  • Task 6: Add update_status test
  • Task 7: Add run_stage_with_retry edge case tests (plan artifact skip, config error escalation)
  • Task 8: Add self_healing_build_test happy path test
  • Task 9: Add self_healing_build_test convergence detection tests (stuck + plateau)
  • Task 10: Run full test suite — verify all tests pass with no regressions

Testing Approach

Test Pyramid Breakdown

  • Unit tests (target: ~25 new tests): Test each function in isolation with mocked dependencies
    • State module: ~15 new tests (write_state, resume_state, mark_stage_complete/failed, get_slowest_stage, build_stage_progress, update_status)
    • Executor module: ~10 new tests (self_healing_build_test paths, run_stage_with_retry edge cases)
  • Integration tests (existing): The 12 tests in sw-pipeline-test.sh cover end-to-end pipeline flows
  • E2E tests: Not applicable (shell scripts, no deployment)

Coverage Targets

  • pipeline-state.sh: 80%+ function coverage (from ~60% → ~95%)
  • pipeline-stage-executor.sh: 80%+ line coverage (from ~40% → ~80%)
  • Overall pipeline module coverage: >80%

Critical Paths to Test

  • Happy path: write_state produces valid YAML → resume_state restores it correctly (round-trip)
  • Error case 1: resume_state with missing/corrupt state file → exits with error
  • Error case 2: self_healing_build_test stuck on same error → early convergence exit
  • Edge case 1: get_slowest_stage with no timing data → returns ""
  • Edge case 2: mark_stage_complete with all optional integrations absent → still succeeds
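
The round-trip path can be sketched with simplified stand-ins. The functions write_state_sketch and resume_state_sketch are hypothetical, the state file here carries only three fields, and its exact layout is an assumption since the real format is not shown in this plan. The real resume_state exits on error; the sketch returns 1 instead so it can be called without a subshell.

```bash
#!/usr/bin/env bash
# Stand-ins for write_state/resume_state. The real state file has more
# fields and its exact layout is not shown in this plan.
TEST_TEMP_DIR="$(mktemp -d)"
STATE_FILE="$TEST_TEMP_DIR/pipeline-state.md"

write_state_sketch() {
    {
        echo "---"
        echo "pipeline: $PIPELINE_NAME"
        echo "goal: \"$GOAL\""
        echo "status: $PIPELINE_STATUS"
        echo "---"
    } > "$STATE_FILE"
}

resume_state_sketch() {
    [[ -f "$STATE_FILE" ]] || return 1   # missing file (real code exits 1)
    PIPELINE_NAME="$(sed -n 's/^pipeline: //p' "$STATE_FILE")"
    GOAL="$(sed -n 's/^goal: "\(.*\)"$/\1/p' "$STATE_FILE")"
    PIPELINE_STATUS="$(sed -n 's/^status: //p' "$STATE_FILE")"
    [[ -n "$GOAL" ]]                     # missing goal counts as failure
}

# Round trip: write, clear the variables, restore, compare.
PIPELINE_NAME="standard"; GOAL="Add login page"; PIPELINE_STATUS="running"
write_state_sketch
PIPELINE_NAME=""; GOAL=""; PIPELINE_STATUS=""
resume_state_sketch && echo "restored: $PIPELINE_NAME / $GOAL / $PIPELINE_STATUS"
rm -rf "$TEST_TEMP_DIR"
```

Clearing the variables between the write and the restore is what makes this a genuine round-trip check: the assertions can only pass if the values came back from the file.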

Definition of Done

  • sw-pipeline.sh < 1500 lines (already 708)
  • pipeline-state.sh exists as separate module (already exists, 612 lines)
  • pipeline-stage-executor.sh exists as separate module (already exists, 645 lines)
  • All existing tests pass (npm test green)
  • New tests added: 20+ additional unit tests across both modules
  • Coverage >80% for both modules (function coverage for pipeline-state.sh, line coverage for pipeline-stage-executor.sh)
  • No production code changes (test-only additions)
  • Each module can be sourced independently without side effects (include guards verified)
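
The include-guard check in the last item could be verified along these lines. The sketch writes a throwaway module to a temp file and sources it twice; the guard variable _DEMO_MODULE_LOADED, the counter DEMO_LOAD_COUNT and the function demo_hello are all hypothetical, and the real modules may name or structure their guards differently.

```bash
#!/usr/bin/env bash
# Write a tiny module with an include guard, then source it twice.
# The guard variable name is an assumption; real modules may differ.
module="$(mktemp)"
cat > "$module" <<'EOF'
[[ -n "${_DEMO_MODULE_LOADED:-}" ]] && return 0
_DEMO_MODULE_LOADED=1
DEMO_LOAD_COUNT=$(( ${DEMO_LOAD_COUNT:-0} + 1 ))
demo_hello() { echo "hello"; }
EOF

source "$module"
source "$module"     # second source returns at the guard
echo "load count: $DEMO_LOAD_COUNT"
rm -f "$module"
```

The counter exists only to make the guard observable in a test: if the body ran twice, the count would be 2.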
