Pipeline Plan 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

Plan stage complete. Key findings:

The decomposition described in issue #189 was already completed on 2026-03-01 (commit 17bd421). Current state:

sw-pipeline.sh: 266 lines (orchestration only)
16 stage functions across 6 lib files (4,489 lines)
Total pipeline lib ecosystem: 18+ files, 11,538 lines

Remaining gaps the build stage should address:

Test coverage: Only 6/16 stage functions have unit tests. 10 stages need tests added to sw-lib-pipeline-stages-test.sh
Verification: Run existing tests to confirm baseline passes
Documentation: CLAUDE.md Shared Libraries section is stale (only lists compat.sh)

The plan is written to .claude/pipeline-artifacts/plan.md and the pipeline state has advanced to design.

1. [...]ed by the Strategic Intelligence Agent from stale hygiene data (line count from before decomposition). The decomposition was already completed in a prior pipeline. Confidence: 95%
2. Issue was filed correctly but decomposition was done independently. Confidence: 5%
3. The issue refers to a different decomposition scope. Confidence: <1%

Fix Strategy

The core decomposition is already done. Remaining work focuses on closing gaps in the acceptance criteria that the original decomposition left behind:

Acceptance Criterion	Status	Gap
New lib/pipeline-stages.sh with stage functions	DONE	Functions use `stage_*` naming (not `run_*_stage`), consistent with codebase convention
sw-pipeline.sh reduced to <1500 lines	DONE	266 lines
All existing tests pass	NEEDS VERIFICATION	Must run `sw-lib-pipeline-stages-test.sh` and `sw-pipeline-test.sh`
New unit tests for each extracted stage function	GAP	Only 6/16 stages tested. Missing: spec_generation, design, test_first, spec_verification, compound_quality, audit, merge, deploy, validate, monitor
Stage execution latency unchanged	NEEDS VERIFICATION	Baseline exists in memory (build 8506s, test 1365s)
Documentation updated with architecture diagram	GAP	CLAUDE.md Shared Libraries section only lists `compat.sh`, missing all pipeline-stages-*.sh

Component Diagram

┌─────────────────────────────────────┐
│ sw-pipeline.sh (266L)               │  Orchestration Layer
│ CLI parsing → subcommand dispatch   │
│ Sources all lib/ modules            │
└──────────────┬──────────────────────┘
               │ sources
    ┌──────────┴─────────────┐
    │ scripts/lib/           │  Library Layer
    │                        │
    │ pipeline-cli.sh        │  CLI argument parsing, help
    │ pipeline-commands.sh   │  start/resume/status/abort
    │ pipeline-execution.sh  │  run_pipeline(), retry, self-healing
    │ pipeline-state.sh      │  State read/write, progress tracking
    │ pipeline-stages.sh     │  Base: show_stage_preview, helpers
    │                        │
    │ ┌── Stage Modules ───┐ │
    │ │ stages-intake.sh   │ │  stage_intake, stage_spec_generation,
    │ │                    │ │    stage_plan, stage_design
    │ │ stages-build.sh    │ │  stage_test_first, stage_build, stage_test
    │ │ stages-review.sh   │ │  stage_review, stage_spec_verification,
    │ │                    │ │    stage_compound_quality, stage_audit
    │ │ stages-delivery.sh │ │  stage_pr, stage_merge, stage_deploy
    │ │ stages-monitor.sh  │ │  stage_validate, stage_monitor
    │ └────────────────────┘ │
    │                        │
    │ pipeline-detection.sh  │  Task type, language detection
    │ pipeline-intelligence* │  AI-powered analysis, scoring
    │ pipeline-quality-*.sh  │  Quality gates, bash compat checks
    │ pipeline-github.sh     │  GitHub API helpers
    │ pipeline-util.sh       │  Shared utilities
    └────────────────────────┘

Interface Contracts

All stage functions follow this contract:

# Input: Global variables (set by sw-pipeline.sh before sourcing)
#   ARTIFACTS_DIR    - path to .claude/pipeline-artifacts/
#   PROJECT_ROOT     - repo root
#   ISSUE_NUMBER     - GitHub issue number
#   GOAL             - pipeline goal text
#   MODEL            - Claude model to use
#   TEST_CMD         - test command
#   NO_GITHUB        - "true" to skip GitHub API calls
#   PIPELINE_CONFIG  - path to pipeline template JSON
#   BASE_BRANCH      - base branch name
#   GIT_BRANCH       - feature branch name
# Output: Return code 0 on success, non-zero on failure
# Side effects: Write artifacts to $ARTIFACTS_DIR/*.{json,md,log}
# Error contract: Errors propagate via set -e; stage_error() for structured errors
stage_<name>() {
    # Read config from PIPELINE_CONFIG
    # Read prior artifacts from ARTIFACTS_DIR
    # Execute stage logic
    # Write output artifacts to ARTIFACTS_DIR
    # Return 0 on success
}
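
As a rough illustration of the contract above, the sketch below shows one way a stage could read a prior artifact and write its own. The function name `stage_example` and the output file `example.md` are hypothetical; only `ARTIFACTS_DIR`, `GOAL`, and `intake.json` are names taken from this plan, and the real stages may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical stage that follows the contract: it reads globals, checks a
# prerequisite artifact, writes an output artifact, and returns 0 or non-zero.
stage_example() {
    if [ -z "${ARTIFACTS_DIR:-}" ]; then
        echo "stage_example: ARTIFACTS_DIR is not set" >&2
        return 2
    fi

    local prior="$ARTIFACTS_DIR/intake.json"   # prerequisite artifact from stage_intake
    local out="$ARTIFACTS_DIR/example.md"      # output artifact (hypothetical name)

    if [ ! -f "$prior" ]; then
        echo "stage_example: missing prerequisite $prior" >&2
        return 1
    fi

    # Write the output artifact in a single redirect.
    {
        echo "# Example stage output"
        echo "Goal: ${GOAL:-<none>}"
    } > "$out"
    return 0
}
```

Returning a distinct non-zero code for a missing prerequisite keeps that error path easy to assert on in a unit test.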

Data Flow

stage_intake → intake.json, task-type
 ↓
stage_spec_generation → spec.json
 ↓
stage_plan → plan.md, dod.md, tasks.md
 ↓
stage_design → design.md
 ↓
stage_build → git commits (via sw-loop)
 ↓
stage_test → test-results.log, coverage data
 ↓
stage_review → review.md, review-diff.patch
 ↓
stage_spec_verification → spec-verification.md
 ↓
stage_compound_quality → compound-quality.md
 ↓
stage_audit → audit-trail.jsonl
 ↓
stage_pr → GitHub PR (PR number in artifacts)
 ↓
stage_merge → merged commit
 ↓
stage_deploy → deployment.json
 ↓
stage_validate → validation results
 ↓
stage_monitor → monitoring status

Error Boundaries

Stage-level: Each stage function runs under set -euo pipefail. Errors bubble to run_stage_with_retry() in pipeline-execution.sh.
Retry layer: run_stage_with_retry() classifies errors (transient vs permanent) and retries transient failures.
Self-healing: self_healing_build_test() handles build→test feedback loops, re-entering build with error context.
Pipeline-level: run_pipeline() catches stage failures, updates state, and either retries or aborts.
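
The retry layer described above could look roughly like the following sketch. It is not the actual `run_stage_with_retry()` implementation: the name `retry_stage_sketch` is hypothetical, and treating exit code 75 (`EX_TEMPFAIL` in sysexits.h) as the marker for a transient failure is an assumption made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper.
# Assumption: exit code 75 means "transient"; any other non-zero code is permanent.
# Usage: retry_stage_sketch <max-attempts> <command> [args...]
retry_stage_sketch() {
    local max_attempts="$1"; shift
    local attempt=1 rc=1
    while [ "$attempt" -le "$max_attempts" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        if [ "$rc" -ne 75 ]; then
            echo "permanent failure (rc=$rc), not retrying" >&2
            return "$rc"
        fi
        echo "transient failure on attempt $attempt/$max_attempts" >&2
        attempt=$((attempt + 1))
    done
    # All attempts were transient failures; report the last code.
    return "$rc"
}
```

A real classifier would more likely inspect the error output than rely on a single exit code; the exit-code convention just keeps the sketch short.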

Files to Modify

Existing (modify)

scripts/sw-lib-pipeline-stages-test.sh — Expand with tests for all 16 stage functions

Existing (no changes needed)

scripts/sw-pipeline.sh — Already 266 lines, pure orchestration
scripts/lib/pipeline-stages-*.sh — Already extracted and functional

Implementation Steps

Step 1: Verify existing tests pass

Run sw-lib-pipeline-stages-test.sh and sw-pipeline-test.sh to establish baseline.
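
One possible way to capture that baseline is sketched below. The helper `record_baseline` and its one-line-per-script output format are hypothetical and not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each test script and record PASS/FAIL per script.
# Usage: record_baseline <output-file> <script>...
record_baseline() {
    local out="$1"; shift
    local script failures=0
    : > "$out"
    for script in "$@"; do
        if bash "$script" >/dev/null 2>&1; then
            echo "PASS $script" >> "$out"
        else
            echo "FAIL $script" >> "$out"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every script passed.
    [ "$failures" -eq 0 ]
}
```

It could be called as `record_baseline baseline.log scripts/sw-lib-pipeline-stages-test.sh scripts/sw-pipeline-test.sh`, assuming both test scripts live under scripts/.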

Step 2: Expand unit tests for untested stages

Add tests to sw-lib-pipeline-stages-test.sh for the 10 missing stages:

scripts/lib/pipeline-stages-intake.sh:

stage_spec_generation() — Test spec JSON output format, edge case with no issue body
stage_design() — Test design.md creation, ADR format

scripts/lib/pipeline-stages-build.sh:

stage_test_first() — Test TDD mode detection, test file generation

scripts/lib/pipeline-stages-review.sh:

stage_spec_verification() — Test spec compliance checking
stage_compound_quality() — Test adversarial + E2E + DoD checklist
stage_audit() — Test audit trail JSONL format

scripts/lib/pipeline-stages-delivery.sh:

stage_merge() — Test merge logic, branch protection checks
stage_deploy() — Test deployment tracking, environment handling

scripts/lib/pipeline-stages-monitor.sh:

stage_validate() — Test health check logic, issue closing
stage_monitor() — Test monitoring loop, auto-rollback trigger

Each test follows the existing pattern: mock binaries, set up $ARTIFACTS_DIR with prerequisite artifacts, call the stage function, assert outputs.
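
The pattern described above (mock, arrange, act, assert) can be sketched as follows. Every helper here is hypothetical: `stage_design_standin` stands in for the real stage, which an actual test would source from the lib file, and `assert_file_contains` is not necessarily the name of an existing assertion helper. The mock `claude` ignores its arguments.

```shell
#!/usr/bin/env bash
# Sketch of the mock -> arrange -> act -> assert pattern. All helper names are hypothetical.

TEST_TEMP_DIR="$(mktemp -d)"
ARTIFACTS_DIR="$TEST_TEMP_DIR/artifacts"
mkdir -p "$ARTIFACTS_DIR" "$TEST_TEMP_DIR/bin"

# 1. Mock: a fake `claude` that prints canned output instead of calling the API.
cat > "$TEST_TEMP_DIR/bin/claude" <<'EOF'
#!/usr/bin/env bash
echo "# Design"
echo "ADR-1: canned decision"
EOF
chmod +x "$TEST_TEMP_DIR/bin/claude"
PATH="$TEST_TEMP_DIR/bin:$PATH"

# Stand-in for the stage under test: needs plan.md, produces design.md.
stage_design_standin() {
    [ -f "$ARTIFACTS_DIR/plan.md" ] || return 1
    claude -p "design for plan" > "$ARTIFACTS_DIR/design.md"
}

# Minimal assertion helper: fail with a message if the pattern is absent.
assert_file_contains() {
    grep -q -- "$2" "$1" || { echo "FAIL: $1 lacks '$2'" >&2; return 1; }
}

# 2. Arrange the prerequisite artifact, 3. act, 4. assert.
echo "# Plan" > "$ARTIFACTS_DIR/plan.md"
stage_design_standin
assert_file_contains "$ARTIFACTS_DIR/design.md" "ADR-1"
```

Putting the mock directory first in `PATH` is what keeps the stage from reaching a real binary.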

Step 3: Run full test verification

Run both the expanded unit tests and the E2E pipeline tests to confirm no regressions.

Task Checklist

Task 1: Verify sw-pipeline.sh is already decomposed (266 lines, orchestration only)
Task 2: Verify stage functions exist in scripts/lib/pipeline-stages-*.sh (16 functions across 6 files)
Task 3: Audit test coverage — identify untested stages (10 of 16 missing)
Task 4: Run existing tests to establish baseline
Task 5: Add unit tests for stage_spec_generation() and stage_design()
Task 6: Add unit tests for stage_test_first()
Task 7: Add unit tests for stage_spec_verification(), stage_compound_quality(), stage_audit()
Task 8: Add unit tests for stage_merge() and stage_deploy()
Task 9: Add unit tests for stage_validate() and stage_monitor()
Task 10: Run full test suite to verify no regressions
Task 11: Verify documentation in CLAUDE.md reflects current architecture

Testing Approach

Test Pyramid Breakdown

Unit tests: 20+ new assertions covering 10 untested stage functions (happy path + error cases)
Integration tests: Existing sw-pipeline-test.sh (58 tests) covers stage integration
E2E tests: Existing sw-e2e-smoke-test.sh (19 tests) covers full pipeline flow

Coverage Targets

All 16 stage functions have at least one unit test each
Error paths tested for stages that interact with external tools (gh, claude)
Mock-based isolation: no real Claude/GitHub calls in unit tests
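
The first target could be checked mechanically with something like the sketch below. `list_untested_stages` is a hypothetical helper, and it assumes stage functions are defined at the start of a line in the form `stage_name() {`; definitions written differently would be missed.

```shell
#!/usr/bin/env bash
# Hypothetical audit: print stage_* functions defined in the lib files
# that the test file never mentions.
# Usage: list_untested_stages <test-file> <lib-file>...
list_untested_stages() {
    local test_file="$1"; shift
    local fn
    # Collect definitions such as "stage_intake()" found at the start of a line.
    grep -hoE '^stage_[a-z_]+\(\)' "$@" | sed 's/()$//' | sort -u |
    while read -r fn; do
        # -w avoids counting stage_test_first as a mention of stage_test.
        grep -qw -- "$fn" "$test_file" || echo "$fn"
    done
}
```

A possible invocation is `list_untested_stages scripts/sw-lib-pipeline-stages-test.sh scripts/lib/pipeline-stages-*.sh`. A mention in the test file is only a proxy for coverage, so the output is a starting point for review rather than proof.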

Critical Paths to Test

Happy path: Each stage reads prerequisite artifacts and produces expected output
Error case 1: Stage fails when prerequisite artifacts are missing
Error case 2: Stage handles NO_GITHUB=true gracefully (no gh calls)
Edge case 1: Empty GOAL/ISSUE_NUMBER
Edge case 2: Stage called with DRY_RUN=true

Definition of Done

scripts/lib/pipeline-stages-*.sh contains all 16 stage functions
sw-pipeline.sh is <1500 lines (currently 266)
sw-lib-pipeline-stages-test.sh has tests for all 16 stage functions
sw-lib-pipeline-stages-test.sh passes (0 failures)
sw-pipeline-test.sh passes (0 failures)
No existing tests modified (all pass without changes)

Design Decisions

ADR-1: Accept existing `stage_*` naming instead of `run_*_stage`

Context: Issue asks for run_intake_stage, run_plan_stage naming. Codebase uses stage_intake, stage_plan.
Decision: Keep existing stage_* naming convention.
Alternatives: Rename all 16 functions to run_*_stage — rejected because it would break all existing callers (pipeline-execution.sh, pipeline-test.sh, etc.) with zero functional benefit.
Consequences: Issue acceptance criteria language doesn't match exact function names, but the intent is fully satisfied.

ADR-2: Expand existing test file instead of creating new ones

Context: Could create per-module test files (e.g., sw-lib-pipeline-stages-intake-test.sh).
Decision: Expand existing sw-lib-pipeline-stages-test.sh.
Alternatives: Split into 5 test files (one per module) — rejected because the existing test infrastructure (setup_test_env, mock binaries) is already configured in the single file, and the test suite is still manageable at ~500-600 lines.
Consequences: Single test file grows but remains under 700 lines. If it grows beyond that in future, split then.

Threat Model (STRIDE)

Not applicable — This is a pure refactoring task with no user-facing input, authentication, authorization, or network changes. Stage functions only operate on local files and invoke CLI tools with controlled arguments. No new attack surface is introduced.

Security Checklist

No secrets in code (verified: stages use $GITHUB_TOKEN from environment, never hardcoded)
No new input validation points (stages read from controlled artifact files)
GitHub API calls gated by $NO_GITHUB check (existing pattern preserved)

Failure Mode Analysis

FM-1: New tests interact with real GitHub/Claude APIs

Risk: Tests accidentally call gh or claude without mocks, causing flaky CI or API rate limiting.
Mitigation: All tests run with NO_GITHUB=true, mock binaries in $PATH, and GH_AVAILABLE=true (uses mock). The existing test infrastructure handles this, so new tests follow the same pattern.
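
To make this failure mode visible in a test, one option is a mock `gh` that records every invocation, so that a test can assert nothing was called. This is a sketch under assumptions: `GH_CALL_LOG`, `stage_pr_standin`, and `assert_no_gh_calls` are hypothetical names, and the existing mock infrastructure may already provide an equivalent.

```shell
#!/usr/bin/env bash
# Hypothetical guard: a mock `gh` that logs each call, letting a test assert
# that a NO_GITHUB=true run made no GitHub calls at all.
MOCK_DIR="$(mktemp -d)"
GH_CALL_LOG="$MOCK_DIR/gh-calls.log"
export GH_CALL_LOG
: > "$GH_CALL_LOG"

cat > "$MOCK_DIR/gh" <<'EOF'
#!/usr/bin/env bash
echo "gh $*" >> "$GH_CALL_LOG"
EOF
chmod +x "$MOCK_DIR/gh"
PATH="$MOCK_DIR:$PATH"

# Stand-in stage that gates its GitHub call on NO_GITHUB.
stage_pr_standin() {
    if [ "${NO_GITHUB:-false}" = "true" ]; then
        return 0
    fi
    gh pr list
}

# Fails (and prints the recorded calls) if the mock was invoked.
assert_no_gh_calls() {
    if [ -s "$GH_CALL_LOG" ]; then
        echo "FAIL: unexpected gh calls:" >&2
        cat "$GH_CALL_LOG" >&2
        return 1
    fi
}
```

Because the mock is first in `PATH`, even an ungated call is captured in the log instead of reaching the real CLI.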

FM-2: Stage function signatures change between plan and build

Risk: The 10+ commits since decomposition may have changed function signatures or artifact formats.
Mitigation: Read current source before writing each test (Step 2). Use Explore agent to verify function signatures at build time.

FM-3: Test environment state leakage between test cases

Risk: One test's artifacts interfere with another test, causing false passes/fails.
Mitigation: Follow existing pattern: setup_test_env creates fresh $TEST_TEMP_DIR, cleanup_test_env removes it. Each test section can reset $ARTIFACTS_DIR contents.
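
If resetting `$ARTIFACTS_DIR` by hand proves error-prone, a stricter alternative is to run each test body in its own subshell with a private temporary directory. The wrapper below is a hypothetical sketch of that idea and is not how `setup_test_env` and `cleanup_test_env` currently work.

```shell
#!/usr/bin/env bash
# Hypothetical per-test isolation: run a test body in a subshell with its own
# ARTIFACTS_DIR, which is removed when the subshell exits.
# Usage: run_isolated <test-function> [args...]
run_isolated() {
    local rc=0
    (
        TEST_TEMP_DIR="$(mktemp -d)" || exit 1
        # The EXIT trap fires when this subshell ends, pass or fail.
        trap 'rm -rf "$TEST_TEMP_DIR"' EXIT
        ARTIFACTS_DIR="$TEST_TEMP_DIR/artifacts"
        mkdir -p "$ARTIFACTS_DIR"
        "$@"
    ) || rc=$?
    return "$rc"
}
```

Variables set inside the subshell never reach the caller, so neither files nor shell state carry over between tests.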

Addressing FM-2 (most critical)

Each test will be written by first reading the current implementation of the stage function being tested, ensuring the test matches the actual interface — not the interface described in the issue.

Risk Analysis

Risk	Impact	Likelihood	Mitigation
Tests call real APIs	CI failures, rate limits	Low (mock infra exists)	Follow existing mock patterns
Stage signatures changed	Tests written wrong	Medium	Read source before each test
Test state leakage	False passes	Low	Fresh TEST_TEMP_DIR per run
CLAUDE.md auto-sync conflict	Doc merge issues	Low	AUTO sections handle regeneration
