Pipeline Plan 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

Plan stage complete. Key findings:

The decomposition described in issue #189 was already completed on 2026-03-01 (commit 17bd421). Current state:

  • sw-pipeline.sh: 266 lines (orchestration only)
  • 16 stage functions across 6 lib files (4,489 lines)
  • Total pipeline lib ecosystem: 18+ files, 11,538 lines
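The figures above can be re-derived from a checkout with a small sketch like the one below. It assumes stage functions are defined at the start of a line as stage_<name>() { in scripts/lib/pipeline-stages*.sh. The helpers file_lines, list_stage_functions and count_stage_functions are hypothetical, not part of the codebase. stage_error(), the error helper named in the interface contract, is filtered out because it shares the stage_ prefix; any other stage_-prefixed helpers would need the same treatment.

```shell
#!/usr/bin/env bash
# Sketch: recompute the "current state" figures for a checkout.
# Assumption: stage functions are defined at the start of a line as
# `stage_<name>() {` in scripts/lib/pipeline-stages*.sh.

# file_lines FILE: print the line count without the filename or padding.
file_lines() {
    wc -l < "$1" | tr -d ' '
}

# list_stage_functions LIB_DIR: print stage function names, sorted and unique.
list_stage_functions() {
    local lib_dir="$1"
    sed -nE 's/^(stage_[a-z_]+)\(\)[[:space:]]*\{.*$/\1/p' \
        "$lib_dir"/pipeline-stages*.sh 2>/dev/null \
        | grep -vx 'stage_error' \
        | sort -u
}

# count_stage_functions LIB_DIR: print how many stage functions were found.
count_stage_functions() {
    list_stage_functions "$1" | grep -c .
}
```

From the repo root, file_lines scripts/sw-pipeline.sh should print 266 and count_stage_functions scripts/lib should print 16 if the numbers above are still current.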

Remaining gaps the build stage should address:

  1. Test coverage: Only 6 of 16 stage functions have unit tests; the remaining 10 need tests in sw-lib-pipeline-stages-test.sh.
  2. Verification: Run the existing tests to confirm the baseline passes.
  3. Documentation: The CLAUDE.md Shared Libraries section is stale (it lists only compat.sh).

The plan is written to .claude/pipeline-artifacts/plan.md, and the pipeline state has been advanced to design.

  1. …ed by the Strategic Intelligence Agent from stale hygiene data (a line count taken before the decomposition). The decomposition was already completed in a prior pipeline. Confidence: 95%
  2. The issue was filed correctly, but the decomposition was done independently. Confidence: 5%
  3. The issue refers to a different decomposition scope. Confidence: <1%

Fix Strategy

The core decomposition is already done. Remaining work focuses on closing gaps in the acceptance criteria that the original decomposition left behind:

| Acceptance criterion | Status | Gap / notes |
|---|---|---|
| New lib/pipeline-stages.sh with stage functions | DONE | Functions use stage_* naming (not run_*_stage), consistent with codebase convention |
| sw-pipeline.sh reduced to <1500 lines | DONE | 266 lines |
| All existing tests pass | NEEDS VERIFICATION | Must run sw-lib-pipeline-stages-test.sh and sw-pipeline-test.sh |
| New unit tests for each extracted stage function | GAP | Only 6/16 stages tested. Missing: spec_generation, design, test_first, spec_verification, compound_quality, audit, merge, deploy, validate, monitor |
| Stage execution latency unchanged | NEEDS VERIFICATION | Baseline exists in memory (build 8506s, test 1365s) |
| Documentation updated with architecture diagram | GAP | CLAUDE.md Shared Libraries section only lists compat.sh, missing all pipeline-stages-*.sh |
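The latency criterion gives baselines but no tolerance. A minimal sketch of the comparison, assuming a 10% allowance (the plan does not specify one) and using the hypothetical helpers within_tolerance and time_stage:

```shell
#!/usr/bin/env bash
# Sketch: decide whether a measured stage duration is "unchanged" relative to
# a recorded baseline. The default 10% tolerance is an assumption.

# within_tolerance MEASURED_S BASELINE_S [TOLERANCE_PCT]
# Returns 0 when MEASURED <= BASELINE * (100 + TOLERANCE_PCT) / 100.
within_tolerance() {
    local measured="$1" baseline="$2" tol_pct="${3:-10}"
    # Integer-only comparison, scaled by 100 to avoid fractions.
    (( measured * 100 <= baseline * (100 + tol_pct) ))
}

# time_stage CMD...: run CMD, print its wall-clock duration in whole seconds,
# and return CMD's exit status.
time_stage() {
    local start end rc
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo $(( end - start ))
    return "$rc"
}
```

With the recorded build baseline, within_tolerance 8900 8506 succeeds and within_tolerance 9500 8506 fails. For hour-long stages the measured value would more likely come from pipeline state timestamps than from time_stage.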

Component Diagram

┌─────────────────────────────────────┐
│ sw-pipeline.sh (266L)               │  Orchestration Layer
│ CLI parsing → subcommand dispatch   │
│ Sources all lib/ modules            │
└──────────────┬──────────────────────┘
               │ sources
    ┌──────────┴───────────────┐
    │ scripts/lib/             │  Library Layer
    │                          │
    │ pipeline-cli.sh          │  CLI argument parsing, help
    │ pipeline-commands.sh     │  start/resume/status/abort
    │ pipeline-execution.sh    │  run_pipeline(), retry, self-healing
    │ pipeline-state.sh        │  State read/write, progress tracking
    │ pipeline-stages.sh       │  Base: show_stage_preview, helpers
    │                          │
    │ ┌── Stage Modules ───┐   │
    │ │ stages-intake.sh   │   │  stage_intake, stage_spec_generation,
    │ │                    │   │  stage_plan, stage_design
    │ │ stages-build.sh    │   │  stage_test_first, stage_build, stage_test
    │ │ stages-review.sh   │   │  stage_review, stage_spec_verification,
    │ │                    │   │  stage_compound_quality, stage_audit
    │ │ stages-delivery.sh │   │  stage_pr, stage_merge, stage_deploy
    │ │ stages-monitor.sh  │   │  stage_validate, stage_monitor
    │ └────────────────────┘   │
    │                          │
    │ pipeline-detection.sh    │  Task type, language detection
    │ pipeline-intelligence*   │  AI-powered analysis, scoring
    │ pipeline-quality-*.sh    │  Quality gates, bash compat checks
    │ pipeline-github.sh       │  GitHub API helpers
    │ pipeline-util.sh         │  Shared utilities
    └──────────────────────────┘

Interface Contracts

All stage functions follow this contract:

# Input: Global variables (set by sw-pipeline.sh before sourcing)
# ARTIFACTS_DIR — path to .claude/pipeline-artifacts/
# PROJECT_ROOT — repo root
# ISSUE_NUMBER — GitHub issue number
# GOAL — pipeline goal text
# MODEL — Claude model to use
# TEST_CMD — test command
# NO_GITHUB — "true" to skip GitHub API calls
# PIPELINE_CONFIG — path to pipeline template JSON
# BASE_BRANCH — base branch name
# GIT_BRANCH — feature branch name
# Output: Return code 0 on success, non-zero on failure
# Side effects: Write artifacts to $ARTIFACTS_DIR/*.{json,md,log}
# Error contract: Errors propagate via set -e; stage_error() for structured errors
stage_name() {  # placeholder: "name" stands for the stage, e.g. stage_plan
    # Read config from PIPELINE_CONFIG
    # Read prior artifacts from ARTIFACTS_DIR
    # Execute stage logic
    # Write output artifacts to ARTIFACTS_DIR
    return 0  # non-zero on failure
}
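A minimal sketch of a stage written against this contract. stage_example and its example.md output are hypothetical and are not among the 16 real stages; plan.md is used as the prior artifact because stage_plan produces it. The GitHub call uses the real gh issue comment subcommand and is skipped when NO_GITHUB=true.

```shell
#!/usr/bin/env bash
# Hypothetical stage following the contract above. It is not one of the 16
# real stages, and example.md is an illustrative artifact name.

stage_example() {
    # Inputs arrive as globals; check the ones this stage cannot run without.
    if [[ -z "${ARTIFACTS_DIR:-}" ]]; then
        echo "stage_example: ARTIFACTS_DIR is not set" >&2
        return 1
    fi

    # Read a prior artifact: plan.md, which stage_plan writes.
    local plan="$ARTIFACTS_DIR/plan.md"
    if [[ ! -f "$plan" ]]; then
        echo "stage_example: missing $plan" >&2
        return 1
    fi

    # Execute the stage logic and write the output artifact.
    {
        echo "# Example artifact"
        echo "Goal: ${GOAL:-<none>}"
        echo "Plan lines: $(wc -l < "$plan" | tr -d ' ')"
    } > "$ARTIFACTS_DIR/example.md" || return 1

    # GitHub side effects only when NO_GITHUB is not "true".
    if [[ "${NO_GITHUB:-false}" != "true" ]]; then
        gh issue comment "$ISSUE_NUMBER" --body "example stage finished" || return 1
    fi
    return 0
}
```

The explicit return statements keep the sketch correct whether or not the caller enables set -e.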

Data Flow

stage_intake → intake.json, task-type
 ↓
stage_spec_generation → spec.json
 ↓
stage_plan → plan.md, dod.md, tasks.md
 ↓
stage_design → design.md
 ↓
stage_build → git commits (via sw-loop)
 ↓
stage_test → test-results.log, coverage data
 ↓
stage_review → review.md, review-diff.patch
 ↓
stage_spec_verification → spec-verification.md
 ↓
stage_compound_quality → compound-quality.md
 ↓
stage_audit → audit-trail.jsonl
 ↓
stage_pr → GitHub PR (PR number in artifacts)
 ↓
stage_merge → merged commit
 ↓
stage_deploy → deployment.json
 ↓
stage_validate → validation results
 ↓
stage_monitor → monitoring status
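The data flow can double as a prerequisite check for the "missing artifact" error case. A sketch, assuming each stage needs only its direct predecessor's named artifact; the real stages may read more files. stage_intake (first stage), stage_test_first (not shown in the flow), and the stages whose inputs are commits, a PR, or unnamed results are treated as having no file prerequisite. required_artifact and check_prerequisite are hypothetical, and case is used instead of associative arrays so the sketch also runs on bash 3.2.

```shell
#!/usr/bin/env bash
# Sketch: map each stage to the artifact its predecessor writes (per the data
# flow above). The one-predecessor rule is an assumption.

required_artifact() {
    case "$1" in
        stage_spec_generation)   echo "intake.json" ;;
        stage_plan)              echo "spec.json" ;;
        stage_design)            echo "plan.md" ;;
        stage_build)             echo "design.md" ;;
        stage_review)            echo "test-results.log" ;;
        stage_spec_verification) echo "review.md" ;;
        stage_compound_quality)  echo "spec-verification.md" ;;
        stage_audit)             echo "compound-quality.md" ;;
        stage_pr)                echo "audit-trail.jsonl" ;;
        stage_validate)          echo "deployment.json" ;;
        *)                       echo "" ;;  # no file prerequisite listed
    esac
}

# check_prerequisite STAGE DIR: succeed if STAGE's input artifact is in DIR.
check_prerequisite() {
    local stage="$1" dir="$2" artifact
    artifact=$(required_artifact "$stage")
    if [[ -z "$artifact" || -f "$dir/$artifact" ]]; then
        return 0
    fi
    echo "$stage: missing prerequisite $dir/$artifact" >&2
    return 1
}
```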

Error Boundaries

  • Stage-level: Each stage function runs under set -euo pipefail. Errors bubble to run_stage_with_retry() in pipeline-execution.sh.
  • Retry layer: run_stage_with_retry() classifies errors (transient vs permanent) and retries transient failures.
  • Self-healing: self_healing_build_test() handles build→test feedback loops, re-entering build with error context.
  • Pipeline-level: run_pipeline() catches stage failures, updates state, and either retries or aborts.
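The retry layer can be illustrated with a hypothetical wrapper. This is not run_stage_with_retry(): how the real function classifies errors is not described here, so the sketch assumes exit code 75 (EX_TEMPFAIL in sysexits.h) marks a transient failure and every other non-zero code is permanent.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper. The transient/permanent rule (exit code 75 means
# transient) is an assumption, not the pipeline's actual classification.

TRANSIENT_RC=75

# retry_stage MAX_ATTEMPTS CMD...: rerun CMD while it fails transiently.
retry_stage() {
    local max="$1"; shift
    local attempt=1 rc
    while :; do
        "$@"
        rc=$?
        if (( rc == 0 )); then
            return 0
        fi
        if (( rc != TRANSIENT_RC )); then
            echo "permanent failure (rc=$rc) on attempt $attempt" >&2
            return "$rc"
        fi
        if (( attempt >= max )); then
            echo "giving up after $attempt transient failures" >&2
            return "$rc"
        fi
        attempt=$(( attempt + 1 ))
        sleep "${RETRY_DELAY_S:-0}"  # a real wrapper would likely back off
    done
}
```

Permanent failures return immediately with the stage's own exit code, so the pipeline level can still tell them apart from an exhausted retry budget.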

Files to Modify

Existing (modify)

  1. scripts/sw-lib-pipeline-stages-test.sh — Expand with tests for all 16 stage functions

Existing (no changes needed)

  • scripts/sw-pipeline.sh — Already 266 lines, pure orchestration
  • scripts/lib/pipeline-stages-*.sh — Already extracted and functional

Implementation Steps

Step 1: Verify existing tests pass

Run sw-lib-pipeline-stages-test.sh and sw-pipeline-test.sh to establish baseline.
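Step 1 could be scripted as below. run_baseline is a hypothetical helper; it assumes each test script can be run with bash from the repo root and signals failure through a non-zero exit code.

```shell
#!/usr/bin/env bash
# Sketch for Step 1: run each test script, keep its log, and summarize.
# run_baseline is hypothetical.

# run_baseline LOG_DIR SCRIPT...: returns 0 only if every script passes.
run_baseline() {
    local log_dir="$1"; shift
    local failed=0 script log
    mkdir -p "$log_dir"
    for script in "$@"; do
        log="$log_dir/$(basename "$script" .sh).log"
        if bash "$script" >"$log" 2>&1; then
            echo "PASS $script"
        else
            # In the else branch, $? still holds the script's exit status.
            echo "FAIL $script (rc=$?, log: $log)"
            failed=$(( failed + 1 ))
        fi
    done
    echo "baseline: $(( $# - failed ))/$# passed"
    (( failed == 0 ))
}
```

For example: run_baseline /tmp/pipeline-baseline scripts/sw-lib-pipeline-stages-test.sh scripts/sw-pipeline-test.sh (the location of sw-pipeline-test.sh under scripts/ is assumed).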

Step 2: Expand unit tests for untested stages

Add tests to sw-lib-pipeline-stages-test.sh for the 10 missing stages:

scripts/lib/pipeline-stages-intake.sh:

  • stage_spec_generation() — Test spec JSON output format, edge case with no issue body
  • stage_design() — Test design.md creation, ADR format

scripts/lib/pipeline-stages-build.sh:

  • stage_test_first() — Test TDD mode detection, test file generation

scripts/lib/pipeline-stages-review.sh:

  • stage_spec_verification() — Test spec compliance checking
  • stage_compound_quality() — Test adversarial + E2E + DoD checklist
  • stage_audit() — Test audit trail JSONL format

scripts/lib/pipeline-stages-delivery.sh:

  • stage_merge() — Test merge logic, branch protection checks
  • stage_deploy() — Test deployment tracking, environment handling

scripts/lib/pipeline-stages-monitor.sh:

  • stage_validate() — Test health check logic, issue closing
  • stage_monitor() — Test monitoring loop, auto-rollback trigger

Each test follows the existing pattern: mock binaries, set up $ARTIFACTS_DIR with prerequisite artifacts, call the stage function, assert outputs.
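A self-contained sketch of that pattern. stage_fake stands in for a real stage, and make_mock and assert_file_contains are hypothetical helpers rather than the ones already defined in sw-lib-pipeline-stages-test.sh. The mock gh logs its arguments so the test can assert on both the artifact and the call.

```shell
#!/usr/bin/env bash
# Sketch of the per-stage test pattern. All names here are hypothetical.

PASS=0 FAIL=0

# assert_file_contains FILE PATTERN LABEL: count a pass or a failure.
assert_file_contains() {
    if grep -q -- "$2" "$1" 2>/dev/null; then
        PASS=$(( PASS + 1 ))
    else
        FAIL=$(( FAIL + 1 )); echo "FAIL: $3" >&2
    fi
}

# make_mock BIN_DIR NAME OUTPUT: executable that logs its args, prints OUTPUT.
make_mock() {
    local bin_dir="$1" name="$2" output="$3"
    mkdir -p "$bin_dir"
    cat > "$bin_dir/$name" <<EOF
#!/usr/bin/env bash
echo "$name \$*" >> "$bin_dir/calls.log"
echo "$output"
EOF
    chmod +x "$bin_dir/$name"
}

# Stand-in stage: needs review.md, asks gh for a PR number, writes pr.json.
stage_fake() {
    [[ -f "$ARTIFACTS_DIR/review.md" ]] || return 1
    local pr
    pr=$(gh pr view --json number --jq .number) || return 1
    printf '{"pr_number": %s}\n' "$pr" > "$ARTIFACTS_DIR/pr.json"
}

test_stage_fake_happy_path() {
    local tmp; tmp=$(mktemp -d)
    make_mock "$tmp/bin" gh "42"
    ARTIFACTS_DIR="$tmp/artifacts"; mkdir -p "$ARTIFACTS_DIR"
    echo "LGTM" > "$ARTIFACTS_DIR/review.md"
    # PATH change is scoped to the subshell so the mock cannot leak.
    ( PATH="$tmp/bin:$PATH"; stage_fake )
    assert_file_contains "$ARTIFACTS_DIR/pr.json" '"pr_number": 42' "pr.json has PR number"
    assert_file_contains "$tmp/bin/calls.log" '^gh pr view' "gh mock was called"
    rm -rf "$tmp"
}
```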

Step 3: Run full test verification

Run both the expanded unit tests and the E2E pipeline tests to confirm no regressions.

Task Checklist

  • Task 1: Verify sw-pipeline.sh is already decomposed (266 lines, orchestration only)
  • Task 2: Verify stage functions exist in scripts/lib/pipeline-stages-*.sh (16 functions across 6 files)
  • Task 3: Audit test coverage — identify untested stages (10 of 16 missing)
  • Task 4: Run existing tests to establish baseline
  • Task 5: Add unit tests for stage_spec_generation() and stage_design()
  • Task 6: Add unit tests for stage_test_first()
  • Task 7: Add unit tests for stage_spec_verification(), stage_compound_quality(), stage_audit()
  • Task 8: Add unit tests for stage_merge() and stage_deploy()
  • Task 9: Add unit tests for stage_validate() and stage_monitor()
  • Task 10: Run full test suite to verify no regressions
  • Task 11: Verify documentation in CLAUDE.md reflects current architecture

Testing Approach

Test Pyramid Breakdown

  • Unit tests: 20+ new assertions covering 10 untested stage functions (happy path + error cases)
  • Integration tests: Existing sw-pipeline-test.sh (58 tests) covers stage integration
  • E2E tests: Existing sw-e2e-smoke-test.sh (19 tests) covers full pipeline flow

Coverage Targets

  • All 16 stage functions have at least one unit test each
  • Error paths tested for stages that interact with external tools (gh, claude)
  • Mock-based isolation: no real Claude/GitHub calls in unit tests
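The first target can be checked mechanically. A sketch, assuming a stage counts as covered when its exact name appears anywhere in the test file (so a mention in a comment would also count); untested_stages is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list stage functions whose names never appear in the test file.
# Assumes definitions look like `stage_<name>() {` at the start of a line.

untested_stages() {  # untested_stages LIB_DIR TEST_FILE
    local lib_dir="$1" test_file="$2" fn
    sed -nE 's/^(stage_[a-z_]+)\(\)[[:space:]]*\{.*$/\1/p' \
        "$lib_dir"/pipeline-stages*.sh 2>/dev/null \
        | grep -vx 'stage_error' \
        | sort -u \
        | while read -r fn; do
            # -w stops stage_test from matching inside stage_test_first.
            grep -qw -- "$fn" "$test_file" || echo "$fn"
        done
}
```

Running untested_stages scripts/lib scripts/sw-lib-pipeline-stages-test.sh should print the stages still lacking tests, and nothing once every stage is covered.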

Critical Paths to Test

  • Happy path: Each stage reads prerequisite artifacts and produces expected output
  • Error case 1: Stage fails when prerequisite artifacts are missing
  • Error case 2: Stage handles NO_GITHUB=true gracefully (no gh calls)
  • Edge case 1: Empty GOAL/ISSUE_NUMBER
  • Edge case 2: Stage called with DRY_RUN=true
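The critical paths above can be exercised against a stand-in stage, as sketched below. stage_stub, run_case and GH_CALLS are hypothetical. The sketch shadows gh with a shell function, which only intercepts calls made from the same shell; anything launched as a child process still needs a mock binary on PATH.

```shell
#!/usr/bin/env bash
# Sketch of the critical-path cases. All names here are hypothetical.

GH_CALLS=$(mktemp)
gh() { echo "gh $*" >> "$GH_CALLS"; }  # records calls instead of hitting GitHub

stage_stub() {
    [[ -f "$ARTIFACTS_DIR/plan.md" ]] || { echo "missing plan.md" >&2; return 1; }
    if [[ "${DRY_RUN:-false}" == "true" ]]; then
        echo "dry run: would write design.md"
        return 0
    fi
    echo "Goal: ${GOAL:-(none)}" > "$ARTIFACTS_DIR/design.md"
    if [[ "${NO_GITHUB:-false}" != "true" ]]; then
        gh issue comment "${ISSUE_NUMBER:-0}" --body "design ready"
    fi
}

# run_case NAME EXPECTED_RC VAR=VALUE...: run the stage with extra env vars
# in a subshell so they cannot leak into later cases.
run_case() {
    local name="$1" expected="$2" rc; shift 2
    ( export "$@"; stage_stub ) >/dev/null 2>&1
    rc=$?
    if (( rc == expected )); then
        echo "ok   $name"
    else
        echo "FAIL $name (rc=$rc)"; return 1
    fi
}
```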

Definition of Done

  • scripts/lib/pipeline-stages-*.sh contains all 16 stage functions
  • sw-pipeline.sh is <1500 lines (currently 266)
  • sw-lib-pipeline-stages-test.sh has tests for all 16 stage functions
  • sw-lib-pipeline-stages-test.sh passes (0 failures)
  • sw-pipeline-test.sh passes (0 failures)
  • No existing tests modified (all pass without changes)

Design Decisions

ADR-1: Accept existing stage_* naming instead of run_*_stage

  • Context: Issue asks for run_intake_stage, run_plan_stage naming. Codebase uses stage_intake, stage_plan.
  • Decision: Keep existing stage_* naming convention.
  • Alternatives: Rename all 16 functions to run_*_stage — rejected because it would break all existing callers (pipeline-execution.sh, pipeline-test.sh, etc.) with zero functional benefit.
  • Consequences: Issue acceptance criteria language doesn't match exact function names, but the intent is fully satisfied.

ADR-2: Expand existing test file instead of creating new ones

  • Context: Could create per-module test files (e.g., sw-lib-pipeline-stages-intake-test.sh).
  • Decision: Expand existing sw-lib-pipeline-stages-test.sh.
  • Alternatives: Split into 5 test files (one per module) — rejected because the existing test infrastructure (setup_test_env, mock binaries) is already configured in the single file, and the test suite is still manageable at ~500-600 lines.
  • Consequences: Single test file grows but remains under 700 lines. If it grows beyond that in future, split then.

Threat Model (STRIDE)

Not applicable — This is a pure refactoring task with no user-facing input, authentication, authorization, or network changes. Stage functions only operate on local files and invoke CLI tools with controlled arguments. No new attack surface is introduced.

Security Checklist

  • No secrets in code (verified: stages read $GITHUB_TOKEN from the environment; it is never hardcoded)
  • No new input validation points (stages read from controlled artifact files)
  • GitHub API calls gated by $NO_GITHUB check (existing pattern preserved)

Failure Mode Analysis

FM-1: New tests interact with real GitHub/Claude APIs

Risk: Tests accidentally call gh or claude without mocks, causing flaky CI or API rate limiting. Mitigation: All tests run with NO_GITHUB=true, mock binaries in $PATH, and GH_AVAILABLE=true (so gh calls hit the mock). The existing test infrastructure already handles this, so new tests should follow the same pattern.

FM-2: Stage function signatures change between plan and build

Risk: The 10+ commits since the decomposition may have changed function signatures or artifact formats. Mitigation: Read the current source before writing each test (Step 2), and use the Explore agent to verify function signatures at build time.
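One way to apply this mitigation before writing each test is to source the module in a subshell and check which expected functions are actually defined. missing_functions is a hypothetical helper, and it assumes a module can be sourced on its own; real modules may need pipeline-stages.sh or other dependencies sourced first.

```shell
#!/usr/bin/env bash
# Sketch: report expected function names that a module does not define.
# The subshell keeps the module's definitions and side effects contained.

missing_functions() {  # missing_functions MODULE_FILE NAME...
    local module="$1"; shift
    (
        # shellcheck source=/dev/null
        source "$module" >/dev/null 2>&1 || { echo "cannot source $module" >&2; exit 2; }
        for fn in "$@"; do
            declare -F "$fn" >/dev/null || echo "$fn"
        done
    )
}
```

For example, missing_functions scripts/lib/pipeline-stages-delivery.sh stage_merge stage_deploy should print nothing if both functions still exist under those names.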

FM-3: Test environment state leakage between test cases

Risk: One test's artifacts interfere with another test, causing false passes/fails. Mitigation: Follow existing pattern: setup_test_env creates fresh $TEST_TEMP_DIR, cleanup_test_env removes it. Each test section can reset $ARTIFACTS_DIR contents.
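The isolation can also be enforced structurally by running each test function in its own subshell with a fresh temp dir and an EXIT trap, as in this sketch. with_fresh_env is hypothetical; the existing setup_test_env and cleanup_test_env helpers may be implemented differently.

```shell
#!/usr/bin/env bash
# Sketch for FM-3: one subshell, one temp dir, one cleanup trap per test.

with_fresh_env() {  # with_fresh_env TEST_FUNCTION
    (
        TEST_TEMP_DIR=$(mktemp -d) || exit 1
        # Runs when the subshell exits, even if the test fails or calls exit.
        trap 'rm -rf "$TEST_TEMP_DIR"' EXIT
        export TEST_TEMP_DIR
        export ARTIFACTS_DIR="$TEST_TEMP_DIR/artifacts"
        mkdir -p "$ARTIFACTS_DIR"
        "$1"
    )
}
```

Because the subshell owns both the exported variables and the trap, a failing test cannot leave ARTIFACTS_DIR pointing at stale files for the next one, and the test's exit status is passed through unchanged.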

Addressing FM-2 (most critical)

Each test will be written by first reading the current implementation of the stage function being tested, ensuring the test matches the actual interface — not the interface described in the issue.

Risk Analysis

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Tests call real APIs | CI failures, rate limits | Low (mock infra exists) | Follow existing mock patterns |
| Stage signatures changed | Tests written against the wrong interface | Medium | Read source before each test |
| Test state leakage | False passes | Low | Fresh TEST_TEMP_DIR per run |
| CLAUDE.md auto-sync conflict | Doc merge issues | Low | AUTO sections handle regeneration |
