Pipeline Plan 189
Plan stage complete. Key findings:
The decomposition described in issue #189 was already completed on 2026-03-01 (commit 17bd421). Current state:
- sw-pipeline.sh: 266 lines (orchestration only)
- 16 stage functions across 6 lib files (4,489 lines)
- Total pipeline lib ecosystem: 18+ files, 11,538 lines
Remaining gaps the build stage should address:
- Test coverage: Only 6/16 stage functions have unit tests. 10 stages need tests added to sw-lib-pipeline-stages-test.sh
- Verification: Run existing tests to confirm baseline passes
- Documentation: CLAUDE.md Shared Libraries section is stale (only lists compat.sh)
The plan is written to .claude/pipeline-artifacts/plan.md and pipeline state advanced to design.
ed by the Strategic Intelligence Agent from stale hygiene data (line count from before decomposition). The decomposition was already completed in a prior pipeline. Confidence: 95%
2. Issue was filed correctly but decomposition was done independently. Confidence: 5%
3. The issue refers to a different decomposition scope. Confidence: <1%
The core decomposition is already done. Remaining work focuses on closing gaps in the acceptance criteria that the original decomposition left behind:
| Acceptance Criterion | Status | Gap |
|---|---|---|
| New lib/pipeline-stages.sh with stage functions | DONE | Functions use stage_* naming (not run_*_stage), consistent with codebase convention |
| sw-pipeline.sh reduced to <1500 lines | DONE | 266 lines |
| All existing tests pass | NEEDS VERIFICATION | Must run sw-lib-pipeline-stages-test.sh and sw-pipeline-test.sh |
| New unit tests for each extracted stage function | GAP | Only 6/16 stages tested. Missing: spec_generation, design, test_first, spec_verification, compound_quality, audit, merge, deploy, validate, monitor |
| Stage execution latency unchanged | NEEDS VERIFICATION | Baseline exists in memory (build 8506s, test 1365s) |
| Documentation updated with architecture diagram | GAP | CLAUDE.md Shared Libraries section only lists compat.sh, missing all pipeline-stages-*.sh |
┌─────────────────────────────────────┐
│ sw-pipeline.sh (266L) │ Orchestration Layer
│ CLI parsing → subcommand dispatch │
│ Sources all lib/ modules │
└──────────────┬──────────────────────┘
│ sources
┌──────────┴──────────┐
│ scripts/lib/ │ Library Layer
│ │
│ pipeline-cli.sh │ CLI argument parsing, help
│ pipeline-commands.sh │ start/resume/status/abort
│ pipeline-execution.sh│ run_pipeline(), retry, self-healing
│ pipeline-state.sh │ State read/write, progress tracking
│ pipeline-stages.sh │ Base: show_stage_preview, helpers
│ │
│ ┌── Stage Modules ─┐ │
│ │ stages-intake.sh │ │ stage_intake, stage_spec_generation,
│ │ │ │ stage_plan, stage_design
│ │ stages-build.sh │ │ stage_test_first, stage_build, stage_test
│ │ stages-review.sh │ │ stage_review, stage_spec_verification,
│ │ │ │ stage_compound_quality, stage_audit
│ │ stages-delivery.sh│ │ stage_pr, stage_merge, stage_deploy
│ │ stages-monitor.sh │ │ stage_validate, stage_monitor
│ └───────────────────┘ │
│ │
│ pipeline-detection.sh │ Task type, language detection
│ pipeline-intelligence*│ AI-powered analysis, scoring
│ pipeline-quality-*.sh │ Quality gates, bash compat checks
│ pipeline-github.sh │ GitHub API helpers
│ pipeline-util.sh │ Shared utilities
└───────────────────────┘
All stage functions follow this contract:
    # Input: Global variables (set by sw-pipeline.sh before sourcing)
    #   ARTIFACTS_DIR    - path to .claude/pipeline-artifacts/
    #   PROJECT_ROOT     - repo root
    #   ISSUE_NUMBER     - GitHub issue number
    #   GOAL             - pipeline goal text
    #   MODEL            - Claude model to use
    #   TEST_CMD         - test command
    #   NO_GITHUB        - "true" to skip GitHub API calls
    #   PIPELINE_CONFIG  - path to pipeline template JSON
    #   BASE_BRANCH      - base branch name
    #   GIT_BRANCH       - feature branch name
    # Output: Return code 0 on success, non-zero on failure
    # Side effects: Write artifacts to $ARTIFACTS_DIR/*.{json,md,log}
    # Error contract: Errors propagate via set -e; stage_error() for structured errors
    stage_<name>() {
        # Read config from PIPELINE_CONFIG
        # Read prior artifacts from ARTIFACTS_DIR
        # Execute stage logic
        # Write output artifacts to ARTIFACTS_DIR
        # Return 0 on success
    }
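As a rough illustration of the contract above, here is a minimal sketch of a stage function. `stage_example` and its `example.md` artifact are hypothetical and are not among the 16 real stages; only `ARTIFACTS_DIR` and `GOAL` come from the contract. It assumes the stage needs `plan.md` as its prerequisite.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stage for illustration only; not part of the real pipeline.
stage_example() {
    local prior="$ARTIFACTS_DIR/plan.md"
    local out="$ARTIFACTS_DIR/example.md"

    # Return non-zero when the prerequisite artifact is missing.
    if [[ ! -f "$prior" ]]; then
        echo "stage_example: missing prerequisite $prior" >&2
        return 1
    fi

    # Write the output artifact into ARTIFACTS_DIR.
    {
        echo "# Example output for: ${GOAL:-unknown goal}"
        echo "Plan lines: $(wc -l < "$prior" | tr -d ' ')"
    } > "$out"
    return 0
}

# Simulate the globals the orchestrator would normally set.
ARTIFACTS_DIR="$(mktemp -d)"
GOAL="demo goal"
printf 'step 1\nstep 2\n' > "$ARTIFACTS_DIR/plan.md"

stage_example
cat "$ARTIFACTS_DIR/example.md"
```

The stage communicates only through its return code and the files it writes, which is what makes it testable with a temporary `ARTIFACTS_DIR`.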
stage_intake → intake.json, task-type
↓
stage_spec_generation → spec.json
↓
stage_plan → plan.md, dod.md, tasks.md
↓
stage_design → design.md
↓
stage_build → git commits (via sw-loop)
↓
stage_test → test-results.log, coverage data
↓
stage_review → review.md, review-diff.patch
↓
stage_spec_verification → spec-verification.md
↓
stage_compound_quality → compound-quality.md
↓
stage_audit → audit-trail.jsonl
↓
stage_pr → GitHub PR (PR number in artifacts)
↓
stage_merge → merged commit
↓
stage_deploy → deployment.json
↓
stage_validate → validation results
↓
stage_monitor → monitoring status
- Stage-level: Each stage function runs under `set -euo pipefail`. Errors bubble to `run_stage_with_retry()` in `pipeline-execution.sh`.
- Retry layer: `run_stage_with_retry()` classifies errors (transient vs permanent) and retries transient failures.
- Self-healing: `self_healing_build_test()` handles build→test feedback loops, re-entering build with error context.
- Pipeline-level: `run_pipeline()` catches stage failures, updates state, and either retries or aborts.
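The retry layer can be pictured with the sketch below. It is not the real `run_stage_with_retry()`: `retry_stage` and `is_transient` are hypothetical names, and treating exit code 75 as the only transient failure is an assumption made for the demo. The actual classifier in `pipeline-execution.sh` may use different signals.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical classifier: only exit code 75 counts as transient here.
is_transient() {
    [[ "$1" -eq 75 ]]
}

# Hypothetical retry wrapper: retry_stage <max_attempts> <command...>
retry_stage() {
    local max_attempts="$1"; shift
    local attempt=1 rc=0
    while (( attempt <= max_attempts )); do
        rc=0
        "$@" || rc=$?
        if (( rc == 0 )); then
            return 0
        fi
        # Permanent failures are returned immediately, without retrying.
        if ! is_transient "$rc"; then
            echo "permanent failure (rc=$rc), aborting" >&2
            return "$rc"
        fi
        echo "transient failure (rc=$rc), attempt $attempt/$max_attempts" >&2
        attempt=$((attempt + 1))
    done
    return "$rc"
}

# Demo stage that fails transiently twice, then succeeds.
calls=0
flaky_stage() {
    calls=$((calls + 1))
    (( calls < 3 )) && return 75
    return 0
}

retry_stage 5 flaky_stage && echo "succeeded after $calls calls"
# prints "succeeded after 3 calls"
```

Capturing the exit code with `|| rc=$?` keeps the wrapper in control even when the caller runs under `set -e`.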
- `scripts/sw-lib-pipeline-stages-test.sh`: Expand with tests for all 16 stage functions
- `scripts/sw-pipeline.sh`: Already 266 lines, pure orchestration
- `scripts/lib/pipeline-stages-*.sh`: Already extracted and functional
Run sw-lib-pipeline-stages-test.sh and sw-pipeline-test.sh to establish baseline.
Add tests to sw-lib-pipeline-stages-test.sh for the 10 missing stages:
scripts/lib/pipeline-stages-intake.sh:
- `stage_spec_generation()`: Test spec JSON output format, edge case with no issue body
- `stage_design()`: Test design.md creation, ADR format

scripts/lib/pipeline-stages-build.sh:
- `stage_test_first()`: Test TDD mode detection, test file generation

scripts/lib/pipeline-stages-review.sh:
- `stage_spec_verification()`: Test spec compliance checking
- `stage_compound_quality()`: Test adversarial + E2E + DoD checklist
- `stage_audit()`: Test audit trail JSONL format

scripts/lib/pipeline-stages-delivery.sh:
- `stage_merge()`: Test merge logic, branch protection checks
- `stage_deploy()`: Test deployment tracking, environment handling

scripts/lib/pipeline-stages-monitor.sh:
- `stage_validate()`: Test health check logic, issue closing
- `stage_monitor()`: Test monitoring loop, auto-rollback trigger
Each test follows the existing pattern: mock binaries, set up $ARTIFACTS_DIR with prerequisite artifacts, call the stage function, assert outputs.
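The pattern might look roughly like the sketch below. `stage_demo`, the `check` helper, and the `GH_CALL_LOG` variable are hypothetical stand-ins; the real test file has its own helpers and assertions, which should be reused instead. The mock `gh` only records its arguments and never contacts GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

TEST_TEMP_DIR="$(mktemp -d)"
ARTIFACTS_DIR="$TEST_TEMP_DIR/artifacts"
MOCK_BIN="$TEST_TEMP_DIR/bin"
mkdir -p "$ARTIFACTS_DIR" "$MOCK_BIN"

# Mock binary: log every gh invocation instead of calling the API.
export GH_CALL_LOG="$TEST_TEMP_DIR/gh-calls.log"
cat > "$MOCK_BIN/gh" <<'EOF'
#!/usr/bin/env bash
echo "$*" >> "$GH_CALL_LOG"
EOF
chmod +x "$MOCK_BIN/gh"
PATH="$MOCK_BIN:$PATH"

# Hypothetical stand-in for a real stage function.
stage_demo() {
    [[ -f "$ARTIFACTS_DIR/plan.md" ]] || return 1
    cp "$ARTIFACTS_DIR/plan.md" "$ARTIFACTS_DIR/design.md"
    gh issue comment "$ISSUE_NUMBER" --body "design complete"
}

# Hypothetical assertion helper: check <label> <command...>
PASS=0
FAIL=0
check() {
    local label="$1"; shift
    if "$@"; then
        PASS=$((PASS + 1)); echo "PASS: $label"
    else
        FAIL=$((FAIL + 1)); echo "FAIL: $label"
    fi
}

# Arrange prerequisite artifacts, act, then assert on outputs.
ISSUE_NUMBER=189
echo "# plan" > "$ARTIFACTS_DIR/plan.md"
stage_demo
check "design.md is written" test -f "$ARTIFACTS_DIR/design.md"
check "mock gh was invoked" grep -q "issue comment 189" "$GH_CALL_LOG"
echo "passed=$PASS failed=$FAIL"
```

Placing the mock directory first in `PATH` means the stage code needs no test-specific branches.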
Run both the expanded unit tests and the E2E pipeline tests to confirm no regressions.
- Task 1: Verify sw-pipeline.sh is already decomposed (266 lines, orchestration only)
- Task 2: Verify stage functions exist in scripts/lib/pipeline-stages-*.sh (16 functions across 6 files)
- Task 3: Audit test coverage — identify untested stages (10 of 16 missing)
- Task 4: Run existing tests to establish baseline
- Task 5: Add unit tests for `stage_spec_generation()` and `stage_design()`
- Task 6: Add unit tests for `stage_test_first()`
- Task 7: Add unit tests for `stage_spec_verification()`, `stage_compound_quality()`, `stage_audit()`
- Task 8: Add unit tests for `stage_merge()` and `stage_deploy()`
- Task 9: Add unit tests for `stage_validate()` and `stage_monitor()`
- Task 10: Run full test suite to verify no regressions
- Task 11: Verify documentation in CLAUDE.md reflects current architecture
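The coverage audit in Task 3 could be approximated with a small script like the one below. This is a heuristic sketch, not an existing tool in the repo: the three helper functions are hypothetical, and a stage counts as "tested" merely if its name appears anywhere in the test file. The pattern would also pick up any non-stage helper whose definition starts with `stage_`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Names of stage_* functions defined at the start of a line in the given files.
list_defined_stages() {
    grep -hoE '^stage_[a-z_]+\(\)' "$@" | sed 's/()$//' | sort -u
}

# Names of stage_* identifiers mentioned anywhere in a test file.
list_tested_stages() {
    grep -oE 'stage_[a-z_]+' "$1" | sort -u
}

# Defined stages that the test file never mentions.
# Usage: list_untested_stages <test_file> <lib_file...>
list_untested_stages() {
    local test_file="$1"; shift
    comm -23 <(list_defined_stages "$@") <(list_tested_stages "$test_file")
}

# Demo with throwaway files standing in for the real lib and test files.
tmp="$(mktemp -d)"
printf 'stage_intake() {\n  :\n}\nstage_design() {\n  :\n}\n' > "$tmp/stages-a.sh"
printf 'stage_merge() {\n  :\n}\n' > "$tmp/stages-b.sh"
printf 'stage_intake\n' > "$tmp/stages-test.sh"

list_untested_stages "$tmp/stages-test.sh" "$tmp/stages-a.sh" "$tmp/stages-b.sh"
```

Against the repo, the equivalent call would presumably be `list_untested_stages scripts/sw-lib-pipeline-stages-test.sh scripts/lib/pipeline-stages-*.sh`.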
- Unit tests: 20+ new assertions covering 10 untested stage functions (happy path + error cases)
- Integration tests: Existing `sw-pipeline-test.sh` (58 tests) covers stage integration
- E2E tests: Existing `sw-e2e-smoke-test.sh` (19 tests) covers full pipeline flow
- All 16 stage functions have at least one unit test each
- Error paths tested for stages that interact with external tools (gh, claude)
- Mock-based isolation: no real Claude/GitHub calls in unit tests
- Happy path: Each stage reads prerequisite artifacts and produces expected output
- Error case 1: Stage fails when prerequisite artifacts are missing
- Error case 2: Stage handles NO_GITHUB=true gracefully (no gh calls)
- Edge case 1: Empty GOAL/ISSUE_NUMBER
- Edge case 2: Stage called with DRY_RUN=true
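The two error cases can be sketched as follows. `stage_sample` is a hypothetical stage, and mocking `gh` as a shell function that counts calls is just one option; the real suite uses mock binaries. The `NO_GITHUB` check mirrors the gating described in this plan, but the exact form used in the real stages is an assumption.

```shell
#!/usr/bin/env bash
set -uo pipefail

ARTIFACTS_DIR="$(mktemp -d)"

# Function mock: counts invocations instead of contacting GitHub.
GH_CALLS=0
gh() { GH_CALLS=$((GH_CALLS + 1)); }

# Hypothetical stage: requires design.md and comments on the issue
# unless NO_GITHUB is "true".
stage_sample() {
    if [[ ! -f "$ARTIFACTS_DIR/design.md" ]]; then
        echo "stage_sample: missing design.md" >&2
        return 1
    fi
    echo "ok" > "$ARTIFACTS_DIR/sample.md"
    if [[ "${NO_GITHUB:-false}" != "true" ]]; then
        gh issue comment "${ISSUE_NUMBER:-0}" --body "sample done"
    fi
}

# Error case 1: missing prerequisite gives a non-zero return code.
rc=0
stage_sample 2>/dev/null || rc=$?
echo "missing prerequisite: rc=$rc"

# Error case 2: NO_GITHUB=true succeeds without touching gh.
echo "# design" > "$ARTIFACTS_DIR/design.md"
NO_GITHUB=true
stage_sample
echo "gh calls with NO_GITHUB=true: $GH_CALLS"

# Control: with NO_GITHUB=false the mock is called exactly once.
NO_GITHUB=false
stage_sample
echo "gh calls with NO_GITHUB=false: $GH_CALLS"
```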
- `scripts/lib/pipeline-stages-*.sh` contains all 16 stage functions
- `sw-pipeline.sh` is <1500 lines (currently 266)
- `sw-lib-pipeline-stages-test.sh` has tests for all 16 stage functions
- `sw-lib-pipeline-stages-test.sh` passes (0 failures)
- `sw-pipeline-test.sh` passes (0 failures)
- No existing tests modified (all pass without changes)
- Context: Issue asks for `run_intake_stage`, `run_plan_stage` naming. Codebase uses `stage_intake`, `stage_plan`.
- Decision: Keep existing `stage_*` naming convention.
- Alternatives: Rename all 16 functions to `run_*_stage`; rejected because it would break all existing callers (pipeline-execution.sh, pipeline-test.sh, etc.) with zero functional benefit.
- Consequences: Issue acceptance criteria language doesn't match exact function names, but the intent is fully satisfied.
- Context: Could create per-module test files (e.g., `sw-lib-pipeline-stages-intake-test.sh`).
- Decision: Expand existing `sw-lib-pipeline-stages-test.sh`.
- Alternatives: Split into 5 test files (one per module); rejected because the existing test infrastructure (setup_test_env, mock binaries) is already configured in the single file, and the test suite is still manageable at ~500-600 lines.
- Consequences: Single test file grows but remains under 700 lines. If it grows beyond that in future, split then.
Not applicable — This is a pure refactoring task with no user-facing input, authentication, authorization, or network changes. Stage functions only operate on local files and invoke CLI tools with controlled arguments. No new attack surface is introduced.
- No secrets in code (verified: stages use `$GITHUB_TOKEN` from environment, never hardcoded)
- No new input validation points (stages read from controlled artifact files)
- GitHub API calls gated by `$NO_GITHUB` check (existing pattern preserved)
Risk: Tests accidentally call gh or claude without mocks, causing flaky CI or API rate limiting.
Mitigation: All tests run with NO_GITHUB=true, mock binaries in $PATH, and GH_AVAILABLE=true (uses mock). The existing test infrastructure handles this — follow the same pattern.
Risk: The 10+ commits since decomposition may have changed function signatures or artifact formats.
Mitigation: Read current source before writing each test (Step 2). Use Explore agent to verify function signatures at build time.
Risk: One test's artifacts interfere with another test, causing false passes/fails.
Mitigation: Follow existing pattern: setup_test_env creates fresh $TEST_TEMP_DIR, cleanup_test_env removes it. Each test section can reset $ARTIFACTS_DIR contents.
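A minimal sketch of this isolation pattern is shown below. The function names `demo_setup_env`, `demo_cleanup_env`, and `reset_artifacts` are hypothetical and deliberately differ from the repo's `setup_test_env` and `cleanup_test_env`, whose actual implementations are not shown in this plan.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a fresh temp dir with an empty artifacts dir for this run.
demo_setup_env() {
    TEST_TEMP_DIR="$(mktemp -d)"
    ARTIFACTS_DIR="$TEST_TEMP_DIR/artifacts"
    mkdir -p "$ARTIFACTS_DIR"
}

# Remove the temp dir; safe to call even if setup never ran.
demo_cleanup_env() {
    [[ -n "${TEST_TEMP_DIR:-}" && -d "$TEST_TEMP_DIR" ]] && rm -rf "$TEST_TEMP_DIR"
    return 0
}

# Wipe artifacts between test sections so earlier output cannot leak.
# ${ARTIFACTS_DIR:?} aborts instead of expanding to an empty path.
reset_artifacts() {
    rm -rf "${ARTIFACTS_DIR:?}"
    mkdir -p "$ARTIFACTS_DIR"
}

trap demo_cleanup_env EXIT

demo_setup_env
echo "stale" > "$ARTIFACTS_DIR/plan.md"   # left over from a previous section
reset_artifacts
if [[ -e "$ARTIFACTS_DIR/plan.md" ]]; then
    echo "leak detected"
else
    echo "artifacts clean"
fi
# prints "artifacts clean"
```

The `EXIT` trap ensures cleanup runs even when an assertion aborts the script early.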
Each test will be written by first reading the current implementation of the stage function being tested, ensuring the test matches the actual interface — not the interface described in the issue.
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Tests call real APIs | CI failures, rate limits | Low (mock infra exists) | Follow existing mock patterns |
| Stage signatures changed | Tests written wrong | Medium | Read source before each test |
| Test state leakage | False passes | Low | Fresh TEST_TEMP_DIR per run |
| CLAUDE.md auto-sync conflict | Doc merge issues | Low | AUTO sections handle regeneration |