Pipeline Plan 20
It seems I'm being blocked from writing files. Let me present the plan directly instead.
Add an integration test suite that goes beyond the existing 17 mock-based test suites by running the real pipeline with enhanced mock binaries that validate stage ordering, state transitions, and artifact generation — plus a CI workflow with budget enforcement and structured reporting.
Two tiers:
- Smoke tests (no Claude, no GitHub): validate pipeline machinery with deterministic mocks
- Live integration tests (require `CLAUDE_API_KEY` + `GITHUB_TOKEN`): run a minimal pipeline against real Claude, budget-capped at $1.00
| File | Action | Purpose |
|---|---|---|
| `scripts/sw-integration-test.sh` | Create | Integration test suite (~500 lines) |
| `templates/pipelines/integration-test.json` | Create | Minimal 3-stage pipeline template |
| `.github/workflows/integration-test.yml` | Create | CI workflow (smoke + live jobs) |
| `.github/workflows/test.yml` | Modify | Add smoke test step after unit tests |
| `package.json` | Modify | Add `test:smoke` and `test:integration` scripts |
Minimal template: intake → build → test only, all auto-gated, model sonnet, max 3 iterations.
Same harness pattern as `sw-pipeline-test.sh`: `set -euo pipefail`, ERR trap, PASS/FAIL counters, temp dir isolation, mock binaries on PATH, assertion functions.
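The harness pattern described above could be sketched roughly as follows. This is a minimal illustration, not the contents of `sw-pipeline-test.sh`: the helper names `assert_eq` and `run_test`, the `TEMP_DIR` variable, and the mock `claude` binary with its output are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

PASS=0
FAIL=0

# Isolated scratch directory, removed when the script exits.
TEMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TEMP_DIR"' EXIT
trap 'echo "ERROR: unexpected failure at line $LINENO" >&2' ERR

# Put a deterministic stand-in for an external CLI first on PATH.
# The binary name "claude" and its output are hypothetical.
mkdir -p "$TEMP_DIR/bin"
cat > "$TEMP_DIR/bin/claude" <<'MOCK'
#!/usr/bin/env bash
echo "mock claude: $*"
MOCK
chmod +x "$TEMP_DIR/bin/claude"
export PATH="$TEMP_DIR/bin:$PATH"

# Hypothetical assertion helper: succeeds when both strings match.
assert_eq() {
    local expected="$1" actual="$2" label="${3:-assert_eq}"
    if [[ "$expected" == "$actual" ]]; then
        return 0
    fi
    echo "  $label: expected '$expected', got '$actual'" >&2
    return 1
}

# Run one test function in a subshell so a failure cannot abort the suite,
# then update the counters from its exit status.
run_test() {
    local name="$1"
    if ( "$name" ); then
        PASS=$((PASS + 1))
        echo "PASS $name"
    else
        FAIL=$((FAIL + 1))
        echo "FAIL $name"
    fi
}
```

Because bash suspends `errexit` inside an `if` condition, a test function with several assertions would need to chain them explicitly (for example `assert_eq ... || return 1`) so that an early failure is not masked by a later success.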
Tier 1 — Smoke Tests (10 tests, always run):
- `test_smoke_full_stage_ordering`: Run fast template E2E, parse `pipeline-state.md`, verify `stage_progress` order and `status: success`
- `test_smoke_state_transitions`: Verify state transitions `pending→running→complete` per stage; `current_stage` updates correctly
- `test_smoke_artifact_integrity`: Verify all expected artifacts exist: `intake.json` (valid JSON), `plan.md`, `test-results.log`, `events.jsonl` entries
- `test_smoke_no_crashes_fast_template`: Fast template exits 0, no ERROR lines
- `test_smoke_no_crashes_standard_template`: Standard template exits 0
- `test_smoke_no_crashes_autonomous_template`: Autonomous template exits 0
- `test_smoke_budget_enforcement`: Mock cost at $1.01, verify pipeline fails; mock at $0.99, verify it passes
- `test_smoke_resume_preserves_artifacts`: Fail at test stage, resume, verify prior artifacts preserved
- `test_smoke_dry_run_no_side_effects`: `--dry-run` creates no artifacts, no branches, no events
- `test_smoke_stage_timing_recorded`: Each completed stage has a duration in the state file log
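The budget check behind `test_smoke_budget_enforcement` involves comparing decimal dollar amounts, which bash arithmetic cannot do natively. One possible approach, sketched here under the assumption that the cost is available as a plain decimal string, delegates the comparison to `awk`. The function name `within_budget` is hypothetical and not taken from the pipeline itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: exit 0 when cost <= cap, exit 1 otherwise.
# Both arguments are decimal strings such as "0.99".
# awk is used because bash 3.2 has no floating-point arithmetic.
within_budget() {
    local cost="$1" cap="$2"
    awk -v c="$cost" -v m="$cap" 'BEGIN { exit !(c + 0 <= m + 0) }'
}
```

With a cap of `1.00`, a mocked cost of `0.99` would pass and `1.01` would fail, matching the two cases the smoke test exercises.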
Tier 2 — Live Tests (2 tests, require secrets):
- `test_live_trivial_readme_change`: Real Claude makes a trivial change; verify diff exists, clean git status, cost < $1.00
- `test_live_pr_creation`: Verify PR created with valid URL, cleanup after
Mode selection: the `--live` flag enables Tier 2; `--filter <name>` runs a single test.
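A rough sketch of how the two flags might be parsed and applied is shown below. The variable names `LIVE` and `FILTER`, the functions `parse_args` and `should_run`, and the rule that Tier 2 tests are recognized by the `test_live_` prefix are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

LIVE=0      # set to 1 by --live
FILTER=""   # set by --filter <name>

# Parse the two mode-selection flags; anything else is rejected.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --live)
                LIVE=1
                shift
                ;;
            --filter)
                [[ $# -ge 2 ]] || { echo "--filter requires a test name" >&2; return 2; }
                FILTER="$2"
                shift 2
                ;;
            *)
                echo "unknown option: $1" >&2
                return 2
                ;;
        esac
    done
}

# Decide whether a named test should run under the current mode.
should_run() {
    local name="$1"
    # A filter, when given, selects exactly one test by name.
    if [[ -n "$FILTER" && "$name" != "$FILTER" ]]; then
        return 1
    fi
    # Live tests are skipped unless --live was passed.
    if [[ "$name" == test_live_* && "$LIVE" -ne 1 ]]; then
        return 1
    fi
    return 0
}
```

Keeping the decision in one predicate means the test runner loop only needs to call `should_run` before each test, regardless of which flags were supplied.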
`.github/workflows/integration-test.yml` defines two jobs:
- smoke job: Matrix (macOS + Ubuntu), runs on every PR and push to main
- live job: Runs after smoke passes, only on push to main or manual trigger; uses the `integration-test` environment for secret access, has a 15-minute timeout, and uploads artifacts
In `package.json`, add `"test:smoke"` and `"test:integration"` scripts.
In `.github/workflows/test.yml`, add a `- name: Run integration smoke tests` step after the existing unit test steps.
- Task 1: Create `templates/pipelines/integration-test.json`
- Task 2: Create `scripts/sw-integration-test.sh` scaffold (harness, mocks, assertions)
- Task 3: Implement smoke tests 1-6 (ordering, transitions, artifacts, crash tests)
- Task 4: Implement smoke tests 7-10 (budget, resume, dry-run, timing)
- Task 5: Implement live tests (README change, PR creation)
- Task 6: Create `.github/workflows/integration-test.yml`
- Task 7: Update `package.json` with new scripts
- Task 8: Update `.github/workflows/test.yml` with smoke step
- Task 9: Add CI summary reporting (`$GITHUB_STEP_SUMMARY` markdown table)
- Task 10: Run smoke tests locally and fix failures
- Task 11: Verify CLAUDE.md AUTO sections update
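For the CI summary reporting in Task 9, a sketch along these lines may be a useful starting point. GitHub Actions renders markdown appended to the file named by `$GITHUB_STEP_SUMMARY`; the function name `write_summary` and the `name=RESULT` argument format are assumptions made for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Append a pass/fail markdown table to the job summary.
# Each argument is a hypothetical "test_name=RESULT" pair.
# Outside GitHub Actions the table falls back to stdout.
write_summary() {
    local out="${GITHUB_STEP_SUMMARY:-/dev/stdout}"
    local entry
    {
        echo "| Test | Result |"
        echo "|---|---|"
        for entry in "$@"; do
            # Split on the first "=": name before it, result after it.
            echo "| ${entry%%=*} | ${entry#*=} |"
        done
    } >> "$out"
}
```

A call such as `write_summary "test_smoke_budget_enforcement=PASS" "test_smoke_resume_preserves_artifacts=FAIL"` would append a header, a separator, and one row per test.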
Verification:
- Local: `bash scripts/sw-integration-test.sh` (all 10 smoke tests pass)
- Filter: `--filter test_smoke_budget_enforcement` for debugging individual tests
- CI: Push branch → verify workflow triggers and summary reports
- Live: Manual dispatch with `run_live: true` or merge to main
- Cross-platform: macOS + Ubuntu matrix
Definition of done:
- `npm run test:smoke` passes (exit 0)
- `npm run test:integration` runs full suite when secrets available
- CI runs smoke tests on every PR to main
- CI runs live tests on push to main (regression)
- Budget cap: live tests abort if cost > $1.00
- CI summary: pass/fail per test in `$GITHUB_STEP_SUMMARY`
- Existing 17 test suites still pass
- Follows conventions: `set -euo pipefail`, bash 3.2, shipwright color theme