-
Notifications
You must be signed in to change notification settings - Fork 0
Pipeline Plan 232
I'll provide the comprehensive implementation plan here. This analyzes the root cause, proposes solutions, and structures the work:
Primary Cause - Confirmed:
-
pipeline-tasks.mdis written by intake stage (line 451-462 in pipeline-stages-intake.sh) with metadata:Issue: ${GITHUB_ISSUE} - File is NEVER removed in
initialize_state()(line 461 in pipeline-state.sh) — only clears model-routing.log and .plan-failure-sig.txt - File is NEVER validated in
resume_state()(line 534-610) — blindly restores state without checking metadata - File is NEVER validated in
stage_build()before injection (line 172 in pipeline-stages-build.sh) — injects without checking issue number
Secondary Cause - Suspected:
- TEST_CMD is set to
echo FAIL; exit 1(mentioned in issue as existing in test harness at sw-lib-pipeline-stages-test.sh:251) - This value somehow leaks into pipeline TEST_CMD instead of being
npm test - Causes tests to always fail, triggering unnecessary self-healing cycles
-
VERY HIGH (99%): Task file not cleared/validated on resume — STATE MANAGEMENT BUG
- Evidence: Three independent validation gaps (initialize, resume, inject) all missing
- Fix: Add cleanup + validation at each point
-
HIGH (85%): TEST_CMD environment variable scoping issue
- Evidence: Test harness defines
echo FAIL; exit 1at line 251 - Fix: Isolate test command in local variable, not global
- Evidence: Test harness defines
-
MEDIUM (70%): Resume flow doesn't restore clean environment
- Evidence: resume_state() never validates metadata matches
- Fix: Add issue metadata check before restoring
Approach:
- Add
rm -f "$TASKS_FILE"in initialize_state() - Add issue validation in resume_state() — extract issue from file, compare to GITHUB_ISSUE
- Add safe injection check in stage_build() before using TASKS_FILE
Pros:
- Minimal changes (~15 production lines across 2 files)
- Defense-in-depth: three independent validation points catch stale tasks
- Very low risk: clear, targeted fixes with no refactoring
- Easy to test and verify
Cons:
- Validation logic scattered across 2 files (not centralized)
- Happens at different times in flow
Trade-offs: Simplicity wins over consolidation; perfect for isolated bug
Approach:
- Store commit hash + tree hash in task file
- On resume, verify code hasn't changed drastically
- Selectively inject only relevant tasks
Pros:
- Intelligent: understands which tasks apply to current code
- Future-proof for multi-issue scenarios
Cons:
- Complex (~50+ lines)
- Overkill: pipelines are single-issue
- Hard to maintain and test
Trade-offs: Over-engineering; not chosen
Approach:
- Create
validate_tasks_file()helper - Called from initialize_state(), resume_state(), stage_build()
Pros:
- Single source of truth for validation
- Easier to extend validation later
Cons:
- More code (~30 lines for helper)
- Still adds complexity for simple fix
Trade-offs: Good design, but overkill for localized bug
-
Task 1: Clear tasks on initialize (no deps)
- File:
scripts/lib/pipeline-state.shline 461 - Add:
rm -f "$TASKS_FILE"after existing cleanup - Lines: 1
- Blocks: None (independent)
- File:
-
Task 2: Extract issue from task metadata (dep: Task 1)
- File:
scripts/lib/pipeline-state.shafter line 609 in resume_state() - Parse: Extract
Issue: #NNNfrom TASKS_FILE second header section - Lines: ~6
- Blocks: Task 3
- File:
-
Task 3: Validate issue matches on resume (dep: Task 2)
- File:
scripts/lib/pipeline-state.shafter Task 2 - Logic: If extracted issue ≠ current GITHUB_ISSUE, remove file + warn
- Lines: ~5
- Blocks: Task 4
- File:
-
Task 4: Validate before task injection (dep: Task 3)
- File:
scripts/lib/pipeline-stages-build.shline 172 - Logic: Before injecting TASKS_FILE, verify issue metadata matches $GITHUB_ISSUE
- Lines: ~6
- Blocks: None (defensive check)
- File:
-
Task 5: Investigate and fix TEST_CMD leak (no deps)
- Files: pipeline-stages-build.sh line 447, sw-lib-pipeline-stages-test.sh line 251
- Investigate: How does
echo FAIL; exit 1get into TEST_CMD? - Fix: Isolate test harness value in local variable
- Lines: ~3-5
- Blocks: None (independent)
-
Task 6: Unit test — initialize clears tasks (dep: Task 1)
- File:
scripts/sw-lib-pipeline-stages-test.sh - Test: Create TASKS_FILE, call initialize_state(), verify removed
- Lines: ~15
- File:
-
Task 7: Unit test — resume validates issue (dep: Task 3)
- File:
scripts/sw-lib-pipeline-stages-test.sh - Test: Create tasks with Issue #154, resume with #232, verify removed
- Lines: ~25
- File:
-
Task 8: Unit test — injection validates issue (dep: Task 4)
- File:
scripts/sw-lib-pipeline-stages-test.sh - Test: Create TASKS_FILE with wrong issue, verify not injected
- Lines: ~20
- File:
-
Task 9: Integration test — full resume flow (dep: Tasks 1-4)
- File:
scripts/sw-lib-pipeline-stages-test.sh - Test: Complete scenario with issue change + resume + build
- Lines: ~35
- File:
-
Task 10: Run full test suite (dep: Tasks 1-9)
- Command:
./scripts/sw-pipeline-test.sh - Verify: All 50+ existing tests pass (no regressions)
- Command:
| Risk | Impact | Prob. | Mitigation |
|---|---|---|---|
| Clear valid tasks | Build loses context | Very Low | Issue mismatch only triggers clear; normal flow leaves tasks intact |
| Validation logic breaks | Pipeline crashes | Low | Use safe string ops, ` |
| Tasks still injected | Stale tasks still visible | Very Low | Three independent guards make bypass impossible |
| TEST_CMD leak not fixed | Tests keep failing | Medium | Direct code trace in test harness; verify in pipeline test |
| Existing tests break | Regression | Low | Full test suite before commit; validation only removes mismatched tasks |
| Performance impact | Slow operations | Very Low | Only string comparisons + file unlink; no loops/API calls |
Mitigation Strategy: Each validation point is independent; one broken check doesn't disable others. Tests catch regressions immediately.
-
scripts/lib/pipeline-state.sh (~1200 LOC)
- initialize_state(): +1 line (rm -f)
- resume_state(): +11 lines (parse + validate)
- Impact: 15 lines, very low risk
-
scripts/lib/pipeline-stages-build.sh (~500 LOC)
- stage_build(): +6 lines validation before injection
- Impact: 6 lines, low risk
-
scripts/lib/pipeline-stages.sh OR pipeline-stages-build.sh
- TEST_CMD handling: +3-5 lines (if leak found)
- Impact: Minimal if needed
-
scripts/sw-lib-pipeline-stages-test.sh (~200 LOC)
- Add 4 new test cases: ~95 lines total
- Impact: Test-only, no production risk
Happy Path (No Issue Change):
- Resume pipeline for same issue → tasks remain and are injected ✓
Error Case 1 (Issue Changed):
- Resume for issue #232 with tasks from #154 → stale tasks removed ✓
Error Case 2 (Malformed Task File):
- Resume with corrupted TASKS_FILE → gracefully skip without crashing ✓
Edge Case 1 (Missing Task File):
- Resume when TASKS_FILE doesn't exist → continue normally ✓
Edge Case 2 (Multiple Resumes):
- Resume → modify code → resume again → no accumulation of stale tasks ✓
- Unit tests: 4 focused tests (~70 lines) covering each validation point
- Integration test: 1 scenario test (~35 lines) covering full flow
- Regression tests: Full existing test suite (50+ tests) must pass
-
Code Complete
- Task 1: initialize_state() removes pipeline-tasks.md
- Task 2: resume_state() extracts issue from metadata
- Task 3: resume_state() validates and removes stale tasks
- Task 4: stage_build() validates before injection
- Task 5: TEST_CMD leak investigated and fixed
-
Tests Pass
- Task 6: Initialize clear test passes
- Task 7: Resume validation test passes
- Task 8: Injection validation test passes
- Task 9: Full integration test passes
- Task 10: All 50+ existing tests pass (no regressions)
-
Acceptance Verified
- Old tasks not injected when issue changes
- New tasks ARE injected when issue matches
- Malformed files don't crash pipeline
- Resume flow unchanged for valid cases
- TEST_CMD properly isolated (if issue found)
-
Quality Checks
- Bash 3.2 compatible (no associative arrays, etc.)
- Proper error handling (all checks use safe patterns)
- Logging appropriate (info/warn used correctly)
- No debug code left in
- Task 1 — Clear on initialize (1 line, lowest risk, validates immediately)
- Task 2-3 — Validate on resume (11 lines, core fix)
- Task 4 — Defensive check in build (6 lines, final guard)
- Task 5 — Investigate TEST_CMD (parallel, may be small or larger)
- Task 6-9 — Tests in order of dependencies (15+25+20+35 = 95 lines)
- Task 10 — Full test suite verification (pass/fail decision point)
Estimated Implementation Time: ~60 lines code + 95 lines tests = 155 total lines. Low complexity, high confidence in solution.