Pipeline Plan 232

ezigus edited this page Mar 27, 2026 · 2 revisions

I'll provide the comprehensive implementation plan here. This analyzes the root cause, proposes solutions, and structures the work:

Implementation Plan: Fix Stale Tasks Injected on Pipeline Resume (Issue #232)

Root Cause Analysis (Systematic Debugging)

Evidence Gathered

Primary Cause - Confirmed:

pipeline-tasks.md is written by intake stage (line 451-462 in pipeline-stages-intake.sh) with metadata: Issue: ${GITHUB_ISSUE}
File is NEVER removed in initialize_state() (line 461 in pipeline-state.sh) — only clears model-routing.log and .plan-failure-sig.txt
File is NEVER validated in resume_state() (line 534-610) — blindly restores state without checking metadata
File is NEVER validated in stage_build() before injection (line 172 in pipeline-stages-build.sh) — injects without checking issue number

Secondary Cause - Suspected:

TEST_CMD is set to echo FAIL; exit 1 (mentioned in issue as existing in test harness at sw-lib-pipeline-stages-test.sh:251)
This value somehow leaks into pipeline TEST_CMD instead of being npm test
Causes tests to always fail, triggering unnecessary self-healing cycles

Root Cause Hypothesis (Ranked)

VERY HIGH (99%): Task file not cleared/validated on resume — STATE MANAGEMENT BUG
- Evidence: Three independent validation gaps (initialize, resume, inject) all missing
- Fix: Add cleanup + validation at each point
HIGH (85%): TEST_CMD environment variable scoping issue
- Evidence: Test harness defines echo FAIL; exit 1 at line 251
- Fix: Isolate test command in local variable, not global
MEDIUM (70%): Resume flow doesn't restore clean environment
- Evidence: resume_state() never validates metadata matches
- Fix: Add issue metadata check before restoring

Alternatives Considered

Alternative A: "Clear on Initialize + Validate on Resume" ✓ CHOSEN

Approach:

Add rm -f "$TASKS_FILE" in initialize_state()
Add issue validation in resume_state() — extract issue from file, compare to GITHUB_ISSUE
Add safe injection check in stage_build() before using TASKS_FILE

Pros:

Minimal changes (~15 production lines across 2 files)
Defense-in-depth: three independent validation points catch stale tasks
Very low risk: clear, targeted fixes with no refactoring
Easy to test and verify

Cons:

Validation logic scattered across 2 files (not centralized)
Happens at different times in flow

Trade-offs: Simplicity wins over consolidation; perfect for isolated bug

Alternative B: "Metadata-Scoped Task Validation"

Approach:

Store commit hash + tree hash in task file
On resume, verify code hasn't changed drastically
Selectively inject only relevant tasks

Pros:

Intelligent: understands which tasks apply to current code
Future-proof for multi-issue scenarios

Cons:

Complex (~50+ lines)
Overkill: pipelines are single-issue
Hard to maintain and test

Trade-offs: Over-engineering; not chosen

Alternative C: "Helper Function Consolidation"

Approach:

Create validate_tasks_file() helper
Called from initialize_state(), resume_state(), stage_build()

Pros:

Single source of truth for validation
Easier to extend validation later

Cons:

More code (~30 lines for helper)
Still adds complexity for simple fix

Trade-offs: Good design, but overkill for localized bug

Task Decomposition (with Dependencies)

Production Code Tasks

Task 1: Clear tasks on initialize (no deps)
- File: scripts/lib/pipeline-state.sh line 461
- Add: rm -f "$TASKS_FILE" after existing cleanup
- Lines: 1
- Blocks: None (independent)
Task 2: Extract issue from task metadata (dep: Task 1)
- File: scripts/lib/pipeline-state.sh after line 609 in resume_state()
- Parse: Extract Issue: #NNN from TASKS_FILE second header section
- Lines: ~6
- Blocks: Task 3
Task 3: Validate issue matches on resume (dep: Task 2)
- File: scripts/lib/pipeline-state.sh after Task 2
- Logic: If extracted issue ≠ current GITHUB_ISSUE, remove file + warn
- Lines: ~5
- Blocks: Task 4
Task 4: Validate before task injection (dep: Task 3)
- File: scripts/lib/pipeline-stages-build.sh line 172
- Logic: Before injecting TASKS_FILE, verify issue metadata matches $GITHUB_ISSUE
- Lines: ~6
- Blocks: None (defensive check)
Task 5: Investigate and fix TEST_CMD leak (no deps)
- Files: pipeline-stages-build.sh line 447, sw-lib-pipeline-stages-test.sh line 251
- Investigate: How does echo FAIL; exit 1 get into TEST_CMD?
- Fix: Isolate test harness value in local variable
- Lines: ~3-5
- Blocks: None (independent)

Test Tasks

Task 6: Unit test — initialize clears tasks (dep: Task 1)
- File: scripts/sw-lib-pipeline-stages-test.sh
- Test: Create TASKS_FILE, call initialize_state(), verify removed
- Lines: ~15
Task 7: Unit test — resume validates issue (dep: Task 3)
- File: scripts/sw-lib-pipeline-stages-test.sh
- Test: Create tasks with Issue #154, resume with #232, verify removed
- Lines: ~25
Task 8: Unit test — injection validates issue (dep: Task 4)
- File: scripts/sw-lib-pipeline-stages-test.sh
- Test: Create TASKS_FILE with wrong issue, verify not injected
- Lines: ~20
Task 9: Integration test — full resume flow (dep: Tasks 1-4)
- File: scripts/sw-lib-pipeline-stages-test.sh
- Test: Complete scenario with issue change + resume + build
- Lines: ~35
Task 10: Run full test suite (dep: Tasks 1-9)
- Command: ./scripts/sw-pipeline-test.sh
- Verify: All 50+ existing tests pass (no regressions)

Risk Analysis

Risk	Impact	Prob.	Mitigation
Clear valid tasks	Build loses context	Very Low	Issue mismatch only triggers clear; normal flow leaves tasks intact
Validation logic breaks	Pipeline crashes	Low	Use safe string ops, `
Tasks still injected	Stale tasks still visible	Very Low	Three independent guards make bypass impossible
TEST_CMD leak not fixed	Tests keep failing	Medium	Direct code trace in test harness; verify in pipeline test
Existing tests break	Regression	Low	Full test suite before commit; validation only removes mismatched tasks
Performance impact	Slow operations	Very Low	Only string comparisons + file unlink; no loops/API calls

Mitigation Strategy: Each validation point is independent; one broken check doesn't disable others. Tests catch regressions immediately.

Files to Modify

scripts/lib/pipeline-state.sh (~1200 LOC)
- initialize_state(): +1 line (rm -f)
- resume_state(): +11 lines (parse + validate)
- Impact: 15 lines, very low risk
scripts/lib/pipeline-stages-build.sh (~500 LOC)
- stage_build(): +6 lines validation before injection
- Impact: 6 lines, low risk
scripts/lib/pipeline-stages.sh OR pipeline-stages-build.sh
- TEST_CMD handling: +3-5 lines (if leak found)
- Impact: Minimal if needed
scripts/sw-lib-pipeline-stages-test.sh (~200 LOC)
- Add 4 new test cases: ~95 lines total
- Impact: Test-only, no production risk

Testing Approach

Critical Paths to Test

Happy Path (No Issue Change):

Resume pipeline for same issue → tasks remain and are injected ✓

Error Case 1 (Issue Changed):

Resume for issue #232 with tasks from #154 → stale tasks removed ✓

Error Case 2 (Malformed Task File):

Resume with corrupted TASKS_FILE → gracefully skip without crashing ✓

Edge Case 1 (Missing Task File):

Resume when TASKS_FILE doesn't exist → continue normally ✓

Edge Case 2 (Multiple Resumes):

Resume → modify code → resume again → no accumulation of stale tasks ✓

Test Coverage Targets

Unit tests: 4 focused tests (~70 lines) covering each validation point
Integration test: 1 scenario test (~35 lines) covering full flow
Regression tests: Full existing test suite (50+ tests) must pass

Definition of Done

Code Complete
- Task 1: initialize_state() removes pipeline-tasks.md
- Task 2: resume_state() extracts issue from metadata
- Task 3: resume_state() validates and removes stale tasks
- Task 4: stage_build() validates before injection
- Task 5: TEST_CMD leak investigated and fixed
Tests Pass
- Task 6: Initialize clear test passes
- Task 7: Resume validation test passes
- Task 8: Injection validation test passes
- Task 9: Full integration test passes
- Task 10: All 50+ existing tests pass (no regressions)
Acceptance Verified
- Old tasks not injected when issue changes
- New tasks ARE injected when issue matches
- Malformed files don't crash pipeline
- Resume flow unchanged for valid cases
- TEST_CMD properly isolated (if issue found)
Quality Checks
- Bash 3.2 compatible (no associative arrays, etc.)
- Proper error handling (all checks use safe patterns)
- Logging appropriate (info/warn used correctly)
- No debug code left in

Execution Order

Task 1 — Clear on initialize (1 line, lowest risk, validates immediately)
Task 2-3 — Validate on resume (11 lines, core fix)
Task 4 — Defensive check in build (6 lines, final guard)
Task 5 — Investigate TEST_CMD (parallel, may be small or larger)
Task 6-9 — Tests in order of dependencies (15+25+20+35 = 95 lines)
Task 10 — Full test suite verification (pass/fail decision point)

Estimated Implementation Time: ~60 lines code + 95 lines tests = 155 total lines. Low complexity, high confidence in solution.

Pipeline Plan 232

Implementation Plan: Fix Stale Tasks Injected on Pipeline Resume (Issue #232)

Root Cause Analysis (Systematic Debugging)

Evidence Gathered

Root Cause Hypothesis (Ranked)

Alternatives Considered

Alternative A: "Clear on Initialize + Validate on Resume" ✓ CHOSEN

Alternative B: "Metadata-Scoped Task Validation"

Alternative C: "Helper Function Consolidation"

Task Decomposition (with Dependencies)

Production Code Tasks

Test Tasks

Risk Analysis

Files to Modify

Testing Approach

Critical Paths to Test

Test Coverage Targets

Definition of Done

Execution Order

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally