Pipeline Design 232

ezigus edited this page Mar 30, 2026 · 3 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key decisions in the ADR:

Defense-in-depth with 3 independent validation gates — initialize_state (unconditional rm), resume_state (issue-match check), stage_build (injection gate). Each layer independently prevents stale task injection.
Test-only approach — production fixes are already committed across 5 PRs. This phase adds a dedicated regression test suite (scripts/sw-pipeline-tasks-md-test.sh) with 9 tests.
Conservative error handling — malformed task files (missing - Issue: header) are treated as stale and removed rather than injected.
Rejected alternatives: single cleanup point (doesn't protect resume path), centralized manager (over-engineering), completion-time cleanup (fragile to crashes). the issue number from the task file's metadata header.

Constraints: Production fixes are already committed across 5 PRs (1a64c9d0, 4a2f315d, 8bfeb411, 70a8aaeb, a730e469). This design focuses exclusively on verification and test coverage — no production code changes.

Root Cause Hypothesis

Most likely — Missing validation on resume path (CONFIRMED FIXED): resume_state() previously did not validate the task file's issue against the current GITHUB_ISSUE. Stale files survived resume and were injected by stage_build(). Fixed in commits referenced above.
Issue number format inconsistency: GITHUB_ISSUE could contain #232 or 232, while the task file might store either format. If normalization differed across paths, comparisons would fail silently. Fixed by consistent tr -d '#' | xargs normalization in all three paths.
Malformed metadata header: A task file missing the - Issue: line would bypass all issue-based checks and get injected regardless. Fixed by treating extraction failure as grounds for file removal.

Evidence Gathered

All three cleanup paths are confirmed present with correct logic:

pipeline-state.sh:463: [[ -n "${TASKS_FILE:-}" ]] && rm -f "$TASKS_FILE" — unconditional cleanup on fresh start
pipeline-state.sh:607-620: Issue mismatch detection via extract_issue_from_tasks_file(), with rm -f on mismatch or malformed file
pipeline-stages-build.sh:172-189: Defense-in-depth gate — only injects tasks when tasks_issue == current_issue, removes file otherwise
helpers.sh:429-435: extract_issue_from_tasks_file() uses grep -m1 -i "^-\{0,1\} *Issue:", strips #, trims whitespace, returns exit 1 on failure
Normalization: All three paths use identical echo "${GITHUB_ISSUE:-}" | tr -d '#' | xargs
Existing tests: sw-lib-pipeline-stages-test.sh:385-655 already covers initialize cleanup, resume mismatch, stage_build injection gate, and normalization variants

Decision

Test-only verification approach — validate the existing three-layer defense without modifying production code.

The fix uses a defense-in-depth pattern with three independent validation gates:

initialize_state() -> unconditional rm (layer 1: fresh start)
resume_state() -> issue-match check + rm on mismatch (layer 2: resume)
stage_build() -> issue-match check + rm on mismatch (layer 3: injection point)

Each layer independently prevents stale task injection. Even if one layer is bypassed (e.g., resume skipped), the next layer catches the mismatch.

Issue normalization is centralized in extract_issue_from_tasks_file() for the file side, and uses tr -d '#' | xargs inline for GITHUB_ISSUE at each comparison point. This is intentionally not DRY'd into a helper — the inline normalization is 6 characters and keeping it visible at each comparison point makes the validation logic self-documenting.

Error handling: Malformed files (missing - Issue: header) are treated as stale and removed. This is the correct conservative choice — a task file that can't be attributed to an issue should never be injected.

Fix Strategy

No new production fix needed. The existing commits implement the correct three-layer defense. The strategy is:

Extend test coverage in sw-lib-pipeline-stages-test.sh (tests already exist at lines 385-655)
Add a dedicated test file scripts/sw-pipeline-tasks-md-test.sh for focused regression coverage
Cover the exact reproduction scenario (issue #154 tasks leaking into #232 pipeline)
Cover edge cases: unset TASKS_FILE, missing file, 5+ metadata format variations

This differs from a "rewrite" approach because the fixes are already correct — we just need confidence through comprehensive test coverage.

Alternatives Considered

Single cleanup point (only in initialize_state) — Pros: simpler, one place to maintain / Cons: doesn't protect resume path, which is the exact bug scenario. Resume skips initialize_state entirely, so stale files would survive.
Centralized task file manager class/module — Pros: DRY, single responsibility / Cons: over-engineering for a flat file with 3 consumers. The current inline checks are 5-8 lines each and self-documenting. Adding an abstraction layer would obscure the validation logic at the point where it matters most.
Delete task file at pipeline completion instead of checking on resume — Pros: prevents accumulation / Cons: doesn't handle crashes, kill -9, or context exhaustion where cleanup code never runs. The current approach (validate on read) is resilient to incomplete shutdown.

Implementation Plan

Files to create: scripts/sw-pipeline-tasks-md-test.sh — dedicated regression test suite with 9 tests covering the exact bug scenario, metadata format variations, edge cases (unset variable, concurrent access, corrupted file, missing file)
Files to modify: None (production code already fixed)
Dependencies: None new
Risk areas:
- Test isolation: each test must use $TMPDIR-based temp files, never live .claude/ directory
- Sourcing production functions in test context requires careful mock setup (write_state, warn, etc.)
- extract_issue_from_tasks_file regex relies on grep -m1 -i — test must verify case-insensitive matching

Validation Criteria

Bug reproduction test passes: issue #154 tasks file is deleted when resuming for issue #232
All 3 cleanup paths verified: initialize_state (unconditional rm), resume_state (issue mismatch rm), stage_build (injection gate)
extract_issue_from_tasks_file handles 5+ metadata formats: - Issue: #232, - Issue: 232, Issue: #232, - Issue: #232, and returns error for missing header
Issue normalization consistent: all paths produce bare number (no #, no whitespace) before comparison
Edge cases pass: unset TASKS_FILE, missing file, malformed file, concurrent initialization
Existing test suite (sw-lib-pipeline-stages-test.sh) passes with zero regressions
No production code changes in the final diff

Verification Plan

Run existing tests: bash scripts/sw-lib-pipeline-stages-test.sh — confirms no regressions
Run new test suite: bash scripts/sw-pipeline-tasks-md-test.sh — confirms all 9 scenarios pass
Run full suite: npm test — confirms no cross-contamination
Manual verification: inspect git diff to confirm zero production file changes

Pipeline Design 232

Root Cause Hypothesis

Evidence Gathered

Decision

Fix Strategy

Alternatives Considered

Implementation Plan

Validation Criteria

Verification Plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally