Pipeline Plan 189

ezigus edited this page Mar 18, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

Summary: The plan creates a new scripts/lib/context-error.sh module that bridges the existing memory system (memory_ranked_search) with error output paths. It integrates into 3 failure handlers:

mark_stage_failed() — saves context-error artifact + enriches GitHub comment
daemon_on_failure() — adds context-aware section to failure GitHub comment
compose_prompt() — injects historical failure context into loop iteration prompts

Error reports have 4 sections: What Failed (stage/iteration/goal), Why (filtered log snippets), Similar Past Issues (from memory), Suggested Actions (2-3 concrete steps from history + category defaults).

A feedback loop tracks suggestion effectiveness via suggestions.jsonl, marking them resolved when pipelines succeed.

10 tasks, 2 new files, 4 modified files, 10 test cases, zero Bash 3.2 compat issues. r type and history 4. Include relevant log snippets with context (not just raw stack traces) 5. Format error output with clear sections: What Failed / Why / Similar Past Issues / Suggested Actions 6. Track whether suggested actions lead to successful resolution (feedback loop)

Design Alternatives

Approach A: New library + integration hooks (CHOSEN)

Create scripts/lib/context-error.sh with formatting and memory query functions
Call from existing failure handlers (mark_stage_failed, daemon_on_failure, loop error path)
Store suggestions in suggestions.jsonl for feedback tracking
Pros: Minimal blast radius, reusable, testable in isolation
Cons: Requires touching 3 existing files

Approach B: Enhance error-actionability.sh directly

Extend the existing scoring/enhancement system to include memory queries
Pros: Fewer new files
Cons: Conflates scoring (fast, side-effect-free) with memory queries (I/O heavy), violates SRP

Approach C: Post-processing pipeline

Add a post-failure hook that reads error-log.jsonl and generates reports
Pros: Zero changes to existing failure paths
Cons: Delayed feedback, can't inject context into retry prompts, misses the real-time output

Choice: Approach A — cleanest separation, minimal blast radius, can be tested independently.

Risk Assessment

Memory query performance: memory_ranked_search does file I/O + jq processing. Mitigation: Already used in loop prompts; add a timeout guard.
Breaking existing error output: GitHub comments use specific formatting. Mitigation: Context-aware output is additional — existing messages unchanged. New content added to existing comment bodies.
Empty memory state: First-time users have no failures.json. Mitigation: Graceful fallback to generic suggestions based on error category.

Dependency Analysis

Depends on: helpers.sh (output helpers, emit_event), sw-memory.sh (memory_ranked_search), error-actionability.sh (error categorization)
Changed by this PR: pipeline-state.sh (mark_stage_failed), daemon-failure.sh (daemon_on_failure), loop-iteration.sh (error-summary composition)
No circular dependency risks: context-error.sh sources helpers.sh and calls sw-memory.sh functions

Files to Modify

New Files

scripts/lib/context-error.sh — Core module: rich error formatting, memory query, suggestion engine, feedback tracking
scripts/sw-lib-context-error-test.sh — Test suite for the new module

Modified Files

scripts/lib/pipeline-state.sh — Integrate context-aware error report into mark_stage_failed() (GitHub comment + artifact)
scripts/lib/daemon-failure.sh — Integrate context-aware error report into daemon_on_failure() (GitHub comment)
scripts/lib/loop-iteration.sh — Inject context-aware error context into compose_prompt() error summary section
config/event-schema.json — Add new event types for context-error system

Implementation Steps

Step 1: Create `scripts/lib/context-error.sh`

This is the core module. Functions:

`format_context_error_report()`

Input: error_message, stage_id, iteration_count, goal, issue_number Output: Formatted multi-section error report (stdout)

Sections:

═══ What Failed ═══
Stage: <stage_id> | Iteration: <count> | Issue: #<num>
Goal: <goal text>
<error category from error-actionability.sh>: <error message>
═══ Why ═══
<relevant log snippet — last 10 lines from stage log, filtered for noise>
═══ Similar Past Issues ═══
<results from memory_ranked_search, formatted as bullet list>
 • <pattern> → Fixed by: <fix> (effectiveness: <rate>%)
═══ Suggested Actions ═══
1. <action based on error category + memory fixes>
2. <action based on error category>
3. <generic action for this stage>

`query_similar_failures()`

Input: error_message, max_results (default 3) Output: JSON array of similar past failures from memory

Wraps memory_ranked_search with the error message as query. Searches both repo-specific and global memory. Returns [] when memory is empty.

`generate_suggested_actions()`

Input: error_category, failure_class, stage_id, similar_failures_json Output: Newline-separated list of 2-3 concrete actions

Logic:

If similar failures exist with effective fixes → suggest those fixes first
Map error categories to default actions:
- FILE_ACCESS → "Check file permissions and paths", "Verify the file exists"
- SYNTAX_ERROR → "Run linter on the changed files", "Check for unclosed brackets/quotes"
- TYPE_ERROR → "Verify function signatures match", "Check for null/undefined values"
- ASSERTION_FAILURE → "Compare expected vs actual test output", "Check if test fixtures are current"
- TIMEOUT → "Increase timeout or check for infinite loops", "Review async/await patterns"
- NETWORK_ERROR → "Check API endpoint availability", "Verify auth tokens are valid"
- MEMORY_ERROR → "Reduce batch size or parallelism", "Check for memory leaks in loops"
Add stage-specific actions:
- build stage → "Review the loop error-summary.json for specific test failures"
- test stage → "Run failing tests individually to isolate the issue"
- review stage → "Check compound quality gate thresholds"
- pr stage → "Verify GitHub token permissions and branch protection rules"

`record_suggestion()`

Input: suggestion_id, error_category, stage_id, suggested_actions (JSON array) Output: Appends to .claude/pipeline-artifacts/suggestions.jsonl

Format: {"id": "<uuid>", "ts": "<iso>", "stage": "<stage>", "category": "<cat>", "actions": [...], "resolved": null}

`mark_suggestion_resolved()`

Input: suggestion_id, resolved (true/false) Output: Updates the suggestion entry; emits suggestion.resolved event

Called when the pipeline succeeds after a failure (in mark_stage_completed or pipeline.completed handler).

`format_github_error_comment()`

Input: error_message, stage_id, iteration_count, goal, issue_number Output: Markdown-formatted error report suitable for GitHub issue comments

Same sections as format_context_error_report() but with markdown formatting, collapsible details for log snippets.

Step 2: Integrate into `mark_stage_failed()` in `pipeline-state.sh`

After the existing failure recording logic (line ~401), add:

# Generate context-aware error report
if type format_context_error_report >/dev/null 2>&1; then
 local context_report
 context_report=$(format_context_error_report \
 "$(tail -5 "$ARTIFACTS_DIR/${stage_id}"*.log 2>/dev/null || echo 'Stage failed')" \
 "$stage_id" \
 "${SELF_HEAL_COUNT:-0}" \
 "${GOAL:-}" \
 "${ISSUE_NUMBER:-}" 2>/dev/null || true)
 if [[ -n "$context_report" ]]; then
 save_artifact "context-error-${stage_id}.md" "$context_report"
 emit_event "error.context_generated" "stage=$stage_id" "issue=${ISSUE_NUMBER:-0}"
 fi
fi

Update the GitHub comment to include the context-aware report using format_github_error_comment.

Step 3: Integrate into `daemon_on_failure()` in `daemon-failure.sh`

In the final failure reporting section (around line 371), enhance the comment body:

local context_report=""
if type format_github_error_comment >/dev/null 2>&1; then
 context_report=$(format_github_error_comment \
 "${log_tail}" "pipeline" "0" \
 "${GOAL:-Issue #${issue_num}}" "$issue_num" 2>/dev/null || true)
fi

Append the context report to the existing GitHub issue comment body.

Step 4: Integrate into `compose_prompt()` in `loop-iteration.sh`

After the existing error-summary.json injection (around line 118), add:

local context_error_section=""
if [[ "$err_count" -gt 0 ]] && type query_similar_failures >/dev/null 2>&1; then
 local similar
 similar=$(query_similar_failures "$err_lines" 3 2>/dev/null || true)
 if [[ -n "$similar" ]] && [[ "$similar" \!= "[]" ]]; then
 local actions
 actions=$(generate_suggested_actions \
 "$(categorize_error "$err_lines" 2>/dev/null || echo "UNKNOWN")" \
 "" "build" "$similar" 2>/dev/null || true)
 context_error_section="## Historical Context for These Errors
Similar failures seen before:
$(echo "$similar" | jq -r '.[] | "- " + .content_text' 2>/dev/null || true)

Suggested actions based on history:
${actions}"
 fi
fi

Step 5: Add event types to `config/event-schema.json`

Add these event types:

error.context_generated — fields: stage, issue, similar_count, suggestion_count
suggestion.recorded — fields: id, stage, category
suggestion.resolved — fields: id, resolved, stage

Step 6: Feedback loop — track suggestion effectiveness

In the pipeline completion handler (where pipeline.completed event is emitted), check for outstanding suggestions:

if [[ -f "$ARTIFACTS_DIR/suggestions.jsonl" ]]; then
 while IFS= read -r suggestion; do
 local sid
 sid=$(echo "$suggestion" | jq -r '.id // empty' 2>/dev/null)
 [[ -n "$sid" ]] && mark_suggestion_resolved "$sid" "true" 2>/dev/null || true
 done < "$ARTIFACTS_DIR/suggestions.jsonl"
fi

Step 7: Write test suite `scripts/sw-lib-context-error-test.sh`

Test cases:

test_format_context_error_report — Verify all 4 sections present in output
test_format_with_empty_memory — Verify graceful handling when no memory exists
test_query_similar_failures — Mock failures.json, verify ranked results returned
test_generate_suggested_actions_from_memory — Verify memory-based suggestions appear first
test_generate_suggested_actions_by_category — Verify category-based defaults for each error type
test_generate_suggested_actions_by_stage — Verify stage-specific actions
test_record_suggestion — Verify JSONL append with correct format
test_mark_suggestion_resolved — Verify update and event emission
test_format_github_error_comment — Verify markdown output
test_no_color_output — Verify NO_COLOR disables formatting

Task Checklist

Task 1: Create scripts/lib/context-error.sh with core functions (format_context_error_report, query_similar_failures, generate_suggested_actions, record_suggestion, mark_suggestion_resolved, format_github_error_comment)
Task 2: Add categorize_error() helper (reuse logic from error-actionability.sh enhance_error_message category detection)
Task 3: Integrate context-aware error report into mark_stage_failed() in pipeline-state.sh
Task 4: Integrate context-aware error report into daemon_on_failure() in daemon-failure.sh
Task 5: Integrate context-aware error context into compose_prompt() in loop-iteration.sh
Task 6: Add new event types to config/event-schema.json
Task 7: Add feedback loop — mark suggestions resolved on pipeline completion in sw-pipeline.sh
Task 8: Write test suite scripts/sw-lib-context-error-test.sh with 10 test cases
Task 9: Register test in package.json test runner
Task 10: Run full test suite and verify no regressions

Testing Approach

Unit Tests (sw-lib-context-error-test.sh)

Use existing test-helpers.sh framework with setup_test_env / cleanup_test_env
Mock memory directory with sample failures.json containing known patterns
Mock emit_event to capture event emissions
Test each function independently with assert_contains, assert_eq, assert_file_exists
Test edge cases: empty memory, missing files, very long error messages

Integration Testing

Run npm test to verify no regressions in existing test suite
Verify the new test script passes: bash scripts/sw-lib-context-error-test.sh

Manual Verification

Trigger a pipeline failure and inspect the GitHub comment for rich context
Verify context-error-<stage>.md artifact is created
Verify suggestions.jsonl is populated
Check events.jsonl for new event types

Definition of Done

Error messages include stage name, iteration count, and original issue/goal
Memory system is queried for similar past failures (via memory_ranked_search)
2-3 concrete suggested actions are generated based on error type + history
Log snippets are included with context (filtered, not raw stack traces)
Error output has clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
Suggestions are recorded in suggestions.jsonl for feedback tracking
Successful pipeline completion marks outstanding suggestions as resolved
All new functions have test coverage
npm test passes with no regressions
No Bash 3.2 compatibility issues (no associative arrays, no readarray)

Alternatives Considered

Approach	Complexity	Performance	Maintainability	Blast Radius
A: New library + hooks (chosen)	Medium	Good (lazy loading)	High (isolated module)	Low (3 files touched)
B: Extend error-actionability.sh	Low	Poor (I/O in scoring path)	Low (SRP violation)	Medium
C: Post-processing pipeline	Low	Good (async)	Medium	Minimal but delayed

Approach A chosen for best balance of isolation, testability, and real-time feedback.

Risk Analysis

Risk	Impact	Likelihood	Mitigation
Memory query slows pipeline	Medium	Low	Already used in loop; add 5s timeout
Breaking existing GitHub comments	High	Low	Additive only — append, don't replace
Empty memory on first run	Low	High	Graceful fallback to category-based suggestions
Bash 3.2 compat issue	Medium	Low	No associative arrays; test on macOS

Endpoint Specification (API Design)

Not applicable — this is a shell library, not an HTTP API. The "API" is the set of shell functions exported by context-error.sh.

Function signatures:

format_context_error_report <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout
query_similar_failures <error_msg> [max_results] → stdout (JSON array)
generate_suggested_actions <category> <failure_class> <stage_id> <similar_json> → stdout
record_suggestion <id> <category> <stage_id> <actions_json> → appends to suggestions.jsonl
mark_suggestion_resolved <id> <resolved> → updates suggestions.jsonl + emits event
format_github_error_comment <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout (markdown)

Error Codes: Functions return 0 on success, 1 on error. Stderr for diagnostics. All safe with || true.

Rate Limiting / Versioning: Not applicable (local shell library).

Monitoring Checklist

Not applicable — this is observability infrastructure itself. Self-monitoring via new events (error.context_generated, suggestion.recorded, suggestion.resolved) queryable in events.jsonl.

Root Cause Hypothesis (Systematic Debugging)

First plan attempt — no previous failure. Based on analysis of 5 integration points:

error-actionability.sh — scoring/enhancement, no memory integration
daemon-failure.sh — failure classification, no rich context output
pipeline-state.sh — stage failure reporting, minimal log tail
sw-memory.sh — memory search capability, not connected to error output
loop-iteration.sh — error-summary injection, no historical context

Strategy: Create context-error.sh as the bridge between memory system and error output paths.

Verification: 10-case test suite + full regression suite (npm test).

Pipeline Plan 189

Design Alternatives

Risk Assessment

Dependency Analysis

Files to Modify

New Files

Modified Files

Implementation Steps

Step 1: Create scripts/lib/context-error.sh

format_context_error_report()

query_similar_failures()

generate_suggested_actions()

record_suggestion()

mark_suggestion_resolved()

format_github_error_comment()

Step 2: Integrate into mark_stage_failed() in pipeline-state.sh

Step 3: Integrate into daemon_on_failure() in daemon-failure.sh

Step 4: Integrate into compose_prompt() in loop-iteration.sh

Step 5: Add event types to config/event-schema.json

Step 6: Feedback loop — track suggestion effectiveness

Step 7: Write test suite scripts/sw-lib-context-error-test.sh

Task Checklist

Testing Approach

Unit Tests (sw-lib-context-error-test.sh)

Integration Testing

Manual Verification

Definition of Done

Alternatives Considered

Risk Analysis

Endpoint Specification (API Design)

Monitoring Checklist

Root Cause Hypothesis (Systematic Debugging)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Step 1: Create `scripts/lib/context-error.sh`

`format_context_error_report()`

`query_similar_failures()`

`generate_suggested_actions()`

`record_suggestion()`

`mark_suggestion_resolved()`

`format_github_error_comment()`

Step 2: Integrate into `mark_stage_failed()` in `pipeline-state.sh`

Step 3: Integrate into `daemon_on_failure()` in `daemon-failure.sh`

Step 4: Integrate into `compose_prompt()` in `loop-iteration.sh`

Step 5: Add event types to `config/event-schema.json`

Step 7: Write test suite `scripts/sw-lib-context-error-test.sh`