Pipeline Plan 189
Plan written to .claude/pipeline-artifacts/plan.md.
Summary: The plan creates a new scripts/lib/context-error.sh module that bridges the existing memory system (memory_ranked_search) with error output paths. It integrates into 3 failure handlers:
- mark_stage_failed(): saves context-error artifact and enriches the GitHub comment
- daemon_on_failure(): adds a context-aware section to the failure GitHub comment
- compose_prompt(): injects historical failure context into loop iteration prompts
Error reports have 4 sections: What Failed (stage/iteration/goal), Why (filtered log snippets), Similar Past Issues (from memory), Suggested Actions (2-3 concrete steps from history + category defaults).
A feedback loop tracks suggestion effectiveness via suggestions.jsonl, marking them resolved when pipelines succeed.
10 tasks, 2 new files, 4 modified files, 10 test cases, zero Bash 3.2 compat issues.
r type and history
4. Include relevant log snippets with context (not just raw stack traces)
5. Format error output with clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
6. Track whether suggested actions lead to successful resolution (feedback loop)
Approach A: New library + integration hooks (CHOSEN)
- Create scripts/lib/context-error.sh with formatting and memory query functions
- Call from existing failure handlers (mark_stage_failed, daemon_on_failure, loop error path)
- Store suggestions in suggestions.jsonl for feedback tracking
- Pros: Minimal blast radius, reusable, testable in isolation
- Cons: Requires touching 3 existing files
Approach B: Enhance error-actionability.sh directly
- Extend the existing scoring/enhancement system to include memory queries
- Pros: Fewer new files
- Cons: Conflates scoring (fast, side-effect-free) with memory queries (I/O heavy), violates SRP
Approach C: Post-processing pipeline
- Add a post-failure hook that reads error-log.jsonl and generates reports
- Pros: Zero changes to existing failure paths
- Cons: Delayed feedback, can't inject context into retry prompts, misses the real-time output
Choice: Approach A — cleanest separation, minimal blast radius, can be tested independently.
- Memory query performance: memory_ranked_search does file I/O + jq processing. Mitigation: Already used in loop prompts; add a timeout guard.
- Breaking existing error output: GitHub comments use specific formatting. Mitigation: Context-aware output is additive; existing messages stay unchanged and new content is appended to existing comment bodies.
- Empty memory state: First-time users have no failures.json. Mitigation: Graceful fallback to generic suggestions based on error category.
- Depends on: helpers.sh (output helpers, emit_event), sw-memory.sh (memory_ranked_search), error-actionability.sh (error categorization)
- Changed by this PR: pipeline-state.sh (mark_stage_failed), daemon-failure.sh (daemon_on_failure), loop-iteration.sh (error-summary composition)
- No circular dependency risks: context-error.sh sources helpers.sh and calls sw-memory.sh functions
New files:
- scripts/lib/context-error.sh: core module (rich error formatting, memory query, suggestion engine, feedback tracking)
- scripts/sw-lib-context-error-test.sh: test suite for the new module
Modified files:
- scripts/lib/pipeline-state.sh: integrate context-aware error report into mark_stage_failed() (GitHub comment + artifact)
- scripts/lib/daemon-failure.sh: integrate context-aware error report into daemon_on_failure() (GitHub comment)
- scripts/lib/loop-iteration.sh: inject context-aware error context into the compose_prompt() error summary section
- config/event-schema.json: add new event types for the context-error system
This is the core module. Functions:
format_context_error_report()
Input: error_message, stage_id, iteration_count, goal, issue_number
Output: Formatted multi-section error report (stdout)
Sections:
═══ What Failed ═══
Stage: <stage_id> | Iteration: <count> | Issue: #<num>
Goal: <goal text>
<error category from error-actionability.sh>: <error message>
═══ Why ═══
<relevant log snippet — last 10 lines from stage log, filtered for noise>
═══ Similar Past Issues ═══
<results from memory_ranked_search, formatted as bullet list>
• <pattern> → Fixed by: <fix> (effectiveness: <rate>%)
═══ Suggested Actions ═══
1. <action based on error category + memory fixes>
2. <action based on error category>
3. <generic action for this stage>
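The section layout above could be assembled roughly as follows. This is a minimal sketch, not the module's actual code: `render_section` and `render_error_report` are hypothetical names, and the category, log snippet, history, and actions are passed in as ready-made strings rather than computed.

```shell
#!/usr/bin/env bash
# Sketch only: assembles the four report sections from pre-computed strings.
# render_section and render_error_report are hypothetical helper names.

render_section() {
  # $1 = section title, $2 = section body (may span several lines)
  printf '═══ %s ═══\n%s\n' "$1" "$2"
}

render_error_report() {
  local stage_id="$1" iteration="$2" issue="$3" goal="$4"
  local category="$5" error_msg="$6" log_snippet="$7" similar="$8" actions="$9"

  render_section "What Failed" "Stage: ${stage_id} | Iteration: ${iteration} | Issue: #${issue}
Goal: ${goal}
${category}: ${error_msg}"
  # Placeholder text for empty sections is an assumption, not part of the plan.
  render_section "Why" "${log_snippet:-(no log output captured)}"
  render_section "Similar Past Issues" "${similar:-(none found in memory)}"
  render_section "Suggested Actions" "$actions"
}
```

Keeping the section renderer separate from data gathering means the same inputs could feed both the terminal report and the markdown variant.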
query_similar_failures()
Input: error_message, max_results (default 3)
Output: JSON array of similar past failures from memory
Wraps memory_ranked_search with the error message as query. Searches both repo-specific and global memory. Returns [] when memory is empty.
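A minimal sketch of the wrapper and its empty-memory fallback. It assumes `memory_ranked_search` takes a query string and a result limit and prints a JSON array; the plan does not show the real signature, so treat that call shape as hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes memory_ranked_search accepts <query> <limit> and prints
# a JSON array on stdout; its real signature may differ.

query_similar_failures() {
  local error_msg="$1" max_results="${2:-3}" results=""

  # Memory module not loaded: behave as if memory were empty.
  if ! type memory_ranked_search >/dev/null 2>&1; then
    echo "[]"
    return 0
  fi

  results=$(memory_ranked_search "$error_msg" "$max_results" 2>/dev/null || true)

  # Empty output (no failures.json yet, or the search failed) becomes [].
  if [[ -z "$results" ]]; then
    echo "[]"
  else
    echo "$results"
  fi
}
```

Always printing a JSON array lets callers compare against `[]` without first checking the exit status.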
generate_suggested_actions()
Input: error_category, failure_class, stage_id, similar_failures_json
Output: Newline-separated list of 2-3 concrete actions
Logic:
- If similar failures exist with effective fixes → suggest those fixes first
- Map error categories to default actions:
  - FILE_ACCESS → "Check file permissions and paths", "Verify the file exists"
  - SYNTAX_ERROR → "Run linter on the changed files", "Check for unclosed brackets/quotes"
  - TYPE_ERROR → "Verify function signatures match", "Check for null/undefined values"
  - ASSERTION_FAILURE → "Compare expected vs actual test output", "Check if test fixtures are current"
  - TIMEOUT → "Increase timeout or check for infinite loops", "Review async/await patterns"
  - NETWORK_ERROR → "Check API endpoint availability", "Verify auth tokens are valid"
  - MEMORY_ERROR → "Reduce batch size or parallelism", "Check for memory leaks in loops"
- Add stage-specific actions:
  - build stage → "Review the loop error-summary.json for specific test failures"
  - test stage → "Run failing tests individually to isolate the issue"
  - review stage → "Check compound quality gate thresholds"
  - pr stage → "Verify GitHub token permissions and branch protection rules"
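The category and stage defaults listed above map naturally onto `case` statements, which also stays within Bash 3.2 (no associative arrays). The sketch below covers only those defaults. `default_actions_for` is a hypothetical helper name, and fixes drawn from memory, which the plan ranks first, are left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: category defaults followed by one stage-specific action.
# default_actions_for is a hypothetical name; memory-derived fixes are omitted.

default_actions_for() {
  local category="$1" stage_id="$2"

  case "$category" in
    FILE_ACCESS)       printf '%s\n' "Check file permissions and paths" "Verify the file exists" ;;
    SYNTAX_ERROR)      printf '%s\n' "Run linter on the changed files" "Check for unclosed brackets/quotes" ;;
    TYPE_ERROR)        printf '%s\n' "Verify function signatures match" "Check for null/undefined values" ;;
    ASSERTION_FAILURE) printf '%s\n' "Compare expected vs actual test output" "Check if test fixtures are current" ;;
    TIMEOUT)           printf '%s\n' "Increase timeout or check for infinite loops" "Review async/await patterns" ;;
    NETWORK_ERROR)     printf '%s\n' "Check API endpoint availability" "Verify auth tokens are valid" ;;
    MEMORY_ERROR)      printf '%s\n' "Reduce batch size or parallelism" "Check for memory leaks in loops" ;;
    *) ;;  # Unknown category: no category defaults in this sketch.
  esac

  case "$stage_id" in
    build)  echo "Review the loop error-summary.json for specific test failures" ;;
    test)   echo "Run failing tests individually to isolate the issue" ;;
    review) echo "Check compound quality gate thresholds" ;;
    pr)     echo "Verify GitHub token permissions and branch protection rules" ;;
    *) ;;   # Unknown stage: no stage-specific action.
  esac
}
```

For a known category and stage this yields three lines, matching the "2-3 concrete actions" target.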
record_suggestion()
Input: suggestion_id, error_category, stage_id, suggested_actions (JSON array)
Output: Appends to .claude/pipeline-artifacts/suggestions.jsonl
Format: {"id": "<uuid>", "ts": "<iso>", "stage": "<stage>", "category": "<cat>", "actions": [...], "resolved": null}
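A minimal sketch of the JSONL append, following the entry format above. The default artifacts path and the assumption that the actions argument is already valid JSON are mine. Values are not JSON-escaped here, so ids, stages, and categories containing quotes would need extra handling.

```shell
#!/usr/bin/env bash
# Sketch only. ARTIFACTS_DIR appears in the plan's snippets; the fallback path
# and the "actions_json is already valid JSON" assumption are illustrative.

record_suggestion() {
  local id="$1" category="$2" stage_id="$3" actions_json="${4:-[]}"
  local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
  local ts
  ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  mkdir -p "$dir" || return 1
  # One JSON object per line; "resolved" stays null until the outcome is known.
  printf '{"id": "%s", "ts": "%s", "stage": "%s", "category": "%s", "actions": %s, "resolved": null}\n' \
    "$id" "$ts" "$stage_id" "$category" "$actions_json" >> "$dir/suggestions.jsonl"
}
```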
mark_suggestion_resolved()
Input: suggestion_id, resolved (true/false)
Output: Updates the suggestion entry; emits suggestion.resolved event
Called when the pipeline succeeds after a failure (in mark_stage_completed or pipeline.completed handler).
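One way the update could work without jq is an in-place rewrite with awk, sketched below. It relies on the exact key spacing of the suggestions.jsonl format, and the fallback path is an assumption. The `emit_event` argument style is copied from the plan's other snippets; the `stage` field listed for this event is omitted because the function signature does not carry it.

```shell
#!/usr/bin/env bash
# Sketch only. Depends on the literal '"id": "..."' and '"resolved": null'
# spacing of the entry format; the fallback ARTIFACTS_DIR path is illustrative.

mark_suggestion_resolved() {
  local id="$1" resolved="$2"
  local file="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}/suggestions.jsonl"
  local tmp="${file}.tmp.$$"

  [[ -f "$file" ]] || return 1
  [[ "$resolved" == "true" || "$resolved" == "false" ]] || return 1

  # Rewrite only the entry whose id matches and that is still unresolved.
  awk -v id="$id" -v val="$resolved" '
    BEGIN { needle = "\"id\": \"" id "\"" }
    index($0, needle) { sub(/"resolved": null/, "\"resolved\": " val) }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }

  # emit_event is provided by helpers.sh in the plan; skip it if not loaded.
  if type emit_event >/dev/null 2>&1; then
    emit_event "suggestion.resolved" "id=$id" "resolved=$resolved"
  fi
}
```

Writing to a temporary file and renaming it keeps a reader that already has suggestions.jsonl open from seeing a half-written file.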
format_github_error_comment()
Input: error_message, stage_id, iteration_count, goal, issue_number
Output: Markdown-formatted error report suitable for GitHub issue comments
Same sections as format_context_error_report() but with markdown formatting, collapsible details for log snippets.
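The collapsible log snippet could be produced with an HTML `<details>` element, which GitHub renders in issue comments. A minimal sketch, with `render_log_details` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: wrap a log snippet in a collapsible block for a GitHub comment.
# render_log_details is a hypothetical helper name.

render_log_details() {
  local summary="$1" log_snippet="$2"
  local fence
  # Three backticks, built from octal escapes so this example nests cleanly.
  fence=$(printf '\140\140\140')

  # Blank lines around the fenced block are needed for markdown inside <details>.
  printf '<details>\n<summary>%s</summary>\n\n%s\n%s\n%s\n\n</details>\n' \
    "$summary" "$fence" "$log_snippet" "$fence"
}
```

A call such as `render_log_details "Log snippet (last 10 lines)" "$log_tail"` would then be appended under the "Why" heading of the comment.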
In mark_stage_failed() (pipeline-state.sh), after the existing failure recording logic (line ~401), add:
    # Generate context-aware error report
    if type format_context_error_report >/dev/null 2>&1; then
        local context_report
        context_report=$(format_context_error_report \
            "$(tail -5 "$ARTIFACTS_DIR/${stage_id}"*.log 2>/dev/null || echo 'Stage failed')" \
            "$stage_id" \
            "${SELF_HEAL_COUNT:-0}" \
            "${GOAL:-}" \
            "${ISSUE_NUMBER:-}" 2>/dev/null || true)
        if [[ -n "$context_report" ]]; then
            save_artifact "context-error-${stage_id}.md" "$context_report"
            emit_event "error.context_generated" "stage=$stage_id" "issue=${ISSUE_NUMBER:-0}"
        fi
    fi
Update the GitHub comment to include the context-aware report using format_github_error_comment.
In daemon_on_failure() (daemon-failure.sh), in the final failure reporting section (around line 371), enhance the comment body:
    local context_report=""
    if type format_github_error_comment >/dev/null 2>&1; then
        context_report=$(format_github_error_comment \
            "${log_tail}" "pipeline" "0" \
            "${GOAL:-Issue #${issue_num}}" "$issue_num" 2>/dev/null || true)
    fi
Append the context report to the existing GitHub issue comment body.
In compose_prompt() (loop-iteration.sh), after the existing error-summary.json injection (around line 118), add:
    local context_error_section=""
    if [[ "$err_count" -gt 0 ]] && type query_similar_failures >/dev/null 2>&1; then
        local similar
        similar=$(query_similar_failures "$err_lines" 3 2>/dev/null || true)
        # Only add the section when memory returned at least one match
        if [[ -n "$similar" ]] && [[ "$similar" != "[]" ]]; then
            local actions
            actions=$(generate_suggested_actions \
                "$(categorize_error "$err_lines" 2>/dev/null || echo "UNKNOWN")" \
                "" "build" "$similar" 2>/dev/null || true)
            context_error_section="## Historical Context for These Errors

    Similar failures seen before:
    $(echo "$similar" | jq -r '.[] | "- " + .content_text' 2>/dev/null || true)

    Suggested actions based on history:
    ${actions}"
        fi
    fi
Add these event types:
- error.context_generated (fields: stage, issue, similar_count, suggestion_count)
- suggestion.recorded (fields: id, stage, category)
- suggestion.resolved (fields: id, resolved, stage)
In the pipeline completion handler (where pipeline.completed event is emitted), check for outstanding suggestions:
    if [[ -f "$ARTIFACTS_DIR/suggestions.jsonl" ]]; then
        while IFS= read -r suggestion; do
            local sid
            sid=$(echo "$suggestion" | jq -r '.id // empty' 2>/dev/null)
            [[ -n "$sid" ]] && mark_suggestion_resolved "$sid" "true" 2>/dev/null || true
        done < "$ARTIFACTS_DIR/suggestions.jsonl"
    fi
Test cases:
- test_format_context_error_report: Verify all 4 sections present in output
- test_format_with_empty_memory: Verify graceful handling when no memory exists
- test_query_similar_failures: Mock failures.json, verify ranked results returned
- test_generate_suggested_actions_from_memory: Verify memory-based suggestions appear first
- test_generate_suggested_actions_by_category: Verify category-based defaults for each error type
- test_generate_suggested_actions_by_stage: Verify stage-specific actions
- test_record_suggestion: Verify JSONL append with correct format
- test_mark_suggestion_resolved: Verify update and event emission
- test_format_github_error_comment: Verify markdown output
- test_no_color_output: Verify NO_COLOR disables formatting
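As an illustration of the first case, a test could loop over the four section headers. The sketch below is self-contained, so both `assert_contains` and `format_context_error_report` are local stand-ins; the real helper in test-helpers.sh may take its arguments in a different order.

```shell
#!/usr/bin/env bash
# Sketch only. Both functions below are stand-ins so the example runs alone;
# the real assert_contains signature in test-helpers.sh is not shown in the plan.

assert_contains() {
  # $1 = description, $2 = haystack, $3 = needle
  case "$2" in
    *"$3"*) echo "PASS: $1"; return 0 ;;
    *)      echo "FAIL: $1 (missing: $3)"; return 1 ;;
  esac
}

# Stand-in for the function under test: prints only the four headers.
format_context_error_report() {
  printf '═══ %s ═══\n' "What Failed" "Why" "Similar Past Issues" "Suggested Actions"
}

test_format_context_error_report() {
  local out section failures=0
  out=$(format_context_error_report "boom" "build" "1" "goal" "189")
  for section in "What Failed" "Why" "Similar Past Issues" "Suggested Actions"; do
    assert_contains "section: $section" "$out" "═══ $section ═══" \
      || failures=$((failures + 1))
  done
  # Non-zero return means at least one section header was missing.
  return "$failures"
}
```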
- Task 1: Create scripts/lib/context-error.sh with core functions (format_context_error_report, query_similar_failures, generate_suggested_actions, record_suggestion, mark_suggestion_resolved, format_github_error_comment)
- Task 2: Add categorize_error() helper (reuse logic from error-actionability.sh enhance_error_message category detection)
- Task 3: Integrate context-aware error report into mark_stage_failed() in pipeline-state.sh
- Task 4: Integrate context-aware error report into daemon_on_failure() in daemon-failure.sh
- Task 5: Integrate context-aware error context into compose_prompt() in loop-iteration.sh
- Task 6: Add new event types to config/event-schema.json
- Task 7: Add feedback loop: mark suggestions resolved on pipeline completion in sw-pipeline.sh
- Task 8: Write test suite scripts/sw-lib-context-error-test.sh with 10 test cases
- Task 9: Register test in package.json test runner
- Task 10: Run full test suite and verify no regressions
- Use existing test-helpers.sh framework with setup_test_env / cleanup_test_env
- Mock memory directory with sample failures.json containing known patterns
- Mock emit_event to capture event emissions
- Test each function independently with assert_contains, assert_eq, assert_file_exists
- Test edge cases: empty memory, missing files, very long error messages
- Run npm test to verify no regressions in existing test suite
- Verify the new test script passes: bash scripts/sw-lib-context-error-test.sh
- Trigger a pipeline failure and inspect the GitHub comment for rich context
- Verify context-error-<stage>.md artifact is created
- Verify suggestions.jsonl is populated
- Check events.jsonl for new event types
- Error messages include stage name, iteration count, and original issue/goal
- Memory system is queried for similar past failures (via memory_ranked_search)
- 2-3 concrete suggested actions are generated based on error type + history
- Log snippets are included with context (filtered, not raw stack traces)
- Error output has clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
- Suggestions are recorded in suggestions.jsonl for feedback tracking
- Successful pipeline completion marks outstanding suggestions as resolved
- All new functions have test coverage
- npm test passes with no regressions
- No Bash 3.2 compatibility issues (no associative arrays, no readarray)
| Approach | Complexity | Performance | Maintainability | Blast Radius |
|---|---|---|---|---|
| A: New library + hooks (chosen) | Medium | Good (lazy loading) | High (isolated module) | Low (3 files touched) |
| B: Extend error-actionability.sh | Low | Poor (I/O in scoring path) | Low (SRP violation) | Medium |
| C: Post-processing pipeline | Low | Good (async) | Medium | Minimal but delayed |
Approach A chosen for best balance of isolation, testability, and real-time feedback.
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Memory query slows pipeline | Medium | Low | Already used in loop; add 5s timeout |
| Breaking existing GitHub comments | High | Low | Additive only — append, don't replace |
| Empty memory on first run | Low | High | Graceful fallback to category-based suggestions |
| Bash 3.2 compat issue | Medium | Low | No associative arrays; test on macOS |
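The 5s timeout mitigation could be implemented without the GNU `timeout` command, which stock macOS does not ship. The sketch below uses a background watchdog. `run_with_timeout` is a hypothetical helper name, not something the plan defines.

```shell
#!/usr/bin/env bash
# Sketch only. run_with_timeout is a hypothetical helper that runs a command
# in the background and kills it if it outlives the given number of seconds.

run_with_timeout() {
  local secs="$1"; shift
  local cmd_pid watchdog_pid status=0

  "$@" &
  cmd_pid=$!

  # Watchdog: kill the command if it is still running after $secs seconds.
  # Its output is detached so callers using $(...) are not held open by sleep.
  ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog_pid=$!

  # Exit status of the command, or 128+signal if the watchdog killed it.
  wait "$cmd_pid" 2>/dev/null || status=$?

  kill "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true
  return "$status"
}
```

A call site might look like `similar=$(run_with_timeout 5 query_similar_failures "$err_lines" 3 || echo "[]")`. One caveat: when the command is a shell function, only its subshell is killed, so a child process it started (jq, for example) can outlive the timeout.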
Not applicable — this is a shell library, not an HTTP API. The "API" is the set of shell functions exported by context-error.sh.
Function signatures:
- format_context_error_report <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout
- query_similar_failures <error_msg> [max_results] → stdout (JSON array)
- generate_suggested_actions <category> <failure_class> <stage_id> <similar_json> → stdout
- record_suggestion <id> <category> <stage_id> <actions_json> → appends to suggestions.jsonl
- mark_suggestion_resolved <id> <resolved> → updates suggestions.jsonl + emits event
- format_github_error_comment <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout (markdown)
Error Codes: Functions return 0 on success, 1 on error. Stderr for diagnostics. All safe with || true.
Rate Limiting / Versioning: Not applicable (local shell library).
Not applicable — this is observability infrastructure itself. Self-monitoring via new events (error.context_generated, suggestion.recorded, suggestion.resolved) queryable in events.jsonl.
First plan attempt — no previous failure. Based on analysis of 5 integration points:
- error-actionability.sh: scoring/enhancement, no memory integration
- daemon-failure.sh: failure classification, no rich context output
- pipeline-state.sh: stage failure reporting, minimal log tail
- sw-memory.sh: memory search capability, not connected to error output
- loop-iteration.sh: error-summary injection, no historical context
Strategy: Create context-error.sh as the bridge between memory system and error output paths.
Verification: 10-case test suite + full regression suite (npm test).