Pipeline Plan 189

ezigus edited this page Mar 18, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

Summary: The plan creates a new scripts/lib/context-error.sh module that bridges the existing memory system (memory_ranked_search) with error output paths. It integrates into 3 failure handlers:

  1. mark_stage_failed() — saves context-error artifact + enriches GitHub comment
  2. daemon_on_failure() — adds context-aware section to failure GitHub comment
  3. compose_prompt() — injects historical failure context into loop iteration prompts

Error reports have 4 sections: What Failed (stage/iteration/goal), Why (filtered log snippets), Similar Past Issues (from memory), Suggested Actions (2-3 concrete steps from history + category defaults).

A feedback loop tracks suggestion effectiveness via suggestions.jsonl, marking them resolved when pipelines succeed.

10 tasks, 2 new files, 4 modified files, 10 test cases, zero Bash 3.2 compat issues.

  … error type and history
  4. Include relevant log snippets with context (not just raw stack traces)
  5. Format error output with clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
  6. Track whether suggested actions lead to successful resolution (feedback loop)

Design Alternatives

Approach A: New library + integration hooks (CHOSEN)

  • Create scripts/lib/context-error.sh with formatting and memory query functions
  • Call from existing failure handlers (mark_stage_failed, daemon_on_failure, loop error path)
  • Store suggestions in suggestions.jsonl for feedback tracking
  • Pros: Minimal blast radius, reusable, testable in isolation
  • Cons: Requires touching 3 existing files

Approach B: Enhance error-actionability.sh directly

  • Extend the existing scoring/enhancement system to include memory queries
  • Pros: Fewer new files
  • Cons: Conflates scoring (fast, side-effect-free) with memory queries (I/O heavy), violates SRP

Approach C: Post-processing pipeline

  • Add a post-failure hook that reads error-log.jsonl and generates reports
  • Pros: Zero changes to existing failure paths
  • Cons: Delayed feedback, can't inject context into retry prompts, misses the real-time output

Choice: Approach A — cleanest separation, minimal blast radius, can be tested independently.

Risk Assessment

  1. Memory query performance: memory_ranked_search does file I/O + jq processing. Mitigation: Already used in loop prompts; add a timeout guard.
  2. Breaking existing error output: GitHub comments use specific formatting. Mitigation: Context-aware output is additional — existing messages unchanged. New content added to existing comment bodies.
  3. Empty memory state: First-time users have no failures.json. Mitigation: Graceful fallback to generic suggestions based on error category.
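
The timeout guard from item 1 could be sketched as below. This is an assumption, not the plan's implementation: run_with_timeout is a hypothetical helper name. It uses a background watchdog rather than the external timeout command, because an external command cannot invoke a shell function such as memory_ranked_search, and timeout is not part of a default macOS install.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command or shell function with a time limit.
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
    local secs="$1"; shift
    local cmd_pid watchdog_pid rc=0

    # Run the guarded command in the background; it keeps our stdout.
    "$@" &
    cmd_pid=$!

    # Watchdog: kill the command if it is still running after $secs.
    # Its output goes to /dev/null so it never holds a $(...) pipe open.
    ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Non-zero rc means the command failed or was killed by the watchdog.
    wait "$cmd_pid" 2>/dev/null || rc=$?

    # The command finished (or was killed): stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$rc"
}
```

A caller would wrap the memory lookup, for example similar=$(run_with_timeout 5 memory_ranked_search "$query" || true), and treat an empty result as "no history available". One limitation of this sketch: if the guarded function starts child processes of its own, they may outlive the kill.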

Dependency Analysis

  • Depends on: helpers.sh (output helpers, emit_event), sw-memory.sh (memory_ranked_search), error-actionability.sh (error categorization)
  • Changed by this PR: pipeline-state.sh (mark_stage_failed), daemon-failure.sh (daemon_on_failure), loop-iteration.sh (error-summary composition)
  • No circular dependency risks: context-error.sh sources helpers.sh and calls sw-memory.sh functions

Files to Modify

New Files

  1. scripts/lib/context-error.sh — Core module: rich error formatting, memory query, suggestion engine, feedback tracking
  2. scripts/sw-lib-context-error-test.sh — Test suite for the new module

Modified Files

  1. scripts/lib/pipeline-state.sh — Integrate context-aware error report into mark_stage_failed() (GitHub comment + artifact)
  2. scripts/lib/daemon-failure.sh — Integrate context-aware error report into daemon_on_failure() (GitHub comment)
  3. scripts/lib/loop-iteration.sh — Inject context-aware error context into compose_prompt() error summary section
  4. config/event-schema.json — Add new event types for context-error system

Implementation Steps

Step 1: Create scripts/lib/context-error.sh

This is the core module. Functions:

format_context_error_report()

Input: error_message, stage_id, iteration_count, goal, issue_number
Output: Formatted multi-section error report (stdout)

Sections:

═══ What Failed ═══
Stage: <stage_id> | Iteration: <count> | Issue: #<num>
Goal: <goal text>
<error category from error-actionability.sh>: <error message>
═══ Why ═══
<relevant log snippet — last 10 lines from stage log, filtered for noise>
═══ Similar Past Issues ═══
<results from memory_ranked_search, formatted as bullet list>
 • <pattern> → Fixed by: <fix> (effectiveness: <rate>%)
═══ Suggested Actions ═══
1. <action based on error category + memory fixes>
2. <action based on error category>
3. <generic action for this stage>
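
A minimal sketch of how the four sections above might be rendered, under stated assumptions: the log path ($ARTIFACTS_DIR/<stage_id>.log), the "noise" filter (blank lines only), and the placeholder texts for the last two sections are illustrative and not taken from the plan. categorize_error is used only if it has already been loaded.

```bash
#!/usr/bin/env bash
# Sketch only: renders the plain-text report layout described above.
format_context_error_report() {
    local error_message="$1" stage_id="$2" iteration="$3" goal="$4" issue="$5"
    local category="UNKNOWN"
    local snippet="(no log available)"
    local similar="(none found)"                       # placeholder text (assumption)
    local actions="1. Inspect the stage log"           # placeholder text (assumption)

    # categorize_error is planned in Task 2; fall back to UNKNOWN without it.
    if type categorize_error >/dev/null 2>&1; then
        category=$(categorize_error "$error_message" 2>/dev/null || echo "UNKNOWN")
    fi

    # Hypothetical log location; the real module may resolve logs differently.
    local log_file="${ARTIFACTS_DIR:-.}/${stage_id}.log"
    if [[ -f "$log_file" ]]; then
        # Drop blank lines, then keep the last 10 lines.
        snippet=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 10 || true)
    fi

    printf '═══ What Failed ═══\n'
    printf 'Stage: %s | Iteration: %s | Issue: #%s\n' "$stage_id" "$iteration" "$issue"
    printf 'Goal: %s\n' "$goal"
    printf '%s: %s\n' "$category" "$error_message"
    printf '═══ Why ═══\n%s\n' "$snippet"
    printf '═══ Similar Past Issues ═══\n%s\n' "$similar"
    printf '═══ Suggested Actions ═══\n%s\n' "$actions"
}
```

In the full module the last two sections would presumably be filled from query_similar_failures and generate_suggested_actions rather than from fixed placeholders.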

query_similar_failures()

Input: error_message, max_results (default 3)
Output: JSON array of similar past failures from memory

Wraps memory_ranked_search with the error message as query. Searches both repo-specific and global memory. Returns [] when memory is empty.
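
The wrapper described above might look like the following sketch. The call shape memory_ranked_search <query> <limit> is an assumption; the real signature lives in sw-memory.sh and should be checked there.

```bash
#!/usr/bin/env bash
# Sketch only: wraps memory_ranked_search and always prints a JSON array.
query_similar_failures() {
    local error_message="$1" max_results="${2:-3}" results=""

    # Assumed argument order (query, limit); verify against sw-memory.sh.
    if type memory_ranked_search >/dev/null 2>&1; then
        results=$(memory_ranked_search "$error_message" "$max_results" 2>/dev/null || true)
    fi

    # Normalise "memory not loaded", "search failed" and "no hits" to [].
    [[ -n "$results" ]] || results="[]"
    printf '%s\n' "$results"
}
```

Returning [] in every failure case lets callers compare against a single sentinel value, which is what the compose_prompt() integration in Step 4 relies on.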

generate_suggested_actions()

Input: error_category, failure_class, stage_id, similar_failures_json
Output: Newline-separated list of 2-3 concrete actions

Logic:

  • If similar failures exist with effective fixes → suggest those fixes first
  • Map error categories to default actions:
    • FILE_ACCESS → "Check file permissions and paths", "Verify the file exists"
    • SYNTAX_ERROR → "Run linter on the changed files", "Check for unclosed brackets/quotes"
    • TYPE_ERROR → "Verify function signatures match", "Check for null/undefined values"
    • ASSERTION_FAILURE → "Compare expected vs actual test output", "Check if test fixtures are current"
    • TIMEOUT → "Increase timeout or check for infinite loops", "Review async/await patterns"
    • NETWORK_ERROR → "Check API endpoint availability", "Verify auth tokens are valid"
    • MEMORY_ERROR → "Reduce batch size or parallelism", "Check for memory leaks in loops"
  • Add stage-specific actions:
    • build stage → "Review the loop error-summary.json for specific test failures"
    • test stage → "Run failing tests individually to isolate the issue"
    • review stage → "Check compound quality gate thresholds"
    • pr stage → "Verify GitHub token permissions and branch protection rules"
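
The category and stage mappings above can be expressed with case statements, which avoids associative arrays and so stays Bash 3.2 compatible. This sketch covers only the default and stage-specific actions; the memory-based suggestions are left out, and all three function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: default actions per error category (no associative arrays).
default_actions_for_category() {
    case "$1" in
        FILE_ACCESS)       printf '%s\n' "Check file permissions and paths" "Verify the file exists" ;;
        SYNTAX_ERROR)      printf '%s\n' "Run linter on the changed files" "Check for unclosed brackets/quotes" ;;
        TYPE_ERROR)        printf '%s\n' "Verify function signatures match" "Check for null/undefined values" ;;
        ASSERTION_FAILURE) printf '%s\n' "Compare expected vs actual test output" "Check if test fixtures are current" ;;
        TIMEOUT)           printf '%s\n' "Increase timeout or check for infinite loops" "Review async/await patterns" ;;
        NETWORK_ERROR)     printf '%s\n' "Check API endpoint availability" "Verify auth tokens are valid" ;;
        MEMORY_ERROR)      printf '%s\n' "Reduce batch size or parallelism" "Check for memory leaks in loops" ;;
        *)                 : ;;  # unknown category: no default actions
    esac
}

# Sketch only: one extra action per pipeline stage.
stage_action_for() {
    case "$1" in
        build)  printf '%s\n' "Review the loop error-summary.json for specific test failures" ;;
        test)   printf '%s\n' "Run failing tests individually to isolate the issue" ;;
        review) printf '%s\n' "Check compound quality gate thresholds" ;;
        pr)     printf '%s\n' "Verify GitHub token permissions and branch protection rules" ;;
        *)      : ;;
    esac
}

# Combine both lists and cap the result at three actions.
suggest_default_actions() {
    local category="$1" stage_id="$2"
    {
        default_actions_for_category "$category"
        stage_action_for "$stage_id"
    } | head -n 3
}
```

For example, suggest_default_actions TIMEOUT test prints the two TIMEOUT defaults followed by the test-stage action.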

record_suggestion()

Input: suggestion_id, error_category, stage_id, suggested_actions (JSON array)
Output: Appends to .claude/pipeline-artifacts/suggestions.jsonl

Format: {"id": "<uuid>", "ts": "<iso>", "stage": "<stage>", "category": "<cat>", "actions": [...], "resolved": null}
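
A minimal sketch of the append, assuming the id, stage and category values are plain tokens that need no JSON escaping and that the actions argument is already a valid JSON array. A production version would more likely build the line with jq.

```bash
#!/usr/bin/env bash
# Sketch only: append one suggestion entry in the format shown above.
record_suggestion() {
    local id="$1" category="$2" stage_id="$3" actions_json="$4"
    local out_dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local ts

    ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")   # ISO 8601 timestamp in UTC
    mkdir -p "$out_dir" || return 1

    # Assumption: no value contains quotes or backslashes; actions_json is
    # inserted verbatim and must already be valid JSON.
    printf '{"id": "%s", "ts": "%s", "stage": "%s", "category": "%s", "actions": %s, "resolved": null}\n' \
        "$id" "$ts" "$stage_id" "$category" "$actions_json" >> "$out_dir/suggestions.jsonl"
}
```

Appending one self-contained JSON object per line keeps the file readable with a plain while-read loop, as Step 6 does.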

mark_suggestion_resolved()

Input: suggestion_id, resolved (true/false)
Output: Updates the suggestion entry; emits suggestion.resolved event

Called when the pipeline succeeds after a failure (in mark_stage_completed or pipeline.completed handler).
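
One way the update could work is to rewrite the JSONL file through jq and swap it into place, as sketched below. This is an assumption about the implementation; the sketch also omits the stage field that the suggestion.resolved event is meant to carry, because the function signature does not receive it.

```bash
#!/usr/bin/env bash
# Sketch only: set .resolved on the matching entry, then emit an event.
mark_suggestion_resolved() {
    local id="$1" resolved="$2"
    local file="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}/suggestions.jsonl"
    local tmp

    [[ -f "$file" ]] || return 1
    # Only JSON booleans are accepted, so --argjson below cannot fail on input.
    [[ "$resolved" == "true" || "$resolved" == "false" ]] || return 1
    tmp=$(mktemp) || return 1

    # jq reads the file as a stream of objects and rewrites each one;
    # only the entry whose id matches is changed.
    if jq -c --arg id "$id" --argjson resolved "$resolved" \
        'if .id == $id then .resolved = $resolved else . end' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi

    # emit_event comes from helpers.sh; skip it when helpers are not loaded.
    if type emit_event >/dev/null 2>&1; then
        emit_event "suggestion.resolved" "id=$id" "resolved=$resolved"
    fi
    return 0
}
```

Writing to a temporary file first means a failed jq run leaves the original suggestions.jsonl untouched.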

format_github_error_comment()

Input: error_message, stage_id, iteration_count, goal, issue_number
Output: Markdown-formatted error report suitable for GitHub issue comments

Same sections as format_context_error_report() but with markdown formatting, collapsible details for log snippets.
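
A sketch of the markdown variant, showing only the first two sections; the remaining two would follow the same pattern. The heading levels and the "Log snippet" summary label are assumptions. The error message stands in for the log snippet here.

```bash
#!/usr/bin/env bash
# Sketch only: markdown output with a collapsible log snippet.
format_github_error_comment() {
    local error_message="$1" stage_id="$2" iteration="$3" goal="$4" issue="$5"

    printf '### What Failed\n\n'
    printf '**Stage:** %s | **Iteration:** %s | **Issue:** #%s\n\n' \
        "$stage_id" "$iteration" "$issue"
    printf '**Goal:** %s\n\n' "$goal"

    printf '### Why\n\n'
    # GitHub renders <details> as a collapsed block; the blank line after
    # <summary> is needed for the markdown inside it to be parsed.
    printf '<details>\n<summary>Log snippet</summary>\n\n'
    # Indent by four spaces so the snippet renders as a code block.
    printf '%s\n' "$error_message" | sed 's/^/    /'
    printf '\n</details>\n'
}
```

Indenting the snippet, rather than fencing it, avoids broken rendering when the log text itself contains backtick fences.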

Step 2: Integrate into mark_stage_failed() in pipeline-state.sh

After the existing failure recording logic (line ~401), add:

# Generate context-aware error report
if type format_context_error_report >/dev/null 2>&1; then
    local context_report
    context_report=$(format_context_error_report \
        "$(tail -n 5 "$ARTIFACTS_DIR/${stage_id}"*.log 2>/dev/null || echo 'Stage failed')" \
        "$stage_id" \
        "${SELF_HEAL_COUNT:-0}" \
        "${GOAL:-}" \
        "${ISSUE_NUMBER:-}" 2>/dev/null || true)
    if [[ -n "$context_report" ]]; then
        save_artifact "context-error-${stage_id}.md" "$context_report"
        emit_event "error.context_generated" "stage=$stage_id" "issue=${ISSUE_NUMBER:-0}"
    fi
fi

Update the GitHub comment to include the context-aware report using format_github_error_comment.

Step 3: Integrate into daemon_on_failure() in daemon-failure.sh

In the final failure reporting section (around line 371), enhance the comment body:

local context_report=""
if type format_github_error_comment >/dev/null 2>&1; then
    context_report=$(format_github_error_comment \
        "${log_tail}" "pipeline" "0" \
        "${GOAL:-Issue #${issue_num}}" "$issue_num" 2>/dev/null || true)
fi

Append the context report to the existing GitHub issue comment body.

Step 4: Integrate into compose_prompt() in loop-iteration.sh

After the existing error-summary.json injection (around line 118), add:

local context_error_section=""
if [[ "$err_count" -gt 0 ]] && type query_similar_failures >/dev/null 2>&1; then
    local similar
    similar=$(query_similar_failures "$err_lines" 3 2>/dev/null || true)
    # Only add the section when memory returned at least one match
    if [[ -n "$similar" ]] && [[ "$similar" != "[]" ]]; then
        local actions
        actions=$(generate_suggested_actions \
            "$(categorize_error "$err_lines" 2>/dev/null || echo "UNKNOWN")" \
            "" "build" "$similar" 2>/dev/null || true)
        context_error_section="## Historical Context for These Errors
Similar failures seen before:
$(echo "$similar" | jq -r '.[] | "- " + .content_text' 2>/dev/null || true)

Suggested actions based on history:
${actions}"
    fi
fi

Step 5: Add event types to config/event-schema.json

Add these event types:

  • error.context_generated — fields: stage, issue, similar_count, suggestion_count
  • suggestion.recorded — fields: id, stage, category
  • suggestion.resolved — fields: id, resolved, stage

Step 6: Feedback loop — track suggestion effectiveness

In the pipeline completion handler (where pipeline.completed event is emitted), check for outstanding suggestions:

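if [[ -f "$ARTIFACTS_DIR/suggestions.jsonl" ]]; then
    # Collect outstanding IDs first: mark_suggestion_resolved updates the
    # same file, so it must not be read while it is being rewritten.
    local outstanding_ids sid
    outstanding_ids=$(jq -r 'select(.resolved == null) | .id // empty' \
        "$ARTIFACTS_DIR/suggestions.jsonl" 2>/dev/null || true)
    while IFS= read -r sid; do
        [[ -n "$sid" ]] && mark_suggestion_resolved "$sid" "true" 2>/dev/null || true
    done <<< "$outstanding_ids"
fi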

Step 7: Write test suite scripts/sw-lib-context-error-test.sh

Test cases:

  1. test_format_context_error_report — Verify all 4 sections present in output
  2. test_format_with_empty_memory — Verify graceful handling when no memory exists
  3. test_query_similar_failures — Mock failures.json, verify ranked results returned
  4. test_generate_suggested_actions_from_memory — Verify memory-based suggestions appear first
  5. test_generate_suggested_actions_by_category — Verify category-based defaults for each error type
  6. test_generate_suggested_actions_by_stage — Verify stage-specific actions
  7. test_record_suggestion — Verify JSONL append with correct format
  8. test_mark_suggestion_resolved — Verify update and event emission
  9. test_format_github_error_comment — Verify markdown output
  10. test_no_color_output — Verify NO_COLOR disables formatting

Task Checklist

  • Task 1: Create scripts/lib/context-error.sh with core functions (format_context_error_report, query_similar_failures, generate_suggested_actions, record_suggestion, mark_suggestion_resolved, format_github_error_comment)
  • Task 2: Add categorize_error() helper (reuse logic from error-actionability.sh enhance_error_message category detection)
  • Task 3: Integrate context-aware error report into mark_stage_failed() in pipeline-state.sh
  • Task 4: Integrate context-aware error report into daemon_on_failure() in daemon-failure.sh
  • Task 5: Integrate context-aware error context into compose_prompt() in loop-iteration.sh
  • Task 6: Add new event types to config/event-schema.json
  • Task 7: Add feedback loop — mark suggestions resolved on pipeline completion in sw-pipeline.sh
  • Task 8: Write test suite scripts/sw-lib-context-error-test.sh with 10 test cases
  • Task 9: Register test in package.json test runner
  • Task 10: Run full test suite and verify no regressions

Testing Approach

Unit Tests (sw-lib-context-error-test.sh)

  • Use existing test-helpers.sh framework with setup_test_env / cleanup_test_env
  • Mock memory directory with sample failures.json containing known patterns
  • Mock emit_event to capture event emissions
  • Test each function independently with assert_contains, assert_eq, assert_file_exists
  • Test edge cases: empty memory, missing files, very long error messages
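
The mocking approach above could be sketched as follows. The assert_contains defined here is a hypothetical stand-in for the helper that test-helpers.sh is said to provide (its real signature may differ), and notify_resolved is a trivial placeholder for the function under test.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the assert_contains helper in test-helpers.sh.
assert_contains() {
    local haystack="$1" needle="$2" label="${3:-assert_contains}"
    case "$haystack" in
        *"$needle"*) echo "PASS: $label" ;;
        *)           echo "FAIL: $label (missing '$needle')"; return 1 ;;
    esac
}

# Mock emit_event: record each call in a file instead of events.jsonl.
EVENT_LOG=$(mktemp)
emit_event() { printf '%s\n' "$*" >> "$EVENT_LOG"; }

# Placeholder for the code under test; it only emits the event.
notify_resolved() { emit_event "suggestion.resolved" "id=$1" "resolved=true"; }

test_resolved_event_is_emitted() {
    : > "$EVENT_LOG"                      # start from an empty capture file
    notify_resolved "s-1"
    assert_contains "$(cat "$EVENT_LOG")" "suggestion.resolved id=s-1" "event emitted"
}

test_resolved_event_is_emitted
```

Capturing events in a file rather than a variable keeps the mock working even when the function under test runs inside a subshell.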

Integration Testing

  • Run npm test to verify no regressions in existing test suite
  • Verify the new test script passes: bash scripts/sw-lib-context-error-test.sh

Manual Verification

  • Trigger a pipeline failure and inspect the GitHub comment for rich context
  • Verify context-error-<stage>.md artifact is created
  • Verify suggestions.jsonl is populated
  • Check events.jsonl for new event types

Definition of Done

  • Error messages include stage name, iteration count, and original issue/goal
  • Memory system is queried for similar past failures (via memory_ranked_search)
  • 2-3 concrete suggested actions are generated based on error type + history
  • Log snippets are included with context (filtered, not raw stack traces)
  • Error output has clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
  • Suggestions are recorded in suggestions.jsonl for feedback tracking
  • Successful pipeline completion marks outstanding suggestions as resolved
  • All new functions have test coverage
  • npm test passes with no regressions
  • No Bash 3.2 compatibility issues (no associative arrays, no readarray)

Alternatives Considered

| Approach | Complexity | Performance | Maintainability | Blast Radius |
|---|---|---|---|---|
| A: New library + hooks (chosen) | Medium | Good (lazy loading) | High (isolated module) | Low (3 files touched) |
| B: Extend error-actionability.sh | Low | Poor (I/O in scoring path) | Low (SRP violation) | Medium |
| C: Post-processing pipeline | Low | Good (async) | Medium | Minimal but delayed |

Approach A chosen for best balance of isolation, testability, and real-time feedback.


Risk Analysis

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Memory query slows pipeline | Medium | Low | Already used in loop; add 5s timeout |
| Breaking existing GitHub comments | High | Low | Additive only: append, don't replace |
| Empty memory on first run | Low | High | Graceful fallback to category-based suggestions |
| Bash 3.2 compat issue | Medium | Low | No associative arrays; test on macOS |

Endpoint Specification (API Design)

Not applicable — this is a shell library, not an HTTP API. The "API" is the set of shell functions exported by context-error.sh.

Function signatures:

  • format_context_error_report <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout
  • query_similar_failures <error_msg> [max_results] → stdout (JSON array)
  • generate_suggested_actions <category> <failure_class> <stage_id> <similar_json> → stdout
  • record_suggestion <id> <category> <stage_id> <actions_json> → appends to suggestions.jsonl
  • mark_suggestion_resolved <id> <resolved> → updates suggestions.jsonl + emits event
  • format_github_error_comment <error_msg> <stage_id> <iteration> <goal> <issue_number> → stdout (markdown)

Error Codes: Functions return 0 on success and 1 on error, and write diagnostics to stderr. All are safe to call with || true, so a failure in the context-error path never aborts the caller.

Rate Limiting / Versioning: Not applicable (local shell library).


Monitoring Checklist

Not applicable — this is observability infrastructure itself. Self-monitoring via new events (error.context_generated, suggestion.recorded, suggestion.resolved) queryable in events.jsonl.


Root Cause Hypothesis (Systematic Debugging)

First plan attempt — no previous failure. Based on analysis of 5 integration points:

  1. error-actionability.sh — scoring/enhancement, no memory integration
  2. daemon-failure.sh — failure classification, no rich context output
  3. pipeline-state.sh — stage failure reporting, minimal log tail
  4. sw-memory.sh — memory search capability, not connected to error output
  5. loop-iteration.sh — error-summary injection, no historical context

Strategy: Create context-error.sh as the bridge between memory system and error output paths.

Verification: 10-case test suite + full regression suite (npm test).
