Pipeline Design 189

ezigus edited this page Mar 18, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key architectural decisions documented:

  • Standalone library (context-error.sh) with 8 public functions — chosen over global error() override (blast radius too high) and stderr post-processing (can't inject into loop prompts mid-iteration)
  • 4 integration points via type guards — pipeline-state, daemon-failure, loop-iteration, sw-pipeline completion — all degrade gracefully when the library isn't loaded
  • 5-second timeout on memory queries prevents IO-bound lookups from stalling error output
  • Unidirectional dependency graph — callers → context-error → memory + helpers, no cycles
  • Atomic JSONL writes for suggestion tracking with tmp+mv pattern
  • All 15 validation criteria checked off against the existing implementation in commit fded001
  • All jq usage must tolerate jq being absent
  • Atomic file writes required (pipefail + concurrent workers)
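
The tmp+mv pattern named above can be sketched as follows. This is a minimal illustration, not the library's code: `append_jsonl_atomic` is a hypothetical helper name, and the silent-failure behaviour is an assumption based on the error handling described elsewhere in this document.

```shell
#!/usr/bin/env bash
# Sketch only: append_jsonl_atomic is a hypothetical helper, not part of context-error.sh.
append_jsonl_atomic() {
    local file="$1" line="$2" tmp
    # Fail silently if the target directory cannot be created.
    mkdir -p "$(dirname "$file")" 2>/dev/null || return 0
    # Create the temp file next to the target so the final mv stays on one filesystem.
    tmp=$(mktemp "${file}.tmp.XXXXXX" 2>/dev/null) || return 0
    # Copy existing records (if any), then add the new one.
    if [ -f "$file" ]; then
        cat "$file" > "$tmp" || { rm -f "$tmp"; return 0; }
    fi
    printf '%s\n' "$line" >> "$tmp" || { rm -f "$tmp"; return 0; }
    # A rename within one filesystem is atomic, so readers never see a half-written file.
    mv "$tmp" "$file" || rm -f "$tmp"
}
```

Two writers that start from the same snapshot can still overwrite each other's record. The rename only guarantees that the file is never observed in a partially written state.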

Decision

Standalone library module (scripts/lib/context-error.sh) with 8 pure-ish functions, integrated into 4 existing modules via explicit type func_name >/dev/null 2>&1 guard checks. No global side effects — callers opt in by sourcing the library and calling functions directly.
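
A minimal sketch of what such a guarded call site might look like. `report_stage_failure` and the fallback message are hypothetical. Only the `type ... >/dev/null 2>&1` check and the argument order of `format_context_error_report` come from this document.

```shell
#!/usr/bin/env bash
# Sketch only: report_stage_failure is a hypothetical caller.
report_stage_failure() {
    local log_tail="$1" stage="$2"
    # Use the enriched report only when the library function is defined in this shell.
    if type format_context_error_report >/dev/null 2>&1; then
        # Argument order per the interface contract: error, stage, iteration, goal, issue.
        format_context_error_report "$log_tail" "$stage" 1 "" "" 2>/dev/null \
            || printf 'Stage %s failed: %s\n' "$stage" "$log_tail"
    else
        # Library not sourced: behave exactly as before.
        printf 'Stage %s failed: %s\n' "$stage" "$log_tail"
    fi
}
```

Because the guard is evaluated at call time, sourcing the library later in the same shell enables the enriched path without further changes to the caller.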

Component Diagram

┌──────────────────────────────────────────────────────────────────┐
│                   Callers (Integration Points)                   │
│                                                                  │
│  pipeline-state.sh    daemon-failure.sh    loop-iteration.sh     │
│  mark_stage_failed()  daemon_on_failure()  compose_prompt()      │
│          │                    │                    │             │
│          ▼                    ▼                    ▼             │
│  format_context_      format_github_       query_similar_        │
│  error_report()       error_comment()      failures() +          │
│          │                    │            categorize_error() +  │
│          │                    │            generate_actions()    │
│          └────────┬───────────┘                    │             │
│                   ▼                                │             │
│  ┌─────────────────────────────────┐               │             │
│  │ context-error.sh                │◄──────────────┘             │
│  │                                 │                             │
│  │ categorize_error()              │                             │
│  │ query_similar_failures()   ─────┼──► sw-memory.sh             │
│  │ generate_suggested_actions()    │    (memory_ranked_search)   │
│  │ format_context_error_report()   │                             │
│  │ format_github_error_comment()   │                             │
│  │ record_suggestion()        ─────┼──► suggestions.jsonl        │
│  │ mark_suggestion_resolved()      │                             │
│  │ resolve_outstanding_suggestions │                             │
│  └─────────────────────────────────┘                             │
│                   │                                              │
│                   ▼                                              │
│  sw-pipeline.sh (completion) ── resolve_outstanding_             │
│                                 suggestions()                    │
│                                                                  │
│  helpers.sh ── emit_event() ──► events.jsonl                     │
└──────────────────────────────────────────────────────────────────┘

Dependencies flow one direction: callers → context-error → memory + helpers. No circular dependencies.

Interface Contracts

// Error categorization: maps error text to one of 10 known categories
categorize_error(error_msg: string): ErrorCategory
// Returns: "FILE_ACCESS" | "FUNCTION_ERROR" | "SYNTAX_ERROR" | "ASSERTION_FAILURE"
//        | "TYPE_ERROR" | "TIMEOUT" | "MEMORY_ERROR" | "NETWORK_ERROR"
//        | "RESOURCE_ERROR" | "UNKNOWN"
// Errors: never fails (falls through to "UNKNOWN")

// Memory query with timeout guard
query_similar_failures(error_msg: string, max_results?: number = 3): JSONArray
// Returns: JSON array of similar past failures, or "[]"
// Errors: returns "[]" if memory_ranked_search unavailable, dir missing, or timeout
// Timeout: 5 seconds hard limit via `timeout` command

// Action generation: combines memory fixes + category defaults + stage context
generate_suggested_actions(
 category: ErrorCategory,
 failure_class: string, // unused, reserved for future error-actionability bridge
 stage_id: string,
 similar_json: JSONArray
): string // newline-separated action list (2-4 items)
// Errors: never fails (always produces at least 2 default actions)

// Terminal-formatted 4-section report
format_context_error_report(
 error_msg: string,
 stage_id: string,
 iteration: number,
 goal: string,
 issue_number: string
): string // multi-line formatted report
// Errors: returns partial report on internal failures (each section independent)
// Respects NO_COLOR env var for plain-text output

// GitHub markdown-formatted report
format_github_error_comment(
 error_msg: string,
 stage_id: string,
 iteration: number,
 goal: string,
 issue_number: string
): string // markdown with collapsible <details> for error logs
// Errors: same graceful degradation as terminal format

// Suggestion tracking: append to JSONL
record_suggestion(
 suggestion_id: string,
 category: ErrorCategory,
 stage_id: string,
 actions_json: JSONArray
): void
// Side effects: appends to $ARTIFACTS_DIR/suggestions.jsonl (atomic via tmp+mv),
//               emits suggestion.recorded event

// Individual resolution
mark_suggestion_resolved(suggestion_id: string, resolved?: string = "true"): void
// Side effects: atomically rewrites suggestions.jsonl, emits suggestion.resolved event
// Errors: no-op if file missing or ID not found

// Batch resolution on pipeline success
resolve_outstanding_suggestions(): void
// Side effects: marks all unresolved suggestions as resolved (atomic rewrite)
// Errors: no-op if file missing
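
The `query_similar_failures` contract above could be sketched as follows. Several details are assumptions made for illustration: the `MEMORY_DIR` variable, the `(query, limit)` argument order of `memory_ranked_search`, and the `QUERY_TIMEOUT` override used to keep tests fast. Because `timeout` runs external commands rather than shell functions, this sketch exports the function into a child `bash`. The real library may solve that differently.

```shell
#!/usr/bin/env bash
# Sketch only. MEMORY_DIR, QUERY_TIMEOUT and the memory_ranked_search
# argument order are assumptions, not taken from the real library.
query_similar_failures() {
    local error_msg="$1" max_results="${2:-3}" result=""

    # Each missing prerequisite degrades to an empty JSON array.
    type memory_ranked_search >/dev/null 2>&1 || { echo "[]"; return 0; }
    [ -d "${MEMORY_DIR:-}" ]                  || { echo "[]"; return 0; }
    command -v timeout >/dev/null 2>&1        || { echo "[]"; return 0; }

    # timeout(1) can only exec a program, so hand the function to a child bash.
    # Note: export -f leaves the function exported in the caller's environment.
    export -f memory_ranked_search
    result=$(timeout "${QUERY_TIMEOUT:-5}" \
        bash -c 'memory_ranked_search "$1" "$2"' _ "$error_msg" "$max_results" \
        2>/dev/null) || result=""

    # A timeout, a failure, or empty output all collapse to "[]".
    echo "${result:-[]}"
}
```

Callers can treat the output as JSON unconditionally, since every failure mode yields the same `[]` value.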

Data Flow

Error path (pipeline stage fails):

Stage fails
 → pipeline-state.sh:mark_stage_failed()
 → reads last 5 lines from stage log
 → format_context_error_report(log_tail, stage, iteration, goal, issue)
 → categorize_error(log_tail) → category
 → query_similar_failures(log_tail, 3) → [similar matches] (5s timeout)
 → generate_suggested_actions(category, "", stage, similar) → actions
 → assemble 4-section report
 → save_artifact("context-error-{stage}.md", report)
 → record_suggestion(id, category, stage, actions_json)
 → emit_event("error.context_generated")

Daemon failure path (GitHub comment):

Daemon worker fails
 → daemon-failure.sh:daemon_on_failure()
 → format_github_error_comment(log_tail, "pipeline", 0, goal, issue)
 → same internal flow as above, markdown output
 → appended to GitHub issue comment body
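
As an illustration of the markdown output mentioned in this path, a collapsible log section might be rendered like this. `render_error_details` is a hypothetical helper and the exact layout is an assumption. The only detail taken from this document is that error logs are wrapped in `<details>`.

```shell
#!/usr/bin/env bash
# Sketch only: render_error_details is a hypothetical helper.
render_error_details() {
    local summary="$1" log_tail="$2" fence='~~~'
    # The blank line after </summary> lets GitHub render the fenced block as markdown.
    printf '<details>\n<summary>%s</summary>\n\n%s\n%s\n%s\n\n</details>\n' \
        "$summary" "$fence" "$log_tail" "$fence"
}
```

Tilde fences are used here so that a log excerpt containing backticks does not close the block early. A log that itself contains `~~~` would still need escaping.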

Loop iteration path (prompt injection):

Build loop iteration with prior errors
 → loop-iteration.sh:compose_prompt()
 → query_similar_failures(error_lines, 3)
 → categorize_error(error_lines)
 → generate_suggested_actions(category, "", "build", similar)
 → inject "Historical Context" section into Claude prompt
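
The loop-iteration steps above could be composed roughly as follows. `build_historical_context` is a hypothetical name, and the heading and layout of the injected section are assumptions. The three library calls and their arguments follow the flow listed above.

```shell
#!/usr/bin/env bash
# Sketch only: build_historical_context and its output layout are hypothetical.
build_historical_context() {
    local error_lines="$1" category similar actions

    # If any library function is missing, contribute nothing to the prompt.
    type categorize_error >/dev/null 2>&1           || return 0
    type query_similar_failures >/dev/null 2>&1     || return 0
    type generate_suggested_actions >/dev/null 2>&1 || return 0

    similar=$(query_similar_failures "$error_lines" 3 2>/dev/null)   || similar="[]"
    category=$(categorize_error "$error_lines" 2>/dev/null)          || category="UNKNOWN"
    actions=$(generate_suggested_actions "$category" "" "build" "$similar" 2>/dev/null) \
        || actions=""

    # No actions means there is nothing useful to inject.
    [ -n "$actions" ] || return 0

    printf '## Historical Context\n\nError category: %s\n\nSuggested actions:\n%s\n' \
        "$category" "$actions"
}
```

An empty result lets the caller append the output to the prompt unconditionally, which keeps the prompt unchanged when the library is not loaded.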

Feedback loop (pipeline success):

Pipeline completes successfully
 → sw-pipeline.sh (completion block)
 → resolve_outstanding_suggestions()
 → rewrites suggestions.jsonl, setting resolved="true" on all null entries
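
A minimal sketch of this batch rewrite, assuming each JSONL record carries a `resolved` field that is `null` until resolved. It also assumes the function does nothing when `jq` is absent, which this document does not state for this function. Event emission is omitted.

```shell
#!/usr/bin/env bash
# Sketch only: the record layout and the no-jq behaviour are assumptions.
resolve_outstanding_suggestions() {
    local file="${ARTIFACTS_DIR:-.}/suggestions.jsonl" tmp
    [ -f "$file" ] || return 0                    # no-op if file missing
    command -v jq >/dev/null 2>&1 || return 0     # assumed: skip without jq
    tmp=$(mktemp "${file}.tmp.XXXXXX" 2>/dev/null) || return 0
    # jq reads the file as a stream of JSON values; -c keeps one record per line.
    if jq -c 'if .resolved == null then .resolved = "true" else . end' \
            "$file" > "$tmp" 2>/dev/null; then
        mv "$tmp" "$file" || rm -f "$tmp"
    else
        # Parse failure: leave the original file untouched.
        rm -f "$tmp"
    fi
}
```

Records whose `resolved` field already holds a value pass through unchanged, which matches the requirement that already-resolved entries are not touched.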

Error Boundaries

| Component | Error Source | Handling |
| --- | --- | --- |
| query_similar_failures | memory_ranked_search not loaded | type check → return [] |
| query_similar_failures | Memory dir doesn't exist | Dir check → return [] |
| query_similar_failures | Query takes too long | timeout 5 → return [] |
| generate_suggested_actions | jq parse failure on similar_json | 2>/dev/null fallback → skip memory-based fix, use category defaults |
| record_suggestion | jq unavailable | 2>/dev/null on jq call; file may not be written |
| record_suggestion | ARTIFACTS_DIR not writable | mkdir -p with fallback; silent failure |
| mark_suggestion_resolved | Concurrent writers | Atomic tmp + mv; last writer wins (acceptable for tracking data) |
| All callers | context-error.sh not sourced | type func_name >/dev/null 2>&1 guard before every call |
| All callers | Any function fails | 2>/dev/null with fallback wrapping at call sites |

Every integration point is guarded with type ... >/dev/null 2>&1 so the system operates identically when context-error.sh is not loaded. No caller can fail due to this module being absent or broken.

Alternatives Considered

  1. Override global error() function — Pros: zero integration work, every error path automatically enriched. Cons: massive blast radius (all error calls would trigger memory queries), 5-second timeout penalty on every error, untestable (global state mutation), would break simple error output in non-pipeline contexts. Rejected because the cost model is inverted: most errors don't need memory context, but all would pay the latency price.

  2. Post-process stderr after stage completion — Pros: no source code changes to existing modules. Cons: requires complex stderr capture (tee + temp files), delays error feedback until after the stage completes (defeats the purpose of real-time context), loses the ability to inject context into loop prompts mid-iteration. Rejected because the loop-iteration use case requires error context during execution, not after.

Implementation Plan

Files created

  • scripts/lib/context-error.sh — 431-line core library (8 public functions, include guard, version-stamped)
  • scripts/sw-lib-context-error-test.sh — 52-case test suite covering all functions, edge cases, and event emission

Files modified

  • scripts/lib/pipeline-state.sh: mark_stage_failed() calls format_context_error_report + record_suggestion (lines ~440-464)
  • scripts/lib/daemon-failure.sh: daemon_on_failure() calls format_github_error_comment (lines ~371-376)
  • scripts/lib/loop-iteration.sh: compose_prompt() injects historical error context via query_similar_failures + categorize_error + generate_suggested_actions (lines ~121-132)
  • scripts/sw-pipeline.sh: pipeline completion block calls resolve_outstanding_suggestions (line ~2746)
  • config/event-schema.json: registers 3 new events: error.context_generated, suggestion.recorded, suggestion.resolved

Dependencies

  • No new external dependencies
  • Runtime dependency on jq (with fallbacks for absence)
  • Optional dependency on memory_ranked_search from sw-memory.sh (graceful degradation)
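
One way the jq fallback could look inside `generate_suggested_actions` is sketched below. The `.fix` field on memory results and the wording of the default actions are assumptions made for illustration. The documented behaviour it mirrors is that a missing `jq` or unparseable `similar_json` skips the memory-based action while at least two defaults are still produced.

```shell
#!/usr/bin/env bash
# Sketch only: the ".fix" field and the action texts are assumptions.
generate_suggested_actions() {
    local category="${1:-UNKNOWN}" stage_id="${3:-}" similar_json="${4:-[]}" fix=""
    # $2 (failure_class) is accepted but unused, as in the interface contract.

    # Memory-based action: needs jq and parseable JSON, otherwise it is skipped.
    if command -v jq >/dev/null 2>&1; then
        fix=$(printf '%s' "$similar_json" | jq -r '.[0].fix // empty' 2>/dev/null) || fix=""
    fi
    if [ -n "$fix" ]; then
        printf 'Apply the fix from a similar past failure: %s\n' "$fix"
    fi

    # Category defaults: always emits two actions.
    case "$category" in
        TIMEOUT)
            printf 'Check whether the %s stage needs a longer time limit\n' "$stage_id"
            printf 'Look for hung subprocesses or blocked network calls\n'
            ;;
        *)
            printf 'Re-read the last lines of the %s stage log\n' "$stage_id"
            printf 'Re-run the %s stage to see if the failure reproduces\n' "$stage_id"
            ;;
    esac
}
```

The function has no code path that depends on `jq` succeeding, so its output with `jq` absent is the same as its output for an empty memory result.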

Risk areas

  • Memory query latency: Mitigated by 5-second timeout command wrapper. If the memory corpus is very large, queries could approach this limit. Monitor via suggestion.recorded event timestamps.
  • suggestions.jsonl concurrent access: mark_suggestion_resolved and resolve_outstanding_suggestions both do full-file rewrite via tmp+mv. In theory, two concurrent pipelines could race. Acceptable for tracking/analytics data — not a correctness concern.
  • Error category accuracy: Regex-based categorization (categorize_error) is necessarily heuristic. The ordered regex cascade means a message matching both "timeout" and "assertion" patterns would be classified by whichever regex appears first. This is acceptable because the category drives suggestions, not control flow.
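
To make the ordering effect concrete, here is a minimal sketch of an ordered cascade. It uses `case` glob patterns rather than the library's regexes, the patterns themselves are illustrative, and the order is assumed to follow the category list in the interface contract.

```shell
#!/usr/bin/env bash
# Sketch only: patterns and their order are illustrative, not the library's regexes.
categorize_error() {
    local msg
    # Lowercase once so the patterns can stay case-insensitive.
    msg=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
    # First matching branch wins, so earlier categories shadow later ones.
    case "$msg" in
        *"no such file"*|*"permission denied"*)            echo "FILE_ACCESS" ;;
        *"command not found"*|*"is not a function"*)       echo "FUNCTION_ERROR" ;;
        *"syntax error"*|*"unexpected token"*)             echo "SYNTAX_ERROR" ;;
        *"assert"*)                                        echo "ASSERTION_FAILURE" ;;
        *"typeerror"*|*"type mismatch"*)                   echo "TYPE_ERROR" ;;
        *"timed out"*|*"timeout"*)                         echo "TIMEOUT" ;;
        *"out of memory"*|*"heap"*)                        echo "MEMORY_ERROR" ;;
        *"connection refused"*|*"econnreset"*|*"network"*) echo "NETWORK_ERROR" ;;
        *"no space left"*|*"too many open files"*)         echo "RESOURCE_ERROR" ;;
        *)                                                 echo "UNKNOWN" ;;
    esac
}
```

With this ordering, `categorize_error "AssertionError: request timeout"` prints `ASSERTION_FAILURE` because the assertion branch is tested before the timeout branch. Swapping the two branches would flip the result.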

Validation Criteria

  • scripts/lib/context-error.sh loads without error and exports 8 public functions
  • categorize_error correctly classifies all 10 error categories via regex patterns
  • query_similar_failures returns [] when memory system is unavailable (no crash, no hang)
  • query_similar_failures respects 5-second timeout (tested via mock)
  • format_context_error_report produces all 4 sections: What Failed, Why, Similar Past Issues, Suggested Actions
  • format_context_error_report includes stage name, iteration count, issue number, and goal in output
  • format_github_error_comment produces valid markdown with <details> blocks
  • record_suggestion appends valid JSONL and emits suggestion.recorded event
  • mark_suggestion_resolved atomically updates the correct entry and emits suggestion.resolved event
  • resolve_outstanding_suggestions batch-resolves unresolved entries without touching already-resolved ones
  • All 4 integration points use type ... >/dev/null 2>&1 guards (zero impact when library not loaded)
  • All 52 test cases pass in scripts/sw-lib-context-error-test.sh
  • Full test suite (npm test) passes with no regressions
  • NO_COLOR environment variable produces plain-text output (no Unicode box-drawing)
  • New events registered in config/event-schema.json with correct required/optional fields
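
The NO_COLOR criterion above could be checked against a helper along these lines. `section_header` is a hypothetical function and both output styles are assumptions. The sketch only shows the switch between decorated and plain-text output.

```shell
#!/usr/bin/env bash
# Sketch only: section_header and both output styles are hypothetical.
section_header() {
    if [ -n "${NO_COLOR:-}" ]; then
        # Plain mode: ASCII only, no escape sequences, no box-drawing characters.
        printf '== %s ==\n' "$1"
    else
        # Decorated mode: bold text framed with box-drawing characters.
        printf '\033[1m── %s ──\033[0m\n' "$1"
    fi
}
```

Following the NO_COLOR convention, any non-empty value enables plain mode, so `NO_COLOR=1 section_header "What Failed"` prints `== What Failed ==`.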
