Pipeline Design 189

ezigus edited this page Mar 18, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key architectural decisions documented:

Standalone library (context-error.sh) with 8 public functions — chosen over global error() override (blast radius too high) and stderr post-processing (can't inject into loop prompts mid-iteration)
4 integration points via type guards — pipeline-state, daemon-failure, loop-iteration, sw-pipeline completion — all degrade gracefully when the library isn't loaded
5-second timeout on memory queries prevents IO-bound lookups from stalling error output
Unidirectional dependency graph — callers → context-error → memory + helpers, no cycles
Atomic JSONL writes for suggestion tracking with tmp+mv pattern
All 15 validation criteria checked off against the existing implementation in commit fded001
… prompts (loop-iteration), and pipeline completion (feedback loop)
All jq usage must tolerate jq being absent
Atomic file writes required (pipefail + concurrent workers)

Decision

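Standalone library module (scripts/lib/context-error.sh) with 8 public functions, integrated into 4 existing modules via explicit `type func_name >/dev/null 2>&1` guard checks. Most functions only compute output; the suggestion-tracking functions write to suggestions.jsonl, and query_similar_failures reads the memory store. There are no global side effects (nothing overrides the global error() function): callers opt in by sourcing the library and calling functions directly.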
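A minimal sketch of the caller-side opt-in pattern, assuming a caller that already holds a stage name and a log tail. Only `format_context_error_report` comes from the design; `report_stage_failure` and the fallback message are hypothetical. The `type` guard handles the "library not sourced" case, and the command-substitution fallback handles the "loaded but broken" case.

```shell
#!/usr/bin/env bash
# Sketch of the opt-in guard. report_stage_failure and its plain
# fallback message are hypothetical; format_context_error_report is
# the context-error.sh function named in the design.

report_stage_failure() {
    local stage="$1" log_tail="$2"
    local report=""

    # `type` succeeds only if the name resolves (function, builtin, or
    # command); both streams are silenced so an absent library leaves no trace.
    if type format_context_error_report >/dev/null 2>&1; then
        # A failing call must not fail the caller: fall back to empty.
        report="$(format_context_error_report "$log_tail" "$stage" 1 "" "" 2>/dev/null)" || report=""
    fi

    if [[ -n "$report" ]]; then
        printf '%s\n' "$report"
    else
        # Plain output, as before the feature existed (illustrative text).
        printf 'Stage %s failed\n' "$stage"
    fi
}
```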

Component Diagram

┌──────────────────────────────────────────────────────────────────────────┐
│ Callers (Integration Points)                                             │
│                                                                          │
│  pipeline-state.sh       daemon-failure.sh       loop-iteration.sh       │
│  mark_stage_failed()     daemon_on_failure()     compose_prompt()        │
│        │                       │                       │                 │
│        ▼                       ▼                       ▼                 │
│  format_context_         format_github_          query_similar_          │
│  error_report()          error_comment()         failures() +            │
│        │                       │                 categorize_error() +    │
│        │                       │                 generate_suggested_     │
│        │                       │                 actions()               │
│        └───────────┬───────────┘                       │                 │
│                    ▼                                   │                 │
│  ┌────────────────────────────────────┐                │                 │
│  │ context-error.sh                   │◄───────────────┘                 │
│  │                                    │                                  │
│  │ categorize_error()                 │                                  │
│  │ query_similar_failures() ──────────┼──► sw-memory.sh                  │
│  │ generate_suggested_actions()       │    (memory_ranked_search)        │
│  │ format_context_error_report()      │                                  │
│  │ format_github_error_comment()      │                                  │
│  │ record_suggestion() ───────────────┼──► suggestions.jsonl             │
│  │ mark_suggestion_resolved()         │                                  │
│  │ resolve_outstanding_suggestions()  │                                  │
│  └────────────────────────────────────┘                                  │
│                    │                                                     │
│                    ▼                                                     │
│  sw-pipeline.sh (completion) ── resolve_outstanding_suggestions()        │
│                                                                          │
│  helpers.sh ── emit_event() ──► events.jsonl                             │
└──────────────────────────────────────────────────────────────────────────┘

Dependencies flow one direction: callers → context-error → memory + helpers. No circular dependencies.

Interface Contracts

// Error categorization: maps error text to one of 10 known categories
categorize_error(error_msg: string): ErrorCategory
// Returns: "FILE_ACCESS" | "FUNCTION_ERROR" | "SYNTAX_ERROR" | "ASSERTION_FAILURE"
//        | "TYPE_ERROR" | "TIMEOUT" | "MEMORY_ERROR" | "NETWORK_ERROR"
//        | "RESOURCE_ERROR" | "UNKNOWN"
// Errors: never fails (falls through to "UNKNOWN")

// Memory query with timeout guard
query_similar_failures(error_msg: string, max_results?: number = 3): JSONArray
// Returns: JSON array of similar past failures, or "[]"
// Errors: returns "[]" if memory_ranked_search unavailable, dir missing, or timeout
// Timeout: 5 seconds hard limit via `timeout` command

// Action generation: combines memory fixes + category defaults + stage context
generate_suggested_actions(
  category: ErrorCategory,
  failure_class: string,  // unused, reserved for future error-actionability bridge
  stage_id: string,
  similar_json: JSONArray
): string  // newline-separated action list (2-4 items)
// Errors: never fails (always produces at least 2 default actions)

// Terminal-formatted 4-section report
format_context_error_report(
  error_msg: string,
  stage_id: string,
  iteration: number,
  goal: string,
  issue_number: string
): string  // multi-line formatted report
// Errors: returns partial report on internal failures (each section independent)
// Respects NO_COLOR env var for plain-text output

// GitHub markdown-formatted report
format_github_error_comment(
  error_msg: string,
  stage_id: string,
  iteration: number,
  goal: string,
  issue_number: string
): string  // markdown with collapsible <details> for error logs
// Errors: same graceful degradation as terminal format

// Suggestion tracking: append to JSONL
record_suggestion(
  suggestion_id: string,
  category: ErrorCategory,
  stage_id: string,
  actions_json: JSONArray
): void
// Side effects: appends to $ARTIFACTS_DIR/suggestions.jsonl (atomic via tmp+mv)
//               emits suggestion.recorded event

// Individual resolution
mark_suggestion_resolved(suggestion_id: string, resolved?: string = "true"): void
// Side effects: atomically rewrites suggestions.jsonl, emits suggestion.resolved event
// Errors: no-op if file missing or ID not found

// Batch resolution on pipeline success
resolve_outstanding_suggestions(): void
// Side effects: marks all unresolved suggestions as resolved (atomic rewrite)
// Errors: no-op if file missing
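A sketch of the `record_suggestion` append path under stated assumptions: the JSON field names other than `resolved` are guesses, event emission is left out because `emit_event`'s argument convention is not given here, and `record_suggestion_sketch` is a hypothetical name. It follows the tmp+mv pattern and the rule that jq may be absent.

```shell
#!/usr/bin/env bash
# Sketch of an atomic JSONL append. Field names other than "resolved"
# and the function name are assumptions; event emission is omitted.

record_suggestion_sketch() {
    local id="$1" category="$2" stage="$3" actions_json="$4"
    local dir="${ARTIFACTS_DIR:-.}"
    local file="$dir/suggestions.jsonl"
    local line tmp

    command -v jq >/dev/null 2>&1 || return 0   # tolerate missing jq
    mkdir -p "$dir" 2>/dev/null || return 0     # silent failure if unwritable

    # One compact JSON object; invalid actions_json makes jq fail -> skip.
    line="$(jq -nc --arg id "$id" --arg category "$category" --arg stage "$stage" \
        --argjson actions "$actions_json" \
        '{id: $id, category: $category, stage: $stage, actions: $actions, resolved: null}' \
        2>/dev/null)" || return 0

    # Write old contents plus the new line to a temp file in the same
    # directory, then rename it over the original (rename is atomic
    # within one filesystem). Clean up the temp file on any failure.
    tmp="$(mktemp "$dir/.suggestions.XXXXXX")" || return 0
    { [[ -f "$file" ]] && cat "$file"; printf '%s\n' "$line"; } >"$tmp" \
        && mv "$tmp" "$file" || rm -f "$tmp"
}
```

The rename keeps readers from ever seeing a half-written file, but two concurrent writers can still lose an update (last writer wins), which is the trade-off the risk areas accept for tracking data.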

Data Flow

Error path (pipeline stage fails):

Stage fails
 → pipeline-state.sh:mark_stage_failed()
   → reads last 5 lines from stage log
   → format_context_error_report(log_tail, stage, iteration, goal, issue)
       → categorize_error(log_tail) → category
       → query_similar_failures(log_tail, 3) → [similar matches] (5s timeout)
       → generate_suggested_actions(category, "", stage, similar) → actions
       → assemble 4-section report
   → save_artifact("context-error-{stage}.md", report)
   → record_suggestion(id, category, stage, actions_json)
   → emit_event("error.context_generated")
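A sketch of the final "assemble 4-section report" step. The section titles come from the validation criteria. The sketch takes pre-computed section bodies rather than calling the other library functions, so it stands alone. `section_heading`, `render_report_sections`, the heading decoration, and the `(none found)` placeholder are all hypothetical. Following the no-color.org convention, any non-empty `NO_COLOR` switches to plain ASCII.

```shell
#!/usr/bin/env bash
# Sketch only: print the four report sections from pre-computed bodies.
# Function names, decoration, and placeholder text are hypothetical.

# Any non-empty NO_COLOR disables styling (no-color.org convention).
section_heading() {
    if [[ -n "${NO_COLOR:-}" ]]; then
        printf '== %s ==\n' "$1"                 # plain ASCII heading
    else
        printf '\033[1m━━ %s ━━\033[0m\n' "$1"   # bold heading with box-drawing rule
    fi
}

render_report_sections() {
    local what="$1" why="$2" similar="$3" actions="$4"
    section_heading "What Failed"
    printf '%s\n\n' "$what"
    section_heading "Why"
    printf '%s\n\n' "$why"
    # Sections are independent: an empty body gets a placeholder
    # instead of suppressing the rest of the report.
    section_heading "Similar Past Issues"
    printf '%s\n\n' "${similar:-(none found)}"
    section_heading "Suggested Actions"
    printf '%s\n' "$actions"
}
```

For example, `NO_COLOR=1 render_report_sections "Stage build failed (iteration 2)" "Category: TIMEOUT" "" "- Re-run the stage"` prints four `== … ==` headings with no escape codes or box-drawing characters.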

Daemon failure path (GitHub comment):

Daemon worker fails
 → daemon-failure.sh:daemon_on_failure()
 → format_github_error_comment(log_tail, "pipeline", 0, goal, issue)
 → same internal flow as above, markdown output
 → appended to GitHub issue comment body
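A sketch of the markdown shape that `format_github_error_comment` is described as producing: summary fields first, then the raw log inside a collapsible `<details>` block. The headings, argument order, and the `github_comment_sketch` name are assumptions. The inner fence uses tildes, and a blank line follows `</summary>` so that GitHub renders the fenced log inside `<details>` as markdown.

```shell
#!/usr/bin/env bash
# Sketch only: markdown layout for a GitHub issue comment.
# github_comment_sketch and its argument order are hypothetical.

github_comment_sketch() {
    local error_msg="$1" stage="$2" goal="$3" actions="$4"
    printf '### Pipeline failure in stage `%s`\n\n' "$stage"
    printf '**Goal:** %s\n\n' "$goal"
    printf '#### Suggested Actions\n\n%s\n\n' "$actions"
    # Collapsible raw log; the blank line after </summary> lets the
    # fenced block inside <details> render as markdown.
    printf '<details>\n<summary>Error log</summary>\n\n'
    printf '~~~\n%s\n~~~\n\n</details>\n' "$error_msg"
}
```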

Loop iteration path (prompt injection):

Build loop iteration with prior errors
 → loop-iteration.sh:compose_prompt()
 → query_similar_failures(error_lines, 3)
 → categorize_error(error_lines)
 → generate_suggested_actions(category, "", "build", similar)
 → inject "Historical Context" section into Claude prompt
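A sketch of the loop-iteration injection step. It assumes the three library functions print their results on stdout, as the contracts suggest. `append_historical_context` and the `## Historical Context` layout are hypothetical. The stage id `"build"` and the empty `failure_class` argument mirror the data-flow description above.

```shell
#!/usr/bin/env bash
# Sketch only: append a "Historical Context" section to a loop prompt.
# append_historical_context and the section layout are hypothetical;
# the three called functions are the context-error.sh ones.

append_historical_context() {
    local prompt="$1" error_lines="$2"
    local fn similar category actions

    # No prior errors: the prompt passes through unchanged.
    if [[ -z "$error_lines" ]]; then
        printf '%s\n' "$prompt"; return 0
    fi
    # Library not loaded: same pass-through.
    for fn in query_similar_failures categorize_error generate_suggested_actions; do
        if ! type "$fn" >/dev/null 2>&1; then
            printf '%s\n' "$prompt"; return 0
        fi
    done

    similar="$(query_similar_failures "$error_lines" 3 2>/dev/null)" || similar="[]"
    category="$(categorize_error "$error_lines" 2>/dev/null)" || category="UNKNOWN"
    actions="$(generate_suggested_actions "$category" "" "build" "$similar" 2>/dev/null)" || actions=""

    printf '%s\n\n## Historical Context\n\nError category: %s\n' "$prompt" "$category"
    if [[ -n "$actions" ]]; then
        printf 'Suggested actions:\n%s\n' "$actions"
    fi
}
```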

Feedback loop (pipeline success):

Pipeline completes successfully
 → sw-pipeline.sh (completion block)
 → resolve_outstanding_suggestions()
 → rewrites suggestions.jsonl, setting resolved="true" on all null entries
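A sketch of the batch resolution step with jq. It assumes one JSON object per line and a `resolved` field that starts as null. `resolve_outstanding_sketch` is a hypothetical name. Entries that already carry a value are left untouched, which matches the corresponding validation criterion.

```shell
#!/usr/bin/env bash
# Sketch only: mark every null "resolved" entry as "true" via tmp+mv.
# resolve_outstanding_sketch is a hypothetical name.

resolve_outstanding_sketch() {
    local file="${ARTIFACTS_DIR:-.}/suggestions.jsonl" tmp
    [[ -f "$file" ]] || return 0                # no-op when file missing
    command -v jq >/dev/null 2>&1 || return 0   # tolerate missing jq

    tmp="$(mktemp "${file}.XXXXXX")" || return 0
    # -c emits one compact object per line, so the output stays valid JSONL.
    if jq -c 'if .resolved == null then .resolved = "true" else . end' "$file" >"$tmp" 2>/dev/null; then
        mv "$tmp" "$file"                       # atomic replace
    else
        rm -f "$tmp"                            # leave the original intact
    fi
}
```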

Error Boundaries

| Component | Error Source | Handling |
| --- | --- | --- |
| `query_similar_failures` | `memory_ranked_search` not loaded | `type` check → return `[]` |
| `query_similar_failures` | Memory dir doesn't exist | Dir check → return `[]` |
| `query_similar_failures` | Query takes too long | `timeout 5` → return `[]` |
| `generate_suggested_actions` | jq parse failure on `similar_json` | `2>/dev/null` fallback → skip memory-based fix, use category defaults |
| `record_suggestion` | jq unavailable | `2>/dev/null` on jq call; file may not be written |
| `record_suggestion` | `ARTIFACTS_DIR` not writable | `mkdir -p` with fallback; silent failure |
| `mark_suggestion_resolved` | Concurrent writers | Atomic tmp + mv; last writer wins (acceptable for tracking data) |
| All callers | context-error.sh not sourced | `type func_name >/dev/null 2>&1` guard before every call |
| All callers | A library function fails (non-zero exit or error output) | `2>/dev/null` with fallback wrapping at call sites |

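Every integration point is guarded with `type ... >/dev/null 2>&1`, so when context-error.sh is not loaded, callers behave exactly as they did before this feature existed. Call sites also wrap each call with `2>/dev/null` and a fallback, so no caller can fail because this module is absent or broken.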
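A sketch of the three guards in front of the memory lookup. One Bash detail matters here: `timeout` launches an external program, so it cannot call a shell function directly. This version therefore exports `memory_ranked_search` with `export -f` and runs it in a child `bash`. That only works if the function does not depend on other unexported helpers from sw-memory.sh, which is an assumption. `MEMORY_DIR`, `CONTEXT_ERROR_TIMEOUT`, `query_similar_failures_sketch`, and the "no `timeout` binary means `[]`" fallback are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: guards around the memory lookup.
# MEMORY_DIR, CONTEXT_ERROR_TIMEOUT, and the function name are hypothetical;
# memory_ranked_search is the sw-memory.sh function named in the design.

query_similar_failures_sketch() {
    local query="$1" max="${2:-3}" out
    local limit="${CONTEXT_ERROR_TIMEOUT:-5}"

    # Guard 1: memory module not loaded.
    if ! type memory_ranked_search >/dev/null 2>&1; then echo "[]"; return 0; fi
    # Guard 2: no memory directory to search.
    if [[ ! -d "${MEMORY_DIR:-}" ]]; then echo "[]"; return 0; fi
    # Assumption: without a timeout binary, skip the lookup rather than risk a hang.
    if ! command -v timeout >/dev/null 2>&1; then echo "[]"; return 0; fi

    # Guard 3: hard time limit. timeout cannot run a shell function,
    # so export it and call it from a child bash.
    export -f memory_ranked_search
    out="$(timeout "$limit" bash -c 'memory_ranked_search "$@"' _ "$query" "$max" 2>/dev/null)" || out=""

    # A timeout or failure (non-zero exit) discards any partial output.
    if [[ -n "$out" ]]; then printf '%s\n' "$out"; else echo "[]"; fi
}
```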

Alternatives Considered

Override global error() function. Pros: zero integration work, every error path automatically enriched. Cons: massive blast radius (every error call would trigger a memory query), up to a 5-second timeout penalty on every error, untestable (global state mutation), and it would break simple error output in non-pipeline contexts. Rejected because the cost model is inverted: most errors don't need memory context, but all would pay the latency price.
Post-process stderr after stage completion. Pros: no source code changes to existing modules. Cons: requires complex stderr capture (tee + temp files), delays error feedback until after the stage completes (defeats the purpose of real-time context), and loses the ability to inject context into loop prompts mid-iteration. Rejected because the loop-iteration use case requires error context during execution, not after.

Implementation Plan

Files created

scripts/lib/context-error.sh — 431-line core library (8 public functions, include guard, version-stamped)
scripts/sw-lib-context-error-test.sh — 52-case test suite covering all functions, edge cases, and event emission

Files modified

scripts/lib/pipeline-state.sh — mark_stage_failed() calls format_context_error_report + record_suggestion (lines ~440-464)
scripts/lib/daemon-failure.sh — daemon_on_failure() calls format_github_error_comment (lines ~371-376)
scripts/lib/loop-iteration.sh — compose_prompt() injects historical error context via query_similar_failures + categorize_error + generate_suggested_actions (lines ~121-132)
scripts/sw-pipeline.sh — Pipeline completion block calls resolve_outstanding_suggestions (line ~2746)
config/event-schema.json — Registers 3 new events: error.context_generated, suggestion.recorded, suggestion.resolved

Dependencies

No new external dependencies
Runtime dependency on jq (with fallbacks for absence)
Optional dependency on memory_ranked_search from sw-memory.sh (graceful degradation)

Risk areas

Memory query latency: Mitigated by 5-second timeout command wrapper. If the memory corpus is very large, queries could approach this limit. Monitor via suggestion.recorded event timestamps.
suggestions.jsonl concurrent access: mark_suggestion_resolved and resolve_outstanding_suggestions both do full-file rewrite via tmp+mv. In theory, two concurrent pipelines could race. Acceptable for tracking/analytics data — not a correctness concern.
Error category accuracy: Regex-based categorization (categorize_error) is necessarily heuristic. The ordered regex cascade means a message matching both "timeout" and "assertion" patterns would be classified by whichever regex appears first. This is acceptable because the category drives suggestions, not control flow.
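A sketch of an ordered regex cascade like the one described for `categorize_error`. The patterns are illustrative guesses, not the ones in context-error.sh, and `categorize_error_sketch` is a hypothetical name. What the sketch does show is the first-match-wins ordering and the fall-through to `UNKNOWN`.

```shell
#!/usr/bin/env bash
# Sketch only: first-match-wins categorization. Patterns are illustrative,
# not the real context-error.sh rules.

categorize_error_sketch() {
    local msg rule category pattern
    msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    # Ordered rules as "CATEGORY:regex". A message mentioning both a
    # timeout and an assertion is classified as TIMEOUT because that
    # rule comes first.
    local rules=(
        'TIMEOUT:timed out|timeout|deadline exceeded'
        'ASSERTION_FAILURE:assert|expected .* but got'
        'FILE_ACCESS:no such file|permission denied|enoent|eacces'
        'SYNTAX_ERROR:syntax error|unexpected token'
        'TYPE_ERROR:typeerror|type error'
        'FUNCTION_ERROR:command not found|is not a function|undefined function'
        'MEMORY_ERROR:out of memory|cannot allocate memory|oomkilled'
        'NETWORK_ERROR:econnrefused|connection refused|connection reset|could not resolve host'
        'RESOURCE_ERROR:no space left|too many open files|resource temporarily unavailable'
    )
    for rule in "${rules[@]}"; do
        category="${rule%%:*}"   # text before the first colon
        pattern="${rule#*:}"     # text after the first colon
        if [[ $msg =~ $pattern ]]; then
            echo "$category"
            return 0
        fi
    done
    echo UNKNOWN                 # never fails: unmatched text falls through
}
```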

Validation Criteria

scripts/lib/context-error.sh loads without error and exports 8 public functions
categorize_error correctly classifies all 10 error categories via regex patterns
query_similar_failures returns [] when memory system is unavailable (no crash, no hang)
query_similar_failures respects 5-second timeout (tested via mock)
format_context_error_report produces all 4 sections: What Failed, Why, Similar Past Issues, Suggested Actions
format_context_error_report includes stage name, iteration count, issue number, and goal in output
format_github_error_comment produces valid markdown with <details> blocks
record_suggestion appends valid JSONL and emits suggestion.recorded event
mark_suggestion_resolved atomically updates the correct entry and emits suggestion.resolved event
resolve_outstanding_suggestions batch-resolves unresolved entries without touching already-resolved ones
All 4 integration points use type ... >/dev/null 2>&1 guards (zero impact when library not loaded)
All 52 test cases pass in scripts/sw-lib-context-error-test.sh
Full test suite (npm test) passes with no regressions
NO_COLOR environment variable produces plain-text output (no Unicode box-drawing)
New events registered in config/event-schema.json with correct required/optional fields
