Pipeline Design 187

Seth Ford edited this page Mar 1, 2026 · 1 revision


Design: Build Loop Error Message Actionability Scorer and Auto-Enhancement

Context

The Shipwright build loop (scripts/sw-loop.sh, 3366 lines) captures test failures into error-summary.json via write_error_summary(), then injects those errors into the next Claude iteration prompt via compose_prompt(). Currently, error lines are passed verbatim — vague errors like "FAIL something went wrong" consume iteration budget without giving the agent enough signal to fix the root cause. High-actionability errors (with file paths, line numbers, expected/got detail) resolve faster, but the agent has no way to distinguish or enrich the two.

Constraints:

- Bash 3.2 compatible (no associative arrays, readarray, ${var,,})
- set -euo pipefail required
- Must not touch compose_prompt(): it is a 300-line function with 20 injection points and is the highest-risk modification surface in the loop
- jq is a required dependency (already in sw-doctor.sh checks)
- Atomic file writes via tmp+mv (project convention)
- Library double-source guard pattern (_SW_*_LOADED)
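Two of these constraints, the double-source guard and the tmp+mv write, can be sketched as a library header. This is a minimal illustration rather than the project's actual code: the guard variable name `_SW_ERROR_ACTIONABILITY_LOADED` and the helper `atomic_write` are assumed names, and the body is written to stay safe when the caller has `set -euo pipefail` enabled.

```shell
# Hypothetical header for scripts/lib/error-actionability.sh.
# The guard variable name is an assumption that follows the _SW_*_LOADED pattern.
if [ -n "${_SW_ERROR_ACTIONABILITY_LOADED:-}" ]; then
  # Already loaded: stop here when sourced, or exit cleanly when executed.
  return 0 2>/dev/null || exit 0
fi
_SW_ERROR_ACTIONABILITY_LOADED=1

# atomic_write TARGET (hypothetical helper)
# Copies stdin into a temp file next to TARGET, then renames it over TARGET,
# so a reader never sees a half-written file.
atomic_write() {
  local target="$1" tmp
  tmp="$(mktemp "${target}.tmp.XXXXXX")" || return 1
  if cat > "$tmp" && mv "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"   # clean up the temp file on failure
  return 1
}
```

Creating the temp file in the same directory as the target keeps the final `mv` on one filesystem, which is what makes the rename step atomic.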

Decision

Approach: JSON enrichment between write and read

Insert a new library scripts/lib/error-actionability.sh that operates on error-summary.json after write_error_summary() and before compose_prompt() reads it. This is a pure data-layer enhancement — compose_prompt() continues to read .error_lines[] unchanged, but those lines now contain richer context.

Component Diagram

  sw-loop.sh main loop
           │
 ┌─────────┴──────────┐
 │ run_test_gate()    │
 │ write_error_summary│ ── writes error-summary.json
 │         │          │
 │ ┌───────▼────────┐ │
 │ │ enhance_error_ │ │ ◄── NEW (2 lines added)
 │ │ summary()      │ │
 │ └───────┬────────┘ │
 │         │          │
 │ compose_prompt()   │ ── reads error-summary.json (unchanged)
 └─────────┬──────────┘
           │
 lib/error-actionability.sh
 ┌─────────────────────────┐
 │ score_error_line()      │ ── pure function: string → int
 │ categorize_error()      │ ── pure function: string → enum
 │ score_error_summary()   │ ── file → aggregate int
 │ enhance_error_line()    │ ── string → enriched string
 │ enhance_error_summary() │ ── orchestrator: score, enhance, emit
 └─────────────────────────┘

Data flows inward: the library depends only on jq, git (optional), and emit_event (optional). It has no dependency on sw-loop.sh internals.

Scoring Rubric (0-100, additive)

Signal	Points	Detection
File path present	25	Regex: `path/to/file.ext` patterns
Line number present	20	Regex: `:123`, `line N`, `(N:N)`
Specific error type	20	Known types: `TypeError`, `SyntaxError`, `ENOENT`, etc.
Actionable detail	20	Keywords: `expected`, `got`, `missing`, `not defined`
Fix suggestion	15	Patterns: `did you mean`, `try`, `consider`
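The additive rubric in the table could be sketched as follows. The point values come from the table; the regular expressions are illustrative assumptions, not the library's actual patterns. In particular, `undefined` is added to the detail keywords here as an assumption so that messages such as "of undefined" count as actionable detail.

```shell
# Sketch of score_error_line: one grep per rubric signal, points summed.
# All patterns below are assumptions chosen for illustration.
score_error_line() {
  local line="$1" score=0
  # File path (25): a token ending in ".ext" with a 1-5 letter extension.
  if grep -Eq '[A-Za-z0-9_./-]+\.[A-Za-z]{1,5}([^A-Za-z]|$)' <<< "$line"; then
    score=$((score + 25))
  fi
  # Line number (20): ":123", "line 123", or "(12:5)".
  if grep -Eiq ':[0-9]+|line [0-9]+|\([0-9]+:[0-9]+\)' <<< "$line"; then
    score=$((score + 20))
  fi
  # Specific error type (20): a short list of well-known names.
  if grep -Eq 'TypeError|SyntaxError|ReferenceError|ENOENT' <<< "$line"; then
    score=$((score + 20))
  fi
  # Actionable detail (20): whole-word keyword match (-w avoids "forgot").
  if grep -Eiqw 'expected|got|missing|not defined|undefined' <<< "$line"; then
    score=$((score + 20))
  fi
  # Fix suggestion (15): whole-word match so "entry" does not count as "try".
  if grep -Eiqw 'did you mean|try|consider' <<< "$line"; then
    score=$((score + 15))
  fi
  echo "$score"
}
```

Under these assumed patterns, `score_error_line "FAIL something went wrong"` prints 0, and the TypeError example from the validation criteria picks up the file, line, type, and detail signals for a total of 85. Here-strings are used instead of pipes so that `pipefail` cannot turn an early `grep -q` exit into a spurious failure.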

Enhancement threshold: score < 70. Lines scoring >= 70 have enough signal for the agent to act. Lines below 70 get:

- Category prefix: [syntax], [runtime], [assertion], [dependency], [build], [unknown]
- Recently-changed files context (from git diff --name-only HEAD~1) for lines scoring < 45
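The two enhancement steps above might look like the sketch below. The category patterns, the three-file cap, and the helper names `recent_files_context` and `decorate_line` are assumptions made for illustration. `decorate_line` takes a precomputed score as its second argument so the example runs on its own.

```shell
# Sketch of categorize_error: first matching pattern wins, default "unknown".
categorize_error() {
  local line="$1"
  if grep -Eiq 'SyntaxError|unexpected token|parse error' <<< "$line"; then
    echo "syntax"
  elif grep -Eiq 'AssertionError|expected|to equal' <<< "$line"; then
    echo "assertion"
  elif grep -Eiq 'cannot find module|ENOENT|command not found' <<< "$line"; then
    echo "dependency"
  elif grep -Eiq 'TypeError|ReferenceError|RangeError' <<< "$line"; then
    echo "runtime"
  elif grep -Eiq 'compil|linker|make: \*\*\*|build failed' <<< "$line"; then
    echo "build"
  else
    echo "unknown"
  fi
}

# Hypothetical helper: prints " (recently changed: a, b)" or nothing at all.
# Any git failure (no repo, no parent commit, git missing) yields empty output.
# File names containing spaces are not handled in this sketch.
recent_files_context() {
  local files=""
  files="$(git diff --name-only HEAD~1 2>/dev/null | head -n 3 | tr '\n' ' ' || true)"
  files="${files% }"
  if [ -n "$files" ]; then
    printf ' (recently changed: %s)' "${files// /, }"
  fi
  return 0
}

# Hypothetical helper applying the 70 and 45 thresholds to one line.
decorate_line() {
  local line="$1" score="$2" out
  if [ "$score" -ge 70 ]; then
    printf '%s\n' "$line"   # enough signal already: leave unchanged
    return 0
  fi
  out="[$(categorize_error "$line")] $line"
  if [ "$score" -lt 45 ]; then
    out="${out}$(recent_files_context)"
  fi
  printf '%s\n' "$out"
}
```

The order of the `elif` branches matters: a line that mentions both `expected` and `TypeError` is classified as an assertion in this sketch, which is a design choice the real library may make differently.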

Interface Contracts

// score_error_line(line: string) → score: int (0-100)
//   Pure function. No side effects. No file I/O.
//   Precondition: line is a non-empty string
//   Postcondition: 0 <= score <= 100

// categorize_error(line: string) → category: "syntax" | "runtime" | "assertion" | "dependency" | "build" | "unknown"
//   Pure function. No side effects.

// score_error_summary(error_json_path: string) → aggregate_score: int (0-100)
//   Reads file. Returns 100 if file missing/empty/zero errors.
//   Error contract: never fails; returns 100 on any I/O error

// enhance_error_line(line: string, log_dir: string) → enhanced_line: string
//   May call `git diff` if score < 45 and git is available.
//   Returns original line unchanged if score >= 70.

// enhance_error_summary(log_dir: string) → void (modifies error-summary.json in place)
//   Main entry point. Reads/writes error-summary.json atomically.
//   Emits event: error.actionability_scored { score, error_count, enhanced, iteration }
//   Error contract: returns 0 on any failure (graceful no-op)
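The "never fails, returns 100" contract of `score_error_summary` can be illustrated with the sketch below. Two things are assumptions: the aggregate is taken as the integer mean of the per-line scores (the contract does not say how it is computed), and `score_error_line` is replaced by a deliberately trivial stand-in so the example runs on its own.

```shell
# Stand-in scorer for this sketch only: 20 points if the line has ":<digits>".
score_error_line() {
  if grep -Eq ':[0-9]+' <<< "$1"; then echo 20; else echo 0; fi
}

# Sketch of score_error_summary: every failure path prints 100 and returns 0.
# Assumption: the aggregate is the integer mean of the per-line scores.
score_error_summary() {
  local path="$1" total=0 count=0 line lines s
  [ -s "$path" ] || { echo 100; return 0; }                     # missing or empty
  command -v jq >/dev/null 2>&1 || { echo 100; return 0; }      # jq unavailable
  lines="$(jq -r '.error_lines[]?' "$path" 2>/dev/null)" || { echo 100; return 0; }
  [ -n "$lines" ] || { echo 100; return 0; }                    # zero errors
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    s="$(score_error_line "$line")"
    total=$((total + s))
    count=$((count + 1))
  done <<< "$lines"
  [ "$count" -gt 0 ] || { echo 100; return 0; }
  echo $((total / count))
}
```

Reading lines through `jq -r` splits any error string that contains an embedded newline into several lines; a production version would need to decide whether that matters.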

Data Flow: Error Enhancement

1. Test fails → write_error_summary() creates:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": ["FAIL something broke", "Error: test failed"],
     "test_cmd": "npm test"
   }
2. enhance_error_summary() reads it, scores each line:
   - "FAIL something broke" → score 0 (no file, no line, no type, no detail)
   - "Error: test failed" → score 0
3. Aggregate score: 0 (< 70 threshold) → enhance
4. Enhanced JSON written atomically:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": [
       "[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)",
       "[unknown] Error: test failed (recently changed: src/app.ts, tests/app.test.ts)"
     ],
     "original_error_lines": ["FAIL something broke", "Error: test failed"],
     "actionability_score": 0,
     "score_breakdown": [
       {"line": "FAIL something broke", "score": 0, "category": "unknown"},
       {"line": "Error: test failed", "score": 0, "category": "unknown"}
     ],
     "test_cmd": "npm test"
   }
5. compose_prompt() reads .error_lines[] (now enriched); no code change needed
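Step 4 of this flow, the atomic rewrite of error-summary.json, could be sketched with a single `jq` filter. The function name `apply_enhancement` and its argument layout are hypothetical; it assumes the enhanced lines and the score breakdown have already been built as JSON arrays.

```shell
# apply_enhancement SUMMARY ENHANCED_JSON BREAKDOWN_JSON SCORE (hypothetical)
# Rewrites SUMMARY in place via tmp+mv. Always returns 0; on any failure the
# original file is left untouched and the temp file is removed.
apply_enhancement() {
  local summary="$1" enhanced="$2" breakdown="$3" score="$4" tmp
  [ -f "$summary" ] || return 0
  tmp="$(mktemp "${summary}.tmp.XXXXXX" 2>/dev/null)" || return 0
  if jq --argjson enhanced "$enhanced" \
        --argjson breakdown "$breakdown" \
        --argjson score "$score" \
        '.original_error_lines = .error_lines
         | .error_lines = $enhanced
         | .actionability_score = $score
         | .score_breakdown = $breakdown' \
        "$summary" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$summary" 2>/dev/null || rm -f "$tmp"
  else
    rm -f "$tmp"
  fi
  return 0
}
```

Existing keys such as `iteration` and `test_cmd` pass through unchanged because the filter only assigns to the four affected fields.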

Error Boundaries

Component	Error Handling	Propagation
`score_error_line`	Cannot fail — all regex via `grep -q` with `
`categorize_error`	Falls through to `"unknown"` category	Never propagates errors
`score_error_summary`	Returns `100` on missing file, bad JSON, zero errors	Swallows `jq` failures via `2>/dev/null || echo "0"`
`enhance_error_line`	`git diff` failures caught with `|| true`	Returns original line on any failure
`enhance_error_summary`	Atomic write: tmp file + mv, cleanup on failure. Returns 0 always.	Never causes loop iteration to fail
`sw-loop.sh` integration	Guarded with `type enhance_error_summary >/dev/null 2>&1`	Missing library = no-op, zero impact

Key invariant: The enhancer can never cause a build loop iteration to fail. Every code path returns 0.
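The integration row of the table, a conditional source plus a `type` guard, might be wired as in the sketch below. The variable `SW_ACTIONABILITY_LIB` and the wrapper `maybe_enhance_errors` are hypothetical names; the real change is described as two added lines in sw-loop.sh.

```shell
# Hypothetical wiring for sw-loop.sh. SW_ACTIONABILITY_LIB is an assumed
# variable; the default path matches the library location in this design.
SW_ACTIONABILITY_LIB="${SW_ACTIONABILITY_LIB:-scripts/lib/error-actionability.sh}"

# Conditional source: a missing library file is silently skipped.
if [ -f "$SW_ACTIONABILITY_LIB" ]; then
  # shellcheck source=/dev/null
  . "$SW_ACTIONABILITY_LIB"
fi

# maybe_enhance_errors LOG_DIR (hypothetical wrapper)
# Calls the enhancer only if it is defined, and never propagates its status,
# so the loop iteration cannot fail because of the enhancer.
maybe_enhance_errors() {
  if type enhance_error_summary >/dev/null 2>&1; then
    enhance_error_summary "$1" || true
  fi
  return 0
}
```

The `|| true` is redundant if the enhancer honors its own contract of always returning 0, but it keeps the invariant intact under `set -e` even if a future change breaks that contract.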

Alternatives Considered

Modify compose_prompt() directly — Pros: Single function handles everything, no new library. / Cons: compose_prompt() is 300 lines with 20 injection sections; adding scoring logic makes it harder to test and maintain. The plan explicitly avoids this, and the existing architecture strongly favors library decomposition.
Post-process in the prompt template (string replacement) — Pros: No JSON modification. / Cons: Would require parsing error lines from a heredoc, regex on prompt text is fragile, and the compose_prompt output isn't a structured format to reliably modify.
LLM-based error enhancement (call Claude to rewrite errors) — Pros: Better quality enhancement than regex. / Cons: Adds API cost per iteration, adds latency (300ms+ per call), creates a dependency on Claude availability inside the loop, and violates the project's convention of keeping loop overhead minimal.
Inline scoring in write_error_summary() — Pros: Single function, no new library needed. / Cons: write_error_summary() would grow to handle scoring, categorization, git context, and JSON enrichment — violating single responsibility. Library approach makes unit testing trivial.

Implementation Plan

Files to create:
- scripts/lib/error-actionability.sh — Core library (scorer, categorizer, enhancer)
- scripts/sw-lib-error-actionability-test.sh — Test suite (scoring, categorization, enhancement, edge cases)
Files to modify:
- scripts/sw-loop.sh — 2 additions: source the library (near other lib sources, ~line 35), call enhance_error_summary (after write_error_summary, ~line 3164)
- package.json — 1 line: register test suite
Dependencies: None new. Uses jq (already required), git (optional, for recently-changed files).
Risk areas:
1. JSON escaping in enhance_error_summary: Error lines may contain quotes, backslashes, and other special characters. The plan escapes them manually with sed 's/\\/\\\\/g; s/"/\\"/g' when building the line_scores JSON array. This is fragile: a line containing single quotes, newlines, or control characters could break the JSON. Mitigation: use jq --arg for every string→JSON conversion. This is the primary risk.
2. git diff HEAD~1 on fresh repos: The first commit has no parent. The 2>/dev/null || true guard handles this, but the resulting recently-changed context is empty.
3. Performance on large error sets: Each error line spawns 5-7 grep subprocesses for scoring plus 5 more for categorization. With 10 error lines, that is roughly 100-120 process forks. Acceptable for the loop's iteration cadence (minutes between iterations), but worth noting.
4. Race condition on error-summary.json: Atomic write (tmp+mv) prevents corruption. There are no concurrent readers during the write window because the loop is single-threaded.
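The mitigation for the first risk, letting `jq` do the escaping, can be sketched as follows. The helper names `breakdown_entry` and `collect_breakdown` are hypothetical; `--arg`, `--argjson`, `-n`, `-c`, and `-s` are standard jq options.

```shell
# breakdown_entry LINE SCORE CATEGORY (hypothetical helper)
# Emits one compact JSON object. jq escapes LINE itself, so quotes,
# backslashes, tabs, and other control characters cannot corrupt the output.
breakdown_entry() {
  jq -cn --arg line "$1" --argjson score "$2" --arg category "$3" \
    '{line: $line, score: $score, category: $category}'
}

# collect_breakdown (hypothetical helper)
# Reads one JSON object per line on stdin and emits a single JSON array.
collect_breakdown() {
  jq -cs '.'
}
```

A usage example under the same assumptions:

```shell
{
  breakdown_entry 'FAIL "x"' 0 unknown
  breakdown_entry 'C:\tmp\a.ts:3 oops' 45 runtime
} | collect_breakdown
```

A sed-based escape of only backslashes and double quotes would pass a raw tab or newline straight through, and JSON does not allow unescaped control characters inside strings.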

Validation Criteria

- score_error_line returns 0 for "FAIL something went wrong" (zero signals)
- score_error_line returns >70 for "TypeError: Cannot read property 'x' of undefined at src/app.ts:42" (file + line + type + detail)
- categorize_error correctly classifies all 6 categories (syntax, runtime, assertion, dependency, build, unknown)
- score_error_summary returns 100 for missing/empty/zero-error files
- enhance_error_summary adds actionability_score, score_breakdown fields to JSON
- enhance_error_summary replaces error_lines and preserves original_error_lines when score < 70
- enhance_error_summary does NOT add original_error_lines when score >= 70 (no unnecessary enhancement)
- Enhanced error lines have [category] prefix
- Error lines with score < 45 include (recently changed: ...) context
- Malformed JSON, missing files, empty log dirs all handled gracefully (return 0)
- emit_event "error.actionability_scored" fires with correct fields
- sw-loop-test.sh passes with zero regressions after integration
- Library sources conditionally ([[ -f ... ]] && source)
- Enhancement call guarded with type check (no-op if library missing)
- All bash is 3.2 compatible (no associative arrays, readarray, ${var,,})
- JSON writes use atomic tmp+mv pattern

Sections Not Applicable

Component Hierarchy / State Management / Accessibility Checklist / Responsive Breakpoints: This feature is a bash shell library operating on JSON files in a CLI tool's build loop. There are no frontend components, UI elements, or browser-rendered output. The "frontend issue" designation in the skill guidance does not apply to this feature.
