Pipeline Design 187

Seth Ford edited this page Mar 1, 2026 · 1 revision


Design: Build Loop Error Message Actionability Scorer and Auto-Enhancement

Context

The Shipwright build loop (scripts/sw-loop.sh, 3366 lines) captures test failures into error-summary.json via write_error_summary(), then injects those errors into the next Claude iteration prompt via compose_prompt(). Currently, error lines are passed verbatim: vague errors like "FAIL something went wrong" consume iteration budget without giving the agent enough signal to fix the root cause. High-actionability errors (with file paths, line numbers, expected/got detail) resolve faster, but nothing in the loop currently distinguishes the two kinds or enriches the vague ones.

Constraints:

  • Bash 3.2 compatible (no associative arrays, readarray, ${var,,})
  • set -euo pipefail required
  • Must not touch compose_prompt() — it's a 300-line function with 20 injection points and is the highest-risk modification surface in the loop
  • jq is a required dependency (already in sw-doctor.sh checks)
  • Atomic file writes via tmp+mv (project convention)
  • Library double-source guard pattern (_SW_*_LOADED)
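The constraints above can be illustrated with a minimal sketch of a library header. It assumes a guard variable named _SW_ERROR_ACTIONABILITY_LOADED (the pattern is from the list above, the exact name is a guess), and the helpers to_lower and atomic_write are hypothetical names introduced here for illustration only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Double-source guard. The variable name follows the _SW_*_LOADED pattern;
# the exact name is an assumption for this sketch.
if [[ -n "${_SW_ERROR_ACTIONABILITY_LOADED:-}" ]]; then
  return 0 2>/dev/null || true
fi
_SW_ERROR_ACTIONABILITY_LOADED=1

# to_lower (hypothetical helper): Bash 3.2 has no ${var,,}, so use tr instead.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# atomic_write (hypothetical helper): copy stdin to "$1" through a temp file in
# the same directory, then rename it into place. Returns 1 and leaves "$1"
# untouched if the temp file cannot be created or written.
atomic_write() {
  local dest="$1" tmp
  tmp="$(mktemp "${dest}.tmp.XXXXXX" 2>/dev/null)" || return 1
  if cat > "$tmp" && mv "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

Creating the temp file next to the destination keeps the final mv a same-filesystem rename, which is what makes the replacement atomic for readers.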

Decision

Approach: JSON enrichment between write and read

Insert a new library scripts/lib/error-actionability.sh that operates on error-summary.json after write_error_summary() and before compose_prompt() reads it. This is a pure data-layer enhancement — compose_prompt() continues to read .error_lines[] unchanged, but those lines now contain richer context.

Component Diagram

  sw-loop.sh main loop
           │
┌──────────┴──────────┐
│ run_test_gate()     │
│ write_error_summary │ ── writes error-summary.json
│          │          │
│ ┌────────▼────────┐ │
│ │ enhance_error_  │ │ ◄── NEW (2 lines added)
│ │ summary()       │ │
│ └────────┬────────┘ │
│          │          │
│ compose_prompt()    │ ── reads error-summary.json (unchanged)
└──────────┬──────────┘
           │
lib/error-actionability.sh
┌─────────────────────────┐
│ score_error_line()      │ ── pure function: string → int
│ categorize_error()      │ ── pure function: string → enum
│ score_error_summary()   │ ── file → aggregate int
│ enhance_error_line()    │ ── string → enriched string
│ enhance_error_summary() │ ── orchestrator: score, enhance, emit
└─────────────────────────┘

Data flows inward: the library depends only on jq, git (optional), and emit_event (optional). It has no dependency on sw-loop.sh internals.
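The two additions to sw-loop.sh might look roughly like the sketch below. The wrapper names load_optional_lib and run_optional_enhancer are hypothetical; the real integration may simply inline the two guards. The point shown is that a missing library or a failing enhancer leaves the loop unaffected.

```bash
#!/usr/bin/env bash
set -euo pipefail

# load_optional_lib (hypothetical helper): source a library only if the file
# exists, so a missing library is a silent no-op.
load_optional_lib() {
  local lib="$1"
  if [[ -f "$lib" ]]; then
    # shellcheck source=/dev/null
    source "$lib"
  fi
  return 0
}

# run_optional_enhancer (hypothetical helper): call enhance_error_summary only
# if it is defined, and never let its exit status reach the caller.
run_optional_enhancer() {
  local log_dir="$1"
  if type enhance_error_summary >/dev/null 2>&1; then
    enhance_error_summary "$log_dir" || true
  fi
  return 0
}
```

In the loop itself this would amount to one load_optional_lib call near the other library sources and one run_optional_enhancer call right after write_error_summary.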

Scoring Rubric (0-100, additive)

Signal               Points  Detection
File path present    25      Regex: path/to/file.ext patterns
Line number present  20      Regex: :123, line N, (N:N)
Specific error type  20      Known types: TypeError, SyntaxError, ENOENT, etc.
Actionable detail    20      Keywords: expected, got, missing, not defined
Fix suggestion       15      Patterns: did you mean, try, consider
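A minimal sketch of how the additive rubric could be implemented with grep is shown here. The regular expressions are assumptions, not the library's actual patterns; in particular, ReferenceError in the type list and "undefined" in the detail keywords are added for illustration and are not in the table above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# score_error_line: additive 0-100 score for one error line (sketch only).
# Each signal is tested inside an `if`, so a non-matching grep cannot trip set -e.
score_error_line() {
  local line="$1" score=0
  # File path present (25): something like path/to/file.ext
  if grep -Eq '[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)*\.[A-Za-z]{1,5}' <<<"$line"; then
    score=$((score + 25))
  fi
  # Line number present (20): ":123", "line N", or "(N:N)"
  if grep -Eiq ':[0-9]+|line [0-9]+|\([0-9]+:[0-9]+\)' <<<"$line"; then
    score=$((score + 20))
  fi
  # Specific error type (20): ReferenceError is an assumed extra entry
  if grep -Eq 'TypeError|SyntaxError|ReferenceError|ENOENT' <<<"$line"; then
    score=$((score + 20))
  fi
  # Actionable detail (20): whole-word keywords; "undefined" is an assumed extra
  if grep -Eiqw 'expected|got|missing|not defined|undefined' <<<"$line"; then
    score=$((score + 20))
  fi
  # Fix suggestion (15)
  if grep -Eiqw 'did you mean|try|consider' <<<"$line"; then
    score=$((score + 15))
  fi
  echo "$score"
}

score_error_line "FAIL something went wrong"
score_error_line "TypeError: Cannot read property 'x' of undefined at src/app.ts:42"
```

With these assumed patterns the first call prints 0 and the second prints 85 (file path, line number, error type, and detail).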

Enhancement threshold: score < 70. Lines scoring >= 70 have enough signal for the agent to act. Lines below 70 get:

  1. Category prefix: [syntax], [runtime], [assertion], [dependency], [build], [unknown]
  2. Recently-changed files context (from git diff --name-only HEAD~1) for lines scoring < 45
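The two enhancement steps above can be sketched as follows. The keyword-to-category mapping is an assumption, and enhance_line is a hypothetical variant of enhance_error_line that takes the score and the changed-file list as arguments so that it stays free of git and file I/O. recently_changed_files is also a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# categorize_error: map a line to one of six categories (keyword lists are assumed).
categorize_error() {
  local lower
  lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    *syntaxerror*|*"unexpected token"*|*"parse error"*) echo "syntax" ;;
    *assert*|*expected*)                                echo "assertion" ;;
    *"cannot find module"*|*"module not found"*)        echo "dependency" ;;
    *"build failed"*|*"compilation"*)                   echo "build" ;;
    *typeerror*|*referenceerror*)                       echo "runtime" ;;
    *)                                                  echo "unknown" ;;
  esac
}

# recently_changed_files (hypothetical helper): comma-separated list from git,
# or an empty string when git is missing or HEAD~1 does not exist.
recently_changed_files() {
  git diff --name-only HEAD~1 2>/dev/null | paste -sd, - | sed 's/,/, /g' || true
}

# enhance_line (hypothetical): $1 = line, $2 = score, $3 = changed-file list.
enhance_line() {
  local line="$1" score="$2" changed="${3:-}" out
  if [ "$score" -ge 70 ]; then
    printf '%s\n' "$line"          # enough signal already: return unchanged
    return 0
  fi
  out="[$(categorize_error "$line")] $line"
  if [ "$score" -lt 45 ] && [ -n "$changed" ]; then
    out="$out (recently changed: $changed)"
  fi
  printf '%s\n' "$out"
}

enhance_line "FAIL something broke" 0 "src/app.ts, tests/app.test.ts"
```

The final call prints "[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)". Using a case statement for categorization avoids spawning a grep per category.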

Interface Contracts

// score_error_line(line: string) → score: int (0-100)
// Pure function. No side effects. No file I/O.
// Precondition: line is a non-empty string
// Postcondition: 0 <= score <= 100

// categorize_error(line: string) → category: "syntax" | "runtime" | "assertion" | "dependency" | "build" | "unknown"
// Pure function. No side effects.

// score_error_summary(error_json_path: string) → aggregate_score: int (0-100)
// Reads file. Returns 100 if file missing/empty/zero errors.
// Error contract: never fails; returns 100 on any I/O error

// enhance_error_line(line: string, log_dir: string) → enhanced_line: string
// May call `git diff` if score < 45 and git is available.
// Returns original line unchanged if score >= 70.

// enhance_error_summary(log_dir: string) → void (modifies error-summary.json in place)
// Main entry point. Reads/writes error-summary.json atomically.
// Emits event: error.actionability_scored { score, error_count, enhanced, iteration }
// Error contract: returns 0 on any failure (graceful no-op)
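As one possible reading of the score_error_summary contract, the sketch below prints 100 whenever there is nothing usable to score. The aggregation method is an assumption (integer mean of per-line scores; the design does not specify it), and score_error_line here is a reduced stand-in with only two signals so the block runs on its own.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-in scorer (NOT the full rubric): 25 points for a file-like token,
# 20 points for a ":<digits>" line number.
score_error_line() {
  local score=0
  if grep -Eq '[A-Za-z0-9_.-]+\.[A-Za-z]{1,5}' <<<"$1"; then score=$((score + 25)); fi
  if grep -Eq ':[0-9]+' <<<"$1"; then score=$((score + 20)); fi
  echo "$score"
}

# score_error_summary: print an aggregate 0-100 score for error-summary.json.
# Assumption: aggregate = integer mean of the per-line scores.
score_error_summary() {
  local path="$1" total=0 count=0 line lines
  if [[ ! -s "$path" ]]; then
    echo 100; return 0                      # missing or empty file
  fi
  # Any jq failure (bad JSON, jq not installed) falls back to 100.
  lines="$(jq -r '.error_lines[]?' "$path" 2>/dev/null)" || { echo 100; return 0; }
  while IFS= read -r line; do
    [[ -n "$line" ]] || continue
    total=$((total + $(score_error_line "$line")))
    count=$((count + 1))
  done <<<"$lines"
  if [[ "$count" -eq 0 ]]; then
    echo 100; return 0                      # zero errors
  fi
  echo $((total / count))
}
```

Note that this sketch splits on newlines, so an error line containing an embedded newline would be scored as two lines.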

Data Flow: Error Enhancement

1. Test fails → write_error_summary() creates:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": ["FAIL something broke", "Error: test failed"],
     "test_cmd": "npm test"
   }
2. enhance_error_summary() reads it, scores each line:
   - "FAIL something broke" → score 0 (no file, no line, no type, no detail)
   - "Error: test failed" → score 0
3. Aggregate score: 0 (< 70 threshold) → enhance
4. Enhanced JSON written atomically:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": [
       "[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)",
       "[unknown] Error: test failed (recently changed: src/app.ts, tests/app.test.ts)"
     ],
     "original_error_lines": ["FAIL something broke", "Error: test failed"],
     "actionability_score": 0,
     "score_breakdown": [
       {"line": "FAIL something broke", "score": 0, "category": "unknown"},
       {"line": "Error: test failed", "score": 0, "category": "unknown"}
     ],
     "test_cmd": "npm test"
   }
5. compose_prompt() reads .error_lines[] (now enriched); no code change needed
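Step 4 of the flow above could be done in a single jq pass, sketched here under assumptions. apply_enhancement is a hypothetical helper that receives the already computed aggregate score, enhanced lines, and score breakdown as arguments; the real enhance_error_summary would compute them itself.

```bash
#!/usr/bin/env bash
set -euo pipefail

# apply_enhancement (hypothetical helper): merge precomputed results into the file.
#   $1 path to error-summary.json
#   $2 aggregate score (integer)
#   $3 JSON array of enhanced lines
#   $4 JSON array of {line, score, category} objects
# Always returns 0; on any failure the original file is left as it was.
apply_enhancement() {
  local path="$1" score="$2" enhanced="$3" breakdown="$4" tmp
  tmp="$(mktemp "${path}.tmp.XXXXXX" 2>/dev/null)" || return 0
  if jq --argjson score "$score" \
        --argjson enhanced "$enhanced" \
        --argjson breakdown "$breakdown" '
          .actionability_score = $score
          | .score_breakdown = $breakdown
          | if $score < 70
            then .original_error_lines = .error_lines | .error_lines = $enhanced
            else .
            end
        ' "$path" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$path" || rm -f "$tmp"       # atomic replace via rename
  else
    rm -f "$tmp"                            # bad JSON or jq failure: no-op
  fi
  return 0
}
```

Keys that the filter does not mention, such as iteration and test_cmd, pass through unchanged, which is why compose_prompt() can keep reading .error_lines[] as before.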

Error Boundaries

Component               Error Handling                                                      Propagation
score_error_line        Cannot fail: all regex via grep -q with …                           …
categorize_error        Falls through to "unknown" category                                 Never propagates errors
score_error_summary     Returns 100 on missing file, bad JSON, zero errors                  Swallows jq failures via 2>/dev/null || echo "0"
enhance_error_line      git diff failures caught with || true                               Returns original line on any failure
enhance_error_summary   Atomic write: tmp file + mv, cleanup on failure. Returns 0 always.  Never causes loop iteration to fail
sw-loop.sh integration  Guarded with type enhance_error_summary >/dev/null 2>&1             Missing library = no-op, zero impact

Key invariant: The enhancer can never cause a build loop iteration to fail. Every code path returns 0.

Alternatives Considered

  1. Modify compose_prompt() directly — Pros: Single function handles everything, no new library. / Cons: compose_prompt() is 300 lines with 20 injection sections; adding scoring logic makes it harder to test and maintain. The plan explicitly avoids this, and the existing architecture strongly favors library decomposition.

  2. Post-process in the prompt template (string replacement) — Pros: No JSON modification. / Cons: Would require parsing error lines from a heredoc, regex on prompt text is fragile, and the compose_prompt output isn't a structured format to reliably modify.

  3. LLM-based error enhancement (call Claude to rewrite errors) — Pros: Better quality enhancement than regex. / Cons: Adds API cost per iteration, adds latency (300ms+ per call), creates a dependency on Claude availability inside the loop, and violates the project's convention of keeping loop overhead minimal.

  4. Inline scoring in write_error_summary() — Pros: Single function, no new library needed. / Cons: write_error_summary() would grow to handle scoring, categorization, git context, and JSON enrichment — violating single responsibility. Library approach makes unit testing trivial.

Implementation Plan

  • Files to create:

    • scripts/lib/error-actionability.sh — Core library (scorer, categorizer, enhancer)
    • scripts/sw-lib-error-actionability-test.sh — Test suite (scoring, categorization, enhancement, edge cases)
  • Files to modify:

    • scripts/sw-loop.sh — 2 additions: source the library (near other lib sources, ~line 35), call enhance_error_summary (after write_error_summary, ~line 3164)
    • package.json — 1 line: register test suite
  • Dependencies: None new. Uses jq (already required), git (optional, for recently-changed files).

  • Risk areas:

    1. JSON escaping in enhance_error_summary: Error lines may contain quotes, backslashes, and special characters. The plan uses sed 's/\\/\\\\/g; s/"/\\"/g' for manual escaping when building the line_scores JSON array. This is fragile and is the primary risk: the sed expression escapes only backslashes and double quotes, so a line containing newlines, tabs, or other control characters could still break the JSON. Mitigation: use jq --arg for every string→JSON conversion.
    2. git diff HEAD~1 on fresh repos: The first commit has no parent. The 2>/dev/null || true guard handles this, but produces empty context.
    3. Performance on large error sets: Each error line spawns 5-7 grep subprocesses for scoring plus 5 more for categorization. With 10 error lines, that is roughly 100-120 process forks. Acceptable for the loop's iteration cadence (minutes between iterations), but worth noting.
    4. Race condition on error-summary.json: Atomic write (tmp+mv) prevents corruption. There are no concurrent readers during the write window, since the loop is single-threaded.
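For risk 1, the jq --arg mitigation might look like the sketch below. breakdown_entry is a hypothetical helper name; the idea is that each string reaches jq as a value rather than as hand-escaped JSON text, so jq does all the escaping.

```bash
#!/usr/bin/env bash
set -euo pipefail

# breakdown_entry (hypothetical helper): emit one score_breakdown object.
# --arg passes strings as values; --argjson passes the score as a JSON number.
breakdown_entry() {
  jq -cn --arg line "$1" --argjson score "$2" --arg category "$3" \
    '{line: $line, score: $score, category: $category}'
}

# A line with double quotes, a backslash, a tab, and single quotes.
tricky=$'expected "a\\b"\tgot \'c\''

# Collect the per-line objects into one JSON array with jq -s (slurp).
breakdown="$(
  {
    breakdown_entry "$tricky" 20 "assertion"
    breakdown_entry "FAIL something broke" 0 "unknown"
  } | jq -cs '.'
)"
printf '%s\n' "$breakdown"
```

The resulting array can be handed to a later jq call through --argjson, so no step in the chain builds JSON by string concatenation.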

Validation Criteria

  • score_error_line returns 0 for "FAIL something went wrong" (zero signals)
  • score_error_line returns >70 for "TypeError: Cannot read property 'x' of undefined at src/app.ts:42" (file + line + type + detail)
  • categorize_error correctly classifies all 6 categories (syntax, runtime, assertion, dependency, build, unknown)
  • score_error_summary returns 100 for missing/empty/zero-error files
  • enhance_error_summary adds actionability_score, score_breakdown fields to JSON
  • enhance_error_summary replaces error_lines and preserves original_error_lines when score < 70
  • enhance_error_summary does NOT add original_error_lines when score >= 70 (no unnecessary enhancement)
  • Enhanced error lines have [category] prefix
  • Error lines with score < 45 include (recently changed: ...) context
  • Malformed JSON, missing files, empty log dirs all handled gracefully (return 0)
  • emit_event "error.actionability_scored" fires with correct fields
  • sw-loop-test.sh passes with zero regressions after integration
  • Library sources conditionally ([[ -f ... ]] && source)
  • Enhancement call guarded with type check (no-op if library missing)
  • All bash is 3.2 compatible (no associative arrays, readarray, ${var,,})
  • JSON writes use atomic tmp+mv pattern

Sections Not Applicable

Component Hierarchy / State Management / Accessibility Checklist / Responsive Breakpoints: This feature is a bash shell library operating on JSON files in a CLI tool's build loop. There are no frontend components, UI elements, or browser-rendered output. The "frontend issue" designation in the skill guidance does not apply to this feature.
