Pipeline Design 184

Seth Ford edited this page Mar 10, 2026 · 2 revisions


Design: Failure Root Cause Classifier with Automated Platform Issue Creation

Context

Pipeline failures today are classified by daemon-failure.sh:classify_failure() into 6 coarse categories (auth_error, api_error, invalid_issue, context_exhaustion, build_failure, unknown). This classification drives retry strategy but provides no root cause analysis, no fix suggestions, no historical learning, and no platform bug auto-filing.

A parallel library scripts/lib/root-cause.sh (427 lines, 25+ tests) already exists on this branch with 7 finer-grained categories, fix suggestions, a learning system (root-causes.jsonl), and platform issue auto-creation. However, nothing calls it — the daemon doesn't source it, there's no CLI entry point, and the dashboard doesn't surface its data.

Constraints:

Bash 3.2 compatibility (no associative arrays, no readarray)
Daemon failure path is hot — must not block or crash on rootcause failures
root-cause.sh is a sourced library (module guard pattern), not a standalone script
Dashboard uses Bun/TypeScript with a pattern of app.get("/api/...") handlers returning Response.json()
All new functions must degrade gracefully when dependencies (jq, gh, events.jsonl) are missing

Decision

Integrate the existing lib/root-cause.sh library into the daemon failure path and surface data through CLI + dashboard. Do not build a new classification system.

Data Flow

Pipeline Fails
    │
    ▼
daemon_on_failure()  ←── scripts/lib/daemon-failure.sh
    │
    ├─ classify_failure()              ←── existing coarse classifier (unchanged)
    │      │
    │      ▼ failure_class
    │
    ├─ record_failure_class()          ←── existing (unchanged)
    │
    ├─ rootcause_main()  ◄── NEW ──── scripts/lib/root-cause.sh
    │      │
    │      ├─ rootcause_classify()              regex → category + confidence
    │      │      │
    │      │      └─ rootcause_boost_from_history()  ◄── NEW
    │      │             reads root-causes.jsonl for pattern frequency
    │      │
    │      ├─ rootcause_suggest_fix()           category → actionable text
    │      │
    │      ├─ rootcause_learn()                 records entry in root-causes.jsonl (newest first)
    │      │
    │      └─ rootcause_create_platform_issue() (if platform_bug, conf>70%)
    │             creates GitHub issue via gh CLI
    │
    ├─ emit_event "daemon.root_cause_classified"
    │
    ├─ Enhanced retry comment          ←── adds root cause + suggestions
    │
    └─ Enhanced final failure comment  ←── adds root cause analysis section

CLI: shipwright root-cause
    │
    ├─ classify <msg>  → rootcause_main()
    ├─ analyze         → rootcause_analyze_error_log()
    ├─ report          → rootcause_report()
    └─ history         → rootcause_analyze_history()  ◄── NEW

Dashboard: GET /api/root-cause/breakdown
    │
    └─ Reads ~/.shipwright/optimization/root-causes.jsonl
         │
         └─ Aggregates by category, daily, trends
              │
              └─ Insights tab: bar chart + trend badge + recent errors

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│ DAEMON LAYER │
│ sw-daemon.sh │
│ └─ lib/daemon-failure.sh │
│ ├─ classify_failure() [existing, unchanged] │
│ ├─ record_failure_class() [existing, unchanged] │
│ └─ rootcause integration [NEW: ~30 lines] │
│ wraps rootcause_main() in || true guard │
└──────────────────────┬──────────────────────────────────────┘
 │ sources
┌──────────────────────▼──────────────────────────────────────┐
│ CLASSIFIER LIBRARY │
│ lib/root-cause.sh [existing: 427 lines, adding ~80] │
│ ├─ rootcause_classify() [existing + boost] │
│ ├─ rootcause_suggest_fix() [existing, unchanged] │
│ ├─ rootcause_learn() [existing, unchanged] │
│ ├─ rootcause_create_platform_issue() [existing] │
│ ├─ rootcause_analyze_error_log() [existing, unchanged] │
│ ├─ rootcause_report() [existing, unchanged] │
│ ├─ rootcause_main() [existing, unchanged] │
│ ├─ rootcause_analyze_history() [NEW] │
│ └─ rootcause_boost_from_history()[NEW] │
└──────────────────────┬──────────────────────────────────────┘
 │ reads/writes
┌──────────────────────▼──────────────────────────────────────┐
│ DATA LAYER │
│ ~/.shipwright/optimization/root-causes.jsonl │
│ { category, confidence, message, recorded_at } │
│ ~/.shipwright/events.jsonl │
│ daemon.root_cause_classified events │
│ .claude/pipeline-artifacts/error-log.jsonl │
│ raw error entries from PostToolUse hook │
└──────────────────────┬──────────────────────────────────────┘
 │ read by
┌──────────────────────▼──────────────────────────────────────┐
│ DASHBOARD LAYER │
│ dashboard/server.ts │
│ └─ GET /api/root-cause/breakdown [NEW endpoint] │
│ dashboard/src/types/api.ts │
│ └─ RootCauseBreakdown interface [NEW type] │
│ dashboard/src/core/api.ts │
│ └─ fetchRootCauseBreakdown() [NEW wrapper] │
│ dashboard/src/views/insights.ts │
│ └─ Root cause breakdown card [NEW visualization] │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ CLI LAYER (separate entry point; sources lib/root-cause.sh) │
│ scripts/sw │
│ └─ root-cause) dispatch [NEW: 1 line] │
│ scripts/sw-root-cause.sh [NEW: ~80 lines] │
│ └─ classify | analyze | report | history │
└─────────────────────────────────────────────────────────────┘

Interface Contracts

Bash — rootcause_classify(error_message, stage, exit_code) → JSON

Input: error_message: string, stage: string ("build"|"test"|...), exit_code: string
Output: {"category": "code_bug"|"infra_issue"|"rate_limit"|"context_exhaustion"|
 "platform_bug"|"config_error"|"external_dep",
 "confidence": 0-100,
 "evidence": string[],
 "suggested_action": string}
Error: Returns {"category":"unknown","confidence":0,...} on empty input — never fails

Bash — rootcause_analyze_history() → JSON

Input: none (reads ~/.shipwright/optimization/root-causes.jsonl)
Output: {"total": number, "categories": {"code_bug": N, ...}, "trends": {...}}
Error: Returns {"total":0,"categories":{},"trends":{}} if file missing

Bash — rootcause_boost_from_history(error_message, current_category) → number

Input: error_message: string, current_category: string
Output: "0" | "5" | "10" (confidence boost amount, printed to stdout)
Error: Returns "0" if learn file missing or grep fails

Bash — rootcause_main(error_message, stage, exit_code) → JSON

Input: error_message: string, stage: string, exit_code: string
Output: {"classification": <classify output>, "fix": <suggest_fix output>}
Error: Writes an error message to stderr and exits with status 1 on empty input
Side effects: appends to root-causes.jsonl, may create GitHub issue

TypeScript — GET /api/root-cause/breakdown?days=30

interface RootCauseBreakdown {
  total: number;
  breakdown: Record<string, number>;              // category → count
  daily: Record<string, Record<string, number>>;  // "2026-03-09" → {category → count}
  trends: {
    platform_bugs_24h: number;
    platform_bugs_7d: number;
    trend: "increasing" | "stable_or_decreasing" | "no_data";
  };
  top_errors: Array<{
    category: string;
    confidence: number;
    message: string;  // truncated to 100 chars
  }>;
}
// Error: 500 {"error": {"code": "INTERNAL_ERROR", "message": "..."}}

TypeScript — fetchRootCauseBreakdown(days?: number) → Promise<RootCauseBreakdown>

Error Boundaries

Component	Error Source	Handling
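`daemon-failure.sh` rootcause block	`rootcause_main` crashes or is missing	Wrapped in a `type` existence check and a `|| true` guard; the daemon continues with defaults (`unknown`, confidence 0)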
`rootcause_classify`	jq not available	Falls back to `echo '[]'` for evidence array
`rootcause_analyze_history`	Missing jsonl file	Returns `{"total":0,...}` — empty-state-safe
`rootcause_boost_from_history`	grep/learn file missing	Returns `"0"` — no boost applied
`rootcause_create_platform_issue`	`NO_GITHUB` set, `gh` missing, API error	Returns 0 (skip) or 1 (failure logged), never crashes caller
Dashboard endpoint	Missing/corrupt jsonl	Returns `{"total":0,"breakdown":{},...}` — frontend renders empty state
Dashboard frontend	API 500	Existing `checkDone()` pattern handles partial failures — card shows "No data"

Key Design Decisions

1. Two classifiers coexist — rootcause supplements, doesn't replace

Context: classify_failure() in daemon-failure.sh (6 categories) drives retry strategy. rootcause_classify() in root-cause.sh (7 categories) provides finer-grained analysis.

Decision: Keep both. The daemon's existing classify_failure determines retry behavior (auth_error → no retry, api_error → 4 retries with long backoff). The root cause classifier runs after as an enrichment layer — it adds category/confidence/suggestions to comments and events but does not alter retry logic.

Alternative rejected: Replace classify_failure with rootcause_classify. This would require mapping 7 rootcause categories to the 6 daemon categories and changing retry strategy — high risk for no immediate benefit.

Consequence: A failure might be classified as api_error by the daemon (retry 4x with 5min backoff) and rate_limit by root cause (for comments/analytics). The categories overlap but are not identical. This is acceptable: retry strategy should be conservative (daemon), while analytics benefit from precision (rootcause).

2. Historical boosting via pattern frequency, not ML

Context: The plan calls for "decision tree trained on historical patterns." In a bash environment with JSONL files, true ML is impractical.

Decision: Use simple frequency-based boosting: if rootcause_boost_from_history() finds that similar error messages (same first 30 alphanumeric characters) have been classified 3+ times, boost confidence by 5 points on the 0-100 scale; at 10+ times, boost it by 10 points. This is a lookup, not a model.

Alternative rejected: Claude API call per failure for intelligent classification. Adds latency (2-5s), cost (0ドル.01+/failure), and a dependency on API availability in the failure path — precisely when APIs may be down.

Consequence: Classification accuracy is bounded by regex quality. The historical boost only reinforces existing classifications; it cannot reclassify. This is acceptable for v1: the learning system captures data that could power a more sophisticated classifier later.

3. Daemon integration is fail-safe by design

Context: The failure handler is on the critical path. If it crashes, the daemon may leave issues in a broken state.

Decision: The entire rootcause block in daemon_on_failure is wrapped in guards:

type rootcause_main >/dev/null 2>&1 is checked before calling (the function may not exist if sourcing failed)
All rootcause calls use 2>/dev/null || echo "", so errors produce empty output instead of a non-zero exit
Variables default to safe values: root_cause_category="unknown", root_cause_confidence=0
Comment enhancements use ${var:+...}, so they render only when the value is non-empty

Consequence: If root-cause.sh fails to source or any function errors, the daemon behaves exactly as it does today — the rootcause block produces no output and no side effects.

4. Dashboard reads JSONL directly, no database

Context: Root cause data lives in ~/.shipwright/optimization/root-causes.jsonl. The dashboard could import it into SQLite (sw-db.sh) or read it directly.

Decision: Read JSONL directly via Bun.file().text() + line splitting + JSON.parse. Filter and aggregate in-memory.

Alternative rejected: SQLite import. Adds a migration, a new table, and a sync mechanism between jsonl and db. Over-engineered for a file that grows by ~1 entry per pipeline failure.

Consequence: Performance degrades if root-causes.jsonl grows very large. Mitigated by filtering to only the last N days (default 30) and reading with a line limit. At typical failure rates (5-20/day), even 6 months of data is <4000 lines — trivial to parse.
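The upper bound in that estimate, taking six months as about 182 days, the high end of the quoted failure rate, and one entry per failure:

$$
20\ \frac{\text{entries}}{\text{day}} \times 182\ \text{days} = 3640\ \text{entries} < 4000
$$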

Alternatives Considered

ML-based classifier using Claude API calls — Pros: context-aware, handles novel error patterns, higher accuracy. Cons: adds API cost per failure (0ドル.01+), 2-5s latency in the failure path, fragile when the API itself is the failure cause (rate limits, outages), requires Bash-to-API plumbing. Rejected: over-engineered for v1; the regex classifier covers 7 well-defined categories and the learning system captures data for future sophistication.
Build entirely new classification system from scratch — Pros: clean-slate design, no legacy patterns. Cons: discards 427 lines of working, tested code + 374 lines of passing tests. Violates the principle of building on existing work. The existing library already has classification, learning, issue creation, and reporting — the gap is purely integration. Rejected: wasteful.

Implementation Plan

Files to create: scripts/sw-root-cause.sh (CLI entry point, ~80 lines)
Files to modify:
- scripts/lib/root-cause.sh — add rootcause_analyze_history(), rootcause_boost_from_history(), enhance rootcause_classify() (~80 lines added)
- scripts/lib/daemon-failure.sh — source root-cause.sh, wire rootcause_main into daemon_on_failure(), enhance comments (~50 lines added)
- scripts/sw — add root-cause) dispatch (1 line)
- dashboard/server.ts — add /api/root-cause/breakdown endpoint (~50 lines)
- dashboard/src/types/api.ts — add RootCauseBreakdown interface (~15 lines)
- dashboard/src/core/api.ts — add fetchRootCauseBreakdown() (~2 lines)
- dashboard/src/views/insights.ts — add breakdown card to Insights tab (~60 lines)
- scripts/sw-root-cause-test.sh — add tests for new functions (~50 lines)
Dependencies: None new. Uses existing jq, gh, Bun.
Risk areas:
- daemon_on_failure is the critical integration point — must be fail-safe (|| true wrapping)
- rootcause_learn() prepends entries (newest first via tmp+mv), so rootcause_analyze_history() must read the most recent entries with head rather than tail
- rootcause_boost_from_history() uses grep on the first 30 chars of error messages — special regex characters in error messages could cause grep failures; use grep -F (fixed string) not regex
- The evidence array in rootcause_classify() uses a fragile printf + jq -Rs pattern (line 117) — existing, not being changed, but worth noting

Validation Criteria

./scripts/sw-root-cause-test.sh passes all existing 25+ tests plus new tests for rootcause_analyze_history, rootcause_boost_from_history
./scripts/sw-lib-daemon-failure-test.sh passes — daemon integration doesn't break existing failure handling
shipwright root-cause classify "rate limit 429" returns JSON whose classification.category is "rate_limit"
shipwright root-cause report produces formatted output when root-causes.jsonl exists
shipwright root-cause history returns valid JSON
When lib/root-cause.sh is absent or fails to source, daemon_on_failure() behaves identically to current behavior (no regression)
Dashboard endpoint GET /api/root-cause/breakdown returns valid JSON matching RootCauseBreakdown schema
Dashboard Insights tab renders breakdown card without errors when data is empty
Platform bugs with confidence >70% emit rootcause.platform_issue_created event (existing behavior, verified not regressed)
daemon.root_cause_classified event emitted on every failure with category, confidence, daemon_class fields
shipwright templates list still shows all pipeline templates (discoverability check)
