Pipeline Design 184
Architecture Decision Record
Pipeline failures today are classified by daemon-failure.sh:classify_failure() into 6 coarse categories (auth_error, api_error, invalid_issue, context_exhaustion, build_failure, unknown). This classification drives retry strategy but provides no root cause analysis, no fix suggestions, no historical learning, and no platform bug auto-filing.
A parallel library scripts/lib/root-cause.sh (427 lines, 25+ tests) already exists on this branch with 7 finer-grained categories, fix suggestions, a learning system (root-causes.jsonl), and platform issue auto-creation. However, nothing calls it — the daemon doesn't source it, there's no CLI entry point, and the dashboard doesn't surface its data.
Constraints:
- Bash 3.2 compatibility (no associative arrays, no `readarray`)
- The daemon failure path is hot: it must not block or crash on root-cause failures
- `root-cause.sh` is a sourced library (module guard pattern), not a standalone script
- The dashboard uses Bun/TypeScript with a pattern of `app.get("/api/...")` handlers returning `Response.json()`
- All new functions must degrade gracefully when dependencies (`jq`, `gh`, `events.jsonl`) are missing
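The module-guard constraint and the "must not crash" constraint meet at the point where the daemon sources the library. The following is a minimal sketch of the pattern. It assumes a guard variable named `_ROOT_CAUSE_SH_LOADED` and a counter `ROOT_CAUSE_SOURCE_COUNT`; both names are hypothetical, and the real library's names may differ. The sketch writes a throwaway stand-in library, sources it twice plus once from a missing path, and shows that the body runs only once and the caller survives the missing file.

```bash
#!/usr/bin/env bash
# Demonstrates a module guard in a sourced library (Bash 3.2 compatible).
# _ROOT_CAUSE_SH_LOADED and ROOT_CAUSE_SOURCE_COUNT are hypothetical names.

lib_file=$(mktemp "${TMPDIR:-/tmp}/root-cause-demo.XXXXXX")
cat > "$lib_file" <<'EOF'
# Module guard: if this file was already sourced, stop here.
[ -n "${_ROOT_CAUSE_SH_LOADED:-}" ] && return 0
_ROOT_CAUSE_SH_LOADED=1

# Counts how many times the library body actually ran.
ROOT_CAUSE_SOURCE_COUNT=$(( ${ROOT_CAUSE_SOURCE_COUNT:-0} + 1 ))

rootcause_main() { echo '{}'; }
EOF

# Caller side: only source readable files, and never let a failed
# source abort the caller. The last path is deliberately missing.
for lib in "$lib_file" "$lib_file" "/nonexistent/root-cause.sh"; do
    if [ -r "$lib" ]; then
        . "$lib" || true
    fi
done
rm -f "$lib_file"

echo "library body executed ${ROOT_CAUSE_SOURCE_COUNT:-0} time(s)"
```

Checking readability before sourcing matters in POSIX `sh`. There, a `.` on a missing file is a fatal special-builtin error, so an `|| true` alone would not save the caller.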
Integrate the existing lib/root-cause.sh library into the daemon failure path and surface data through CLI + dashboard. Do not build a new classification system.
Pipeline Fails
│
▼
daemon_on_failure() ←── scripts/lib/daemon-failure.sh
│
├─ classify_failure() ←── existing coarse classifier (unchanged)
│ │
│ ▼ failure_class
│
├─ record_failure_class() ←── existing (unchanged)
│
├─ rootcause_main() ◄── NEW ──── scripts/lib/root-cause.sh
│ │
│ ├─ rootcause_classify() regex → category + confidence
│ │ │
│ │ └─ rootcause_boost_from_history() ◄── NEW
│ │ reads root-causes.jsonl for pattern frequency
│ │
│ ├─ rootcause_suggest_fix() category → actionable text
│ │
│ ├─ rootcause_learn() append to root-causes.jsonl
│ │
│ └─ rootcause_create_platform_issue() (if platform_bug, conf>70%)
│ creates GitHub issue via gh CLI
│
├─ emit_event "daemon.root_cause_classified"
│
├─ Enhanced retry comment ←── adds root cause + suggestions
│
└─ Enhanced final failure comment ←── adds root cause analysis section
CLI: shipwright root-cause
│
├─ classify <msg> → rootcause_main()
├─ analyze → rootcause_analyze_error_log()
├─ report → rootcause_report()
└─ history → rootcause_analyze_history() ◄── NEW
Dashboard: GET /api/root-cause/breakdown
│
└─ Reads ~/.shipwright/optimization/root-causes.jsonl
│
└─ Aggregates by category, daily, trends
│
└─ Insights tab: bar chart + trend badge + recent errors
┌─────────────────────────────────────────────────────────────┐
│ DAEMON LAYER │
│ sw-daemon.sh │
│ └─ lib/daemon-failure.sh │
│ ├─ classify_failure() [existing, unchanged] │
│ ├─ record_failure_class() [existing, unchanged] │
│ └─ rootcause integration [NEW: ~30 lines] │
│ wraps rootcause_main() in || true guard │
└──────────────────────┬──────────────────────────────────────┘
│ sources
┌──────────────────────▼──────────────────────────────────────┐
│ CLASSIFIER LIBRARY │
│ lib/root-cause.sh [existing: 427 lines, adding ~80] │
│ ├─ rootcause_classify() [existing + boost] │
│ ├─ rootcause_suggest_fix() [existing, unchanged] │
│ ├─ rootcause_learn() [existing, unchanged] │
│ ├─ rootcause_create_platform_issue() [existing] │
│ ├─ rootcause_analyze_error_log() [existing, unchanged] │
│ ├─ rootcause_report() [existing, unchanged] │
│ ├─ rootcause_main() [existing, unchanged] │
│ ├─ rootcause_analyze_history() [NEW] │
│ └─ rootcause_boost_from_history()[NEW] │
└──────────────────────┬──────────────────────────────────────┘
│ reads/writes
┌──────────────────────▼──────────────────────────────────────┐
│ DATA LAYER │
│ ~/.shipwright/optimization/root-causes.jsonl │
│ { category, confidence, message, recorded_at } │
│ ~/.shipwright/events.jsonl │
│ daemon.root_cause_classified events │
│ .claude/pipeline-artifacts/error-log.jsonl │
│ raw error entries from PostToolUse hook │
└──────────────────────┬──────────────────────────────────────┘
│ read by
┌──────────────────────▼──────────────────────────────────────┐
│ DASHBOARD LAYER │
│ dashboard/server.ts │
│ └─ GET /api/root-cause/breakdown [NEW endpoint] │
│ dashboard/src/types/api.ts │
│ └─ RootCauseBreakdown interface [NEW type] │
│ dashboard/src/core/api.ts │
│ └─ fetchRootCauseBreakdown() [NEW wrapper] │
│ dashboard/src/views/insights.ts │
│ └─ Root cause breakdown card [NEW visualization] │
└─────────────────────────────────────────────────────────────┘
│ dispatches to
┌──────────────────────▼──────────────────────────────────────┐
│ CLI LAYER │
│ scripts/sw │
│ └─ root-cause) dispatch [NEW: 1 line] │
│ scripts/sw-root-cause.sh [NEW: ~80 lines] │
│ └─ classify | analyze | report | history │
└─────────────────────────────────────────────────────────────┘
Bash — rootcause_classify(error_message, stage, exit_code) → JSON
Input: error_message: string, stage: string ("build"|"test"|...), exit_code: string
Output: {"category": "code_bug"|"infra_issue"|"rate_limit"|"context_exhaustion"|
"platform_bug"|"config_error"|"external_dep",
"confidence": 0-100,
"evidence": string[],
"suggested_action": string}
Error: Returns {"category":"unknown","confidence":0,...} on empty input — never fails
Bash — rootcause_analyze_history() → JSON
Input: none (reads ~/.shipwright/optimization/root-causes.jsonl)
Output: {"total": number, "categories": {"code_bug": N, ...}, "trends": {...}}
Error: Returns {"total":0,"categories":{},"trends":{}} if file missing
Bash — rootcause_boost_from_history(error_message, current_category) → number
Input: error_message: string, current_category: string
Output: "0" | "5" | "10" (confidence boost amount, printed to stdout)
Error: Returns "0" if learn file missing or grep fails
Bash — rootcause_main(error_message, stage, exit_code) → JSON
Input: error_message: string, stage: string, exit_code: string
Output: {"classification": <classify output>, "fix": <suggest_fix output>}
Error: Writes an error message to stderr and exits with status 1 on empty input
Side effects: appends to root-causes.jsonl, may create GitHub issue
TypeScript — GET /api/root-cause/breakdown?days=30
```typescript
interface RootCauseBreakdown {
  total: number;
  breakdown: Record<string, number>; // category → count
  daily: Record<string, Record<string, number>>; // "2026-03-09" → {category → count}
  trends: {
    platform_bugs_24h: number;
    platform_bugs_7d: number;
    trend: "increasing" | "stable_or_decreasing" | "no_data";
  };
  top_errors: Array<{
    category: string;
    confidence: number;
    message: string; // truncated to 100 chars
  }>;
}
// Error: 500 {"error": {"code": "INTERNAL_ERROR", "message": "..."}}
```
TypeScript — fetchRootCauseBreakdown(days?: number) → Promise<RootCauseBreakdown>
| Component | Error Source | Handling |
|---|---|---|
| `daemon-failure.sh` rootcause block | `rootcause_main` crashes | Wrapped in a `\|\| true` guard |
| `rootcause_classify` | `jq` not available | Falls back to `echo '[]'` for the evidence array |
| `rootcause_analyze_history` | Missing jsonl file | Returns `{"total":0,...}`; empty-state-safe |
| `rootcause_boost_from_history` | grep fails or learn file missing | Returns `"0"`; no boost applied |
| `rootcause_create_platform_issue` | `NO_GITHUB` set, `gh` missing, API error | Returns 0 (skip) or 1 (failure logged); never crashes the caller |
| Dashboard endpoint | Missing/corrupt jsonl | Returns `{"total":0,"breakdown":{},...}`; frontend renders empty state |
| Dashboard frontend | API 500 | Existing `checkDone()` pattern handles partial failures; card shows "No data" |
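The skip conditions for platform-issue creation combine the `platform_bug` / confidence > 70 rule from the data flow with the `NO_GITHUB` and `gh` checks in the table. A minimal sketch of that gate follows. `rootcause_platform_issue_gate` is a hypothetical helper, not part of `root-cause.sh`. It returns 0 for "file an issue", which differs from the real function's 0 = skip convention. The sketch also assumes any non-empty `NO_GITHUB` value counts as "set".

```bash
#!/usr/bin/env bash
# Sketch of the checks in front of rootcause_create_platform_issue().
# rootcause_platform_issue_gate is hypothetical: returns 0 if an issue
# should be filed, 1 if filing should be skipped.

rootcause_platform_issue_gate() {
    local category="1ドル" confidence="2ドル"
    # Only platform bugs are auto-filed.
    [ "$category" = "platform_bug" ] || return 1
    # Confidence must be an integer strictly above 70.
    case "$confidence" in ''|*[!0-9]*) return 1 ;; esac
    [ "$confidence" -gt 70 ] || return 1
    # Opt-out switch: any non-empty NO_GITHUB value disables filing (assumption).
    [ -z "${NO_GITHUB:-}" ] || return 1
    # Without the gh CLI there is nothing to call.
    command -v gh >/dev/null 2>&1 || return 1
    return 0
}
```

Keeping the gate separate from the `gh` call makes the skip rules testable without network access or GitHub credentials.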
1. Two classifiers coexist — rootcause supplements, doesn't replace
Context: classify_failure() in daemon-failure.sh (6 categories) drives retry strategy. rootcause_classify() in root-cause.sh (7 categories) provides finer-grained analysis.
Decision: Keep both. The daemon's existing classify_failure determines retry behavior (auth_error → no retry, api_error → 4 retries with long backoff). The root cause classifier runs after as an enrichment layer — it adds category/confidence/suggestions to comments and events but does not alter retry logic.
Alternative rejected: Replace classify_failure with rootcause_classify. This would require mapping 7 rootcause categories to the 6 daemon categories and changing retry strategy — high risk for no immediate benefit.
Consequence: A failure might be classified as api_error by the daemon (retry 4x with 5min backoff) and rate_limit by root cause (for comments/analytics). The categories overlap but are not identical. This is acceptable: retry strategy should be conservative (daemon), while analytics benefit from precision (rootcause).
2. Historical boosting via pattern frequency, not ML
Context: The plan calls for a "decision tree trained on historical patterns." In a bash environment with JSONL files, true ML is impractical.
Decision: Use simple frequency-based boosting. If rootcause_boost_from_history() finds that similar error messages (same first 30 alphanumeric characters) have been recorded under the same category 3+ times, boost confidence by 5 points; at 10+ times, boost by 10 points. This is a lookup, not a model.
Alternative rejected: Claude API call per failure for intelligent classification. Adds latency (2-5s), cost ($0.01+/failure), and a dependency on API availability in the failure path, precisely when APIs may be down.
Consequence: Classification accuracy is bounded by regex quality. The historical boost only reinforces existing classifications; it cannot reclassify. This is acceptable for v1: the learning system captures data that could power a more sophisticated classifier later.
3. Daemon integration is fail-safe by design
Context: The failure handler is on the critical path: if it crashes, the daemon may leave issues in a broken state.
Decision: The entire rootcause block in daemon_on_failure is wrapped in guards:
- `type rootcause_main >/dev/null 2>&1` is checked before calling (the function may not exist if sourcing failed)
- All rootcause calls use `2>/dev/null || echo ""`
- Variables default to safe values: `root_cause_category="unknown"`, `root_cause_confidence=0`
- Comment enhancements use `${var:+...}` (only render if non-empty)
Consequence: If root-cause.sh fails to source or any function errors, the daemon behaves exactly as it does today — the rootcause block produces no output and no side effects.
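The three guards in decision 3 can be sketched as one function. `rootcause_enrich_failure` and `ROOT_CAUSE_COMMENT` are hypothetical names. Event emission is omitted because `emit_event`'s signature is not shown in this ADR.

One deliberate difference from the `|| echo ""` pattern: the sketch uses `if ! var=$(...)`. This slightly stricter variant also discards any partial stdout that `rootcause_main` printed before failing.

```bash
#!/usr/bin/env bash
# Sketch of the guarded root-cause block inside daemon_on_failure().
# rootcause_enrich_failure and ROOT_CAUSE_COMMENT are hypothetical names.

rootcause_enrich_failure() {
    local error_msg="1ドル" stage="2ドル" exit_code="3ドル"
    local root_cause_json="" root_cause_category="unknown" root_cause_confidence=0

    # Guard 1: the function may not exist if root-cause.sh failed to source.
    if type rootcause_main >/dev/null 2>&1; then
        # Guard 2: any failure (including partial output) collapses to "".
        if ! root_cause_json=$(rootcause_main "$error_msg" "$stage" "$exit_code" 2>/dev/null); then
            root_cause_json=""
        fi
    fi

    # Parse fields only when there is output and jq is available.
    if [ -n "$root_cause_json" ] && command -v jq >/dev/null 2>&1; then
        root_cause_category=$(printf '%s' "$root_cause_json" \
            | jq -r '.classification.category // "unknown"' 2>/dev/null || echo "unknown")
        root_cause_confidence=$(printf '%s' "$root_cause_json" \
            | jq -r '.classification.confidence // 0' 2>/dev/null || echo 0)
    fi
    case "$root_cause_confidence" in ''|*[!0-9]*) root_cause_confidence=0 ;; esac

    # Guard 3: the comment fragment renders only when rootcause produced output.
    ROOT_CAUSE_COMMENT="${root_cause_json:+Root cause: $root_cause_category ($root_cause_confidence% confidence)}"
    return 0
}
```

Because the function always returns 0 and leaves `ROOT_CAUSE_COMMENT` empty on any failure, the daemon's existing comment text is unchanged whenever the library is missing or broken.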
4. Dashboard reads JSONL directly, no database
Context: Root cause data lives in ~/.shipwright/optimization/root-causes.jsonl. The dashboard could import it into SQLite (sw-db.sh) or read it directly.
Decision: Read JSONL directly via Bun.file().text() + line splitting + JSON.parse. Filter and aggregate in-memory.
Alternative rejected: SQLite import. Adds a migration, a new table, and a sync mechanism between jsonl and db. Over-engineered for a file that grows by ~1 entry per pipeline failure.
Consequence: Performance degrades if root-causes.jsonl grows very large. Mitigated by filtering to only the last N days (default 30) and reading with a line limit. At typical failure rates (5-20/day), even 6 months of data is <4000 lines — trivial to parse.
Alternatives considered:
- ML-based classifier using Claude API calls. Pros: context-aware, handles novel error patterns, higher accuracy. Cons: adds API cost per failure ($0.01+), 2-5s latency in the failure path, fragile when the API itself is the failure cause (rate limits, outages), requires Bash-to-API plumbing. Rejected: over-engineered for v1; the regex classifier covers 7 well-defined categories and the learning system captures data for future sophistication.
- Build an entirely new classification system from scratch. Pros: clean-slate design, no legacy patterns. Cons: discards 427 lines of working, tested code plus 374 lines of passing tests, and violates the principle of building on existing work. The existing library already has classification, learning, issue creation, and reporting; the gap is purely integration. Rejected: wasteful.
Files to create:
- `scripts/sw-root-cause.sh` (CLI entry point, ~80 lines)

Files to modify:
- `scripts/lib/root-cause.sh`: add `rootcause_analyze_history()` and `rootcause_boost_from_history()`; enhance `rootcause_classify()` (~80 lines added)
- `scripts/lib/daemon-failure.sh`: source `root-cause.sh`, wire `rootcause_main` into `daemon_on_failure()`, enhance comments (~50 lines added)
- `scripts/sw`: add `root-cause)` dispatch (1 line)
- `dashboard/server.ts`: add `/api/root-cause/breakdown` endpoint (~50 lines)
- `dashboard/src/types/api.ts`: add `RootCauseBreakdown` interface (~15 lines)
- `dashboard/src/core/api.ts`: add `fetchRootCauseBreakdown()` (~2 lines)
- `dashboard/src/views/insights.ts`: add breakdown card to Insights tab (~60 lines)
- `scripts/sw-root-cause-test.sh`: add tests for new functions (~50 lines)

Dependencies: none new. Uses the existing `jq`, `gh`, and Bun.
Risk areas:
- `daemon_on_failure` is the critical integration point and must be fail-safe (`|| true` wrapping)
- `rootcause_learn()` prepends entries (newest first via tmp+mv), so `rootcause_analyze_history()` must account for this ordering: `tail` returns the oldest entries, not the most recent
- `rootcause_boost_from_history()` greps on the first 30 chars of error messages; special regex characters in error messages could cause grep failures, so use `grep -F` (fixed string), not a regex
- The `evidence` array in `rootcause_classify()` uses a fragile `printf + jq -Rs` pattern (line 117); this is existing code, not being changed, but worth noting
Validation criteria:
- `./scripts/sw-root-cause-test.sh` passes all 25+ existing tests plus new tests for `rootcause_analyze_history` and `rootcause_boost_from_history`
- `./scripts/sw-lib-daemon-failure-test.sh` passes: daemon integration doesn't break existing failure handling
- `shipwright root-cause classify "rate limit 429"` returns JSON whose `classification.category` is `"rate_limit"`
- `shipwright root-cause report` produces formatted output when `root-causes.jsonl` exists
- `shipwright root-cause history` returns valid JSON
- When `lib/root-cause.sh` is absent or fails to source, `daemon_on_failure()` behaves identically to current behavior (no regression)
- Dashboard endpoint `GET /api/root-cause/breakdown` returns valid JSON matching the `RootCauseBreakdown` schema
- Dashboard Insights tab renders the breakdown card without errors when data is empty
- Platform bugs with confidence >70% emit a `rootcause.platform_issue_created` event (existing behavior, verified not regressed)
- A `daemon.root_cause_classified` event is emitted on every failure with `category`, `confidence`, and `daemon_class` fields
- `shipwright templates list` still shows all pipeline templates (discoverability check)