Pipeline Design 170

Seth Ford edited this page Mar 1, 2026 · 1 revision

Design: Dashboard Real-Time Health & Anomaly Visualization

Context

Shipwright's dashboard (Bun WebSocket server + vanilla TypeScript frontend) currently serves 13 tab views via a reactive StoreView pattern. FleetState is computed server-side every 2s and broadcast to up to 50 WebSocket clients. The dashboard already displays cost, DORA grades, pipeline status, and agent health — but these are scattered across Overview, Metrics, and Insights tabs. There is no unified health-at-a-glance surface.

Issue #170 requests: a composite health score (0-100) with trend, anomaly alerts with drill-down, cost burn gauge, stage progress vs historical avg, and DORA cards — all live-updating via WebSocket.

Constraints:

  • Server is a single 5800-line server.ts (Bun runtime, SQLite + JSONL fallback)
  • Frontend uses no framework — raw DOM manipulation with View interface (init/render/destroy)
  • All CSS uses --var design tokens (--cyan, --abyss, --rose, etc.)
  • Tab panels are pre-rendered in HTML, toggled by switchTab()
  • FleetState broadcast uses JSON string deduplication — payload bloat impacts all clients
  • Bash 3.2 compatibility required for test scripts

Decision

Add a dedicated "Health" tab (Approach B) with server-side health score computation piggybacking on the existing getFleetState() cycle. Heavy data (7-day trend, anomaly details, stage history) is served via 3 new REST endpoints fetched lazily on tab open — NOT stuffed into the WebSocket payload.

Key architectural choices:

  1. Health score computed in TypeScript on the server, not by shelling out to sw-pipeline-vitals.sh. This avoids subprocess overhead in the 2s broadcast loop. The four signals (momentum 25%, convergence 35%, budget 20%, error maturity 20%) mirror the vitals engine weights.

  2. FleetState extended with a small health? optional field (~200 bytes) containing only score, verdict, signals, and activePipelines. Trend and anomaly data are NOT in FleetState.

  3. Three REST endpoints for on-demand data: /api/health/trend, /api/health/anomalies, /api/health/stages. Each is cached server-side (60s TTL for trend, 30s for anomalies/stages).

  4. Header health badge provides at-a-glance status without switching tabs. Clicking navigates to the Health tab.

  5. All new types are additive and optional, so the existing 13 views and their tests should not need changes.
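The weighting in choice 1 can be sketched as a pure function. This is a minimal sketch, not the actual server.ts code: it assumes each signal has already been normalized to 0-100 (how momentum, convergence, budget, and error maturity are derived from events is not covered here), and it takes the verdict thresholds from the Validation Criteria (green >= 75, yellow >= 50, red >= 25, critical below that). The names scoreFromSignals, verdictFor, and clamp are hypothetical.

```typescript
// Hypothetical sketch: combine four pre-normalized signals (0-100 each)
// into a composite score using the weights named in the design.
type Verdict = "green" | "yellow" | "red" | "critical";

interface Signals {
  momentum: number;
  convergence: number;
  budget: number;
  errorMaturity: number;
}

// Weights from the design: 25% / 35% / 20% / 20%.
const WEIGHTS: Signals = {
  momentum: 0.25,
  convergence: 0.35,
  budget: 0.2,
  errorMaturity: 0.2,
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Thresholds taken from the Validation Criteria section.
function verdictFor(score: number): Verdict {
  if (score >= 75) return "green";
  if (score >= 50) return "yellow";
  if (score >= 25) return "red";
  return "critical";
}

function scoreFromSignals(signals: Signals): { score: number; verdict: Verdict } {
  const weighted =
    clamp(signals.momentum, 0, 100) * WEIGHTS.momentum +
    clamp(signals.convergence, 0, 100) * WEIGHTS.convergence +
    clamp(signals.budget, 0, 100) * WEIGHTS.budget +
    clamp(signals.errorMaturity, 0, 100) * WEIGHTS.errorMaturity;
  // Round to an integer and clamp again to guard against float drift.
  const score = clamp(Math.round(weighted), 0, 100);
  return { score, verdict: verdictFor(score) };
}
```

Because the function takes plain numbers and touches no files or subprocesses, it is cheap enough to call on every 2s broadcast cycle.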

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│ server.ts                                                       │
│                                                                 │
│ getFleetState()                                                 │
│   ├── readEvents()          ─── events.jsonl / SQLite           │
│   ├── readDaemonState()     ─── daemon-state.json / SQLite      │
│   ├── getCostInfo()         ─── costs.json + budget.json        │
│   ├── calculateDoraGrades() ─── 7-day event scan                │
│   └── computeHealthScore()  ─── NEW: momentum/convergence/      │
│       │                         budget/errorMaturity            │
│       └── returns HealthInfo (appended to FleetState.health)    │
│                                                                 │
│ REST endpoints (lazy-fetched):                                  │
│   GET /api/health/trend     ─── getHealthTrend(days)            │
│   GET /api/health/anomalies ─── getAnomalies(events)            │
│   GET /api/health/stages    ─── getStageProgress(events, jobs)  │
│                                                                 │
│ broadcastToClients(fleetState) ── every 2s via WebSocket        │
└────────────┬──────────────────────┬─────────────────────────────┘
             │ WS push              │ REST responses
             ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│ Browser Client                                                  │
│                                                                 │
│ ws.ts ──onmessage──► store.set("fleetState", data)              │
│                        │                                        │
│                        ├──► header.ts: renderHealthBadge()      │
│                        │               (score + verdict color)  │
│                        │                                        │
│                        └──► health.ts (if active tab):          │
│                                  ├── renderHealthGauge(health)  │
│                                  ├── renderCostBurnGauge(cost)  │
│                                  └── renderDoraCards(dora)      │
│                                                                 │
│ api.ts (on tab open):                                           │
│   fetchHealthTrend()   ──► renderTrendSparkline(points)         │
│   fetchAnomalies()     ──► renderAnomalyAlerts(anomalies)       │
│   fetchStageProgress() ──► renderStageProgress(stages)          │
└─────────────────────────────────────────────────────────────────┘

Interface Contracts

Server-Side (server.ts)

// Pure function: no I/O, no subprocess
function computeHealthScore(
  events: DaemonEvent[],
  daemonState: DaemonState,
  costInfo: CostInfo
): HealthInfo;
// Errors: returns { score: 100, verdict: "green", ... } when no data (healthy idle)

// File I/O (reads progress snapshots), cached 60s
function getHealthTrend(days: number): { points: Array<{ ts: string; score: number; verdict: string }> };
// Errors: returns { points: [] } on missing/corrupt files

// Reads events, compares to baselines using EMA
function getAnomalies(events: DaemonEvent[]): { anomalies: AnomalyAlert[] };
// Errors: returns { anomalies: [] } on computation failure

// Reads active jobs + historical events
function getStageProgress(events: DaemonEvent[], pipelines: PipelineInfo[]): { stages: StageProgressInfo[] };
// Errors: returns { stages: [] } on empty input
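The server-side caching described for the REST endpoints (60s TTL for trend, 30s for anomalies and stages) could be handled by a small keyed memoizer. This is a sketch under assumptions: cachedByKey is a hypothetical helper, not an existing function in server.ts, and the injectable clock exists only to make the behavior testable.

```typescript
// Hypothetical TTL memoizer: caches compute(key) for ttlMs milliseconds per key.
function cachedByKey<K, V>(
  ttlMs: number,
  compute: (key: K) => V,
  now: () => number = Date.now,
): (key: K) => V {
  const entries = new Map<K, { value: V; expiresAt: number }>();
  return (key: K): V => {
    const t = now();
    const hit = entries.get(key);
    // Serve the cached value while it is still fresh.
    if (hit !== undefined && t < hit.expiresAt) return hit.value;
    // Otherwise recompute and store with a new expiry.
    const value = compute(key);
    entries.set(key, { value, expiresAt: t + ttlMs });
    return value;
  };
}
```

Wrapping getHealthTrend as cachedByKey(60000, getHealthTrend) would give each distinct days value its own 60s entry, so a request for 7 days does not evict a request for 30.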

Shared Types (api.ts)

export interface HealthInfo {
  score: number; // 0-100, clamped
  verdict: "green" | "yellow" | "red" | "critical";
  signals: {
    momentum: number; // 0-100
    convergence: number; // 0-100
    budget: number; // 0-100
    errorMaturity: number; // 0-100
  };
  activePipelines: number;
}

export interface AnomalyAlert {
  id: string;
  metric: string;
  value: number;
  baseline: number;
  severity: "warning" | "critical";
  rootCause: string;
  factors: string[];
  actions: string[];
  ts: string;
  issue?: number;
}

export interface StageProgressInfo {
  pipelineIssue: number;
  stage: string;
  currentDuration_s: number;
  avgDuration_s: number;
  count: number;
  status: "on-track" | "slow" | "fast";
}

// Extended (additive):
export interface FleetState {
  /* ... existing fields unchanged ... */
  health?: HealthInfo; // NEW, optional
}

export type TabId = /* existing 13 */ | "health";
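The status field on StageProgressInfo implies a comparison between currentDuration_s and avgDuration_s, but the design does not state the cutoffs. The sketch below assumes a symmetric 25% tolerance band purely for illustration; classifyStage and the tolerance parameter are hypothetical, and the real thresholds may differ.

```typescript
type StageStatus = "on-track" | "slow" | "fast";

// ASSUMPTION: "slow" means more than (1 + tolerance) times the historical
// average, "fast" means less than (1 - tolerance) times it. The 0.25 default
// is illustrative and does not come from the design.
function classifyStage(
  currentDuration_s: number,
  avgDuration_s: number,
  tolerance: number = 0.25,
): StageStatus {
  // Without a usable baseline there is nothing to compare against.
  if (avgDuration_s <= 0) return "on-track";
  const ratio = currentDuration_s / avgDuration_s;
  if (ratio > 1 + tolerance) return "slow";
  if (ratio < 1 - tolerance) return "fast";
  return "on-track";
}
```

One caveat of a plain ratio: a stage that has only just started will read as "fast" until its elapsed time approaches the average, so an implementation may want to reserve "fast" for completed stages.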

Frontend View (health.ts)

// healthView implements the existing View interface (init/render/destroy).
export const healthView: View = {
  init(): void { /* fetches trend, anomalies, stages via REST */ },
  render(state: FleetState): void { /* updates gauge, cost, DORA from WS data */ },
  destroy(): void { /* removes click listeners on anomaly cards */ },
};

// Pure render functions (DOM manipulation):
function renderHealthGauge(el: HTMLElement, health: HealthInfo): void;
function renderTrendSparkline(el: HTMLElement, points: TrendPoint[]): void;
function renderAnomalyAlerts(el: HTMLElement, anomalies: AnomalyAlert[]): void;
function renderCostBurnGauge(el: HTMLElement, cost: CostInfo): void;
function renderStageProgress(el: HTMLElement, stages: StageProgressInfo[]): void;
function renderDoraCards(el: HTMLElement, dora: DoraGrades): void;
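For the SVG-circle gauge, the DOM-free part of renderHealthGauge can be isolated into two helpers: one that turns a score into stroke-dasharray / stroke-dashoffset values, and one that builds the aria-label text given in the Accessibility Checklist. This is a sketch; gaugeDash and gaugeAriaLabel are hypothetical names, and the radius is whatever the final layout uses.

```typescript
// Compute SVG ring geometry for a 0-100 score.
// circumference goes into stroke-dasharray; offset goes into stroke-dashoffset
// and hides the unfilled portion of the ring.
function gaugeDash(score: number, radius: number): { circumference: number; offset: number } {
  const clamped = Math.min(100, Math.max(0, score));
  const circumference = 2 * Math.PI * radius;
  return { circumference, offset: circumference * (1 - clamped / 100) };
}

// Label text follows the template in the Accessibility Checklist.
function gaugeAriaLabel(score: number, verdict: string): string {
  return `Pipeline health score: ${score} out of 100, status ${verdict}`;
}
```

Keeping these pure makes the boundary cases in the Validation Criteria (score=0, 50, 100) testable without a DOM.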

API Client Extensions (api.ts)

export async function fetchHealthTrend(days?: number): Promise<{ points: TrendPoint[] }>;
export async function fetchAnomalies(): Promise<{ anomalies: AnomalyAlert[] }>;
export async function fetchStageProgress(): Promise<{ stages: StageProgressInfo[] }>;
// All follow existing pattern: catch → return default empty response
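The "catch → return default empty response" pattern might look like the sketch below. It is not the existing api.ts code: getJsonOrDefault and the Fetcher type are hypothetical, the fetcher is passed in explicitly so the example runs outside a browser, and TrendPoint is assumed to match the point shape returned by getHealthTrend.

```typescript
// ASSUMPTION: TrendPoint mirrors the shape returned by getHealthTrend().
interface TrendPoint {
  ts: string;
  score: number;
  verdict: string;
}

// Minimal subset of the fetch API that this sketch relies on.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// GET a JSON endpoint; fall back to a default on network errors,
// non-2xx responses, or a body that fails to parse.
async function getJsonOrDefault<T>(url: string, fallback: T, fetcher: Fetcher): Promise<T> {
  try {
    const res = await fetcher(url);
    if (!res.ok) return fallback;
    // Note: the cast trusts the server; no runtime shape validation is done here.
    return (await res.json()) as T;
  } catch {
    return fallback;
  }
}

function fetchHealthTrend(fetcher: Fetcher, days: number = 7): Promise<{ points: TrendPoint[] }> {
  return getJsonOrDefault<{ points: TrendPoint[] }>(
    `/api/health/trend?days=${days}`,
    { points: [] },
    fetcher,
  );
}
```

In the browser, passing (url) => fetch(url) as the fetcher satisfies the Fetcher type.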

Data Flow

1. REAL-TIME (every 2s):
   server: getFleetState()
     → computeHealthScore(events, daemonState, costInfo)
     → FleetState.health = { score, verdict, signals, activePipelines }
     → broadcastToClients(fleetState)
     → ws.onmessage → store.set("fleetState")
     → header: renderHealthBadge(state.health)
     → health tab (if active): renderHealthGauge, renderCostBurnGauge, renderDoraCards

2. LAZY (on tab open, cached):
   health.init()
     → fetchHealthTrend(7) → GET /api/health/trend?days=7
         → server: getHealthTrend(7) reads ~/.shipwright/progress/issue-*.json
         → returns { points: [...] } → renderTrendSparkline()
     → fetchAnomalies() → GET /api/health/anomalies
         → server: getAnomalies(events) compares stage durations/failures vs EMA baselines
         → returns { anomalies: [...] } → renderAnomalyAlerts()
     → fetchStageProgress() → GET /api/health/stages
         → server: getStageProgress(events, pipelines) compares current vs avg
         → returns { stages: [...] } → renderStageProgress()

3. USER INTERACTION:
   click anomaly card → toggle .anomaly-drilldown visibility (local state)
   click health badge → switchTab("health")
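The EMA baseline comparison used by getAnomalies() can be illustrated with a small sketch. The smoothing factor (alpha = 0.3) and the severity cutoffs (2x baseline for warning, 3x for critical) are assumptions chosen for the example, not values from the design; ema and detectAnomaly are hypothetical names.

```typescript
// Exponential moving average over a series ordered oldest to newest.
// Returns null when there is no history to build a baseline from.
function ema(values: number[], alpha: number): number | null {
  let avg: number | null = null;
  for (const v of values) {
    avg = avg === null ? v : alpha * v + (1 - alpha) * avg;
  }
  return avg;
}

type Severity = "warning" | "critical";

// ASSUMPTION: 2x the baseline is a warning and 3x is critical.
function detectAnomaly(
  history: number[],
  value: number,
  alpha: number = 0.3,
): { baseline: number; severity: Severity } | null {
  const baseline = ema(history, alpha);
  // No baseline data means no anomaly can be reported.
  if (baseline === null || baseline <= 0) return null;
  const ratio = value / baseline;
  if (ratio >= 3) return { baseline, severity: "critical" };
  if (ratio >= 2) return { baseline, severity: "warning" };
  return null;
}
```

The null return for an empty history lines up with the documented behavior of returning { anomalies: [] } when no baseline data exists.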

Error Boundaries

Component | Error Source | Handling
computeHealthScore() | Missing events/cost data | Returns healthy idle state (score=100, verdict="green")
getHealthTrend() | Missing/corrupt progress files | Returns { points: [] }; sparkline shows empty state
getAnomalies() | No baseline data | Returns { anomalies: [] }; alerts section shows "No anomalies"
getStageProgress() | No active pipelines | Returns { stages: [] }; progress section shows empty state
REST fetch failures | Network/auth errors | API client catches, returns default response; view shows last known data
WebSocket disconnect | Network loss | Existing reconnect logic preserves last FleetState; health badge shows "—"
healthView.init() | Any throw | Caught by existing tab error boundary in router.ts; shows retry button
healthView.render() | Bad FleetState shape | Guard: if (!state.health) return; skips render, no crash

Alternatives Considered

  1. Scatter across existing tabs — Pros: no new tab, minimal new code / Cons: no unified health view, user must hop 3 tabs, doesn't meet acceptance criteria for cohesive dashboard. Rejected because it fragments the monitoring experience.

  2. Header overlay/panel — Pros: always visible without tab switching / Cons: cramped header, complex overlay z-index management, hard to fit 5 widgets plus drill-down in a header panel. Rejected because it requires header restructuring with high blast radius.

  3. Shell out to sw-pipeline-vitals.sh — Pros: reuses existing bash logic / Cons: adds ~200ms subprocess per 2s broadcast, doesn't scale with 50 clients, bash output parsing fragile. Rejected for performance — TypeScript computation is synchronous and fast.

  4. Put all data in FleetState — Pros: single data source / Cons: 7-day trend + anomaly details adds ~5KB per broadcast × 50 clients × every 2s = significant bandwidth waste. Rejected for payload bloat. Only the 200-byte summary goes in FleetState; heavy data is REST-fetched lazily.
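The bandwidth argument in alternative 4 can be made concrete with a quick calculation. The inputs are the figures quoted above (~5KB versus ~200 bytes, 50 clients, one broadcast every 2s); the sketch assumes 1 KB = 1024 bytes, that all 50 clients are connected, and that every broadcast is actually sent (no deduplication hits). broadcastBytesPerSecond is a hypothetical helper.

```typescript
// Outbound bytes per second when one payload is pushed to every client
// at a fixed interval.
function broadcastBytesPerSecond(
  payloadBytes: number,
  clients: number,
  intervalSeconds: number,
): number {
  return (payloadBytes * clients) / intervalSeconds;
}

// Everything in FleetState: ~5KB extra per broadcast.
const fullPayload = broadcastBytesPerSecond(5 * 1024, 50, 2); // 128000 bytes/s (125 KiB/s)

// Summary only: ~200 bytes per broadcast.
const summaryOnly = broadcastBytesPerSecond(200, 50, 2); // 5000 bytes/s
```

Under these assumptions the summary-only design sends roughly 25 times less health data over the WebSocket than the all-in-FleetState alternative.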

Component Hierarchy

App
├── Header (existing)
│   ├── ConnectionDot (existing)
│   ├── CostTicker (existing)
│   └── HealthBadge (NEW)           ← state: fleetState.health
│       └── onClick → switchTab("health")
│
├── TabNav (existing, extended)
│   └── "Health" button (NEW)
│
└── Main
    └── HealthView (NEW, tab="health")
        ├── HealthScoreSection      ← state: fleetState.health (WS)
        │   ├── HealthGauge (SVG circle)
        │   └── TrendSparkline      ← local: healthTrend (REST, cached)
        │
        ├── AnomalyAlertsSection    ← local: anomalies (REST, cached)
        │   └── AlertCard[]         ← local: expandedAlertId (view state)
        │       └── DrillDown (conditional render)
        │
        ├── BottomRow
        │   ├── CostBurnGauge       ← state: fleetState.cost (WS)
        │   └── StageProgressList   ← local: stages (REST, cached)
        │       └── StageProgressBar[]
        │
        └── DoraCardsSection        ← state: fleetState.dora (WS)
            └── DoraCard × 4

State ownership: WS-driven data lives in the global store (health, cost, dora). REST-fetched data (trend, anomalies, stages) lives as module-level variables in health.ts, refreshed on init().

State Management Approach

Data | Source | Update Frequency | Storage
health.score/verdict/signals | WebSocket FleetState | Every 2s | store.fleetState.health
cost | WebSocket FleetState | Every 2s | store.fleetState.cost (existing)
dora | WebSocket FleetState | Every 2s | store.fleetState.dora (existing)
7-day trend points | REST /api/health/trend | On tab open | Module-local in health.ts
Anomaly alerts | REST /api/health/anomalies | On tab open | Module-local in health.ts
Stage progress | REST /api/health/stages | On tab open | Module-local in health.ts
Expanded alert ID | User click | On interaction | Module-local in health.ts

No new store keys needed — REST data is tab-scoped and discarded on destroy().

Accessibility Checklist (WCAG AA)

  • Health gauge: aria-label="Pipeline health score: {score} out of 100, status {verdict}", not color-only
  • Alert severity: text label + icon, not just colored border
  • Anomaly drill-down: <button> wrapper on cards, aria-expanded, keyboard Enter/Space
  • Focus management: visible focus ring (existing --cyan outline), logical tab order
  • Color contrast: all text uses --text-primary on --abyss (passes 4.5:1 per existing design)
  • Semantic HTML: <section aria-label>, <h2>, <button>, <ul>/<li> for alerts
  • Live updates: aria-live="polite" on health score container for screen reader announcements
  • Touch targets: all clickable elements min 44×44px

Responsive Breakpoints

Breakpoint | Layout
320px (mobile) | Single column stack. Gauge shrinks to 120px. DORA cards 1×4 vertical. Alert cards full width.
768px (tablet) | 2-column grid: gauge + trend side-by-side, cost + stages side-by-side. DORA cards 2×2.
1024px (desktop) | Full designed layout. DORA cards 4×1 row. All sections visible without scroll.
1440px (wide) | Wider cards with more sparkline data points. Gauge at full 200px.

Implementation Plan

Files to create

  • dashboard/src/views/health.ts — Health tab view with 6 render functions
  • dashboard/src/views/health.test.ts — Vitest unit tests (9 test cases)

Files to modify

  • dashboard/src/types/api.ts — Add HealthInfo, AnomalyAlert, StageProgressInfo; extend FleetState; extend TabId
  • dashboard/server.ts — Add computeHealthScore(), getHealthTrend(), getAnomalies(), getStageProgress(); extend getFleetState(); register 3 REST endpoints
  • dashboard/src/main.ts — Import and registerView("health", healthView)
  • dashboard/src/core/api.ts — Add fetchHealthTrend(), fetchAnomalies(), fetchStageProgress()
  • dashboard/public/index.html — Add Health tab button + panel
  • dashboard/public/styles.css — Health-specific CSS (~150 lines)
  • dashboard/src/components/header.ts — Add renderHealthBadge() after connection dot
  • scripts/sw-dashboard-e2e-test.sh — Add health endpoint + FleetState verification
  • scripts/sw-server-api-test.sh — Add 3 endpoint tests

Dependencies

  • None new. All data sources (events.jsonl, costs.json, budget.json, progress/) are already read by the server.

Risk areas

  • server.ts size: Already 5800 lines. Adding ~150 lines of health computation is acceptable but approaching the threshold where extraction to a module would help. Monitor.
  • computeHealthScore() in broadcast loop: Must remain synchronous and fast (<5ms). No file I/O, no subprocesses. The REST endpoints handle heavy lifting separately.
  • CSS specificity: Prefix all new classes with health- to avoid collisions with existing 13 views' styles.
  • FleetState deduplication: Adding health to FleetState means the score will change every 2s as pipelines progress, reducing deduplication effectiveness. Acceptable — the score is small and clients need the updates.
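To make the deduplication risk concrete, here is a minimal sketch of JSON-string deduplication. It assumes the mechanism compares the serialized payload against the previous one; the actual logic in server.ts is not shown in this document, and createDedupBroadcaster is a hypothetical name.

```typescript
// Hypothetical sketch: skip a broadcast when the serialized state is
// byte-for-byte identical to the last payload that was sent.
function createDedupBroadcaster<T>(send: (payload: string) => void): (state: T) => boolean {
  let lastPayload: string | null = null;
  return (state: T): boolean => {
    const payload = JSON.stringify(state);
    if (payload === lastPayload) return false; // unchanged: nothing sent
    lastPayload = payload;
    send(payload);
    return true;
  };
}
```

Under this scheme any change to health.score, even by one point, produces a new string and forces a send, which is why a frequently moving score reduces how often broadcasts can be skipped.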

Validation Criteria

  • npm run build compiles with zero errors (TypeScript strict mode)
  • npm test passes all existing 102 test suites + new health.test.ts
  • Health tab renders gauge at score=0, score=50, score=100 (boundary cases)
  • Health tab renders correct verdict colors: green (>=75), yellow (>=50), red (>=25), critical (<25)
  • Empty state (no pipelines): score=100, "No active pipelines" message, empty alerts/stages
  • Anomaly drill-down toggles on Enter/Space key (keyboard accessible)
  • FleetState WebSocket message includes health field with valid structure
  • /api/health/trend?days=7 returns { points: [...] } array
  • /api/health/anomalies returns { anomalies: [...] } with required fields per item
  • /api/health/stages returns { stages: [...] } with status classification
  • Header health badge updates every 2s and navigates to Health tab on click
  • Layout responsive at 320px, 768px, 1024px, 1440px (no horizontal overflow)
  • No regressions in existing Overview, Metrics, or Insights tab rendering
