Pipeline Design 178

ezigus edited this page Mar 17, 2026 · 3 revisions


Design: Pipeline cost forecast and budget gate with early warning

Context

Shipwright pipelines can run 12 stages, each consuming model tokens at different rates (input/output per million tokens: Opus at $15/$75, Sonnet at $3/$15, Haiku at $0.25/$1.25). Today, estimate_pipeline_cost() in sw-pipeline.sh provides a rough aggregate estimate (~8K input / ~4K output per stage), and cost_check_budget() in sw-cost.sh:197 only checks whether the daily budget is already exceeded; it cannot predict whether a pipeline about to start will blow the budget. Operators discover cost overruns after the fact.
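To make the arithmetic behind the current rough estimate concrete, here is a minimal TypeScript sketch. It uses the per-million-token rates and the ~8K input / ~4K output per-stage figure quoted above; the names RATES, stageCostUsd, and roughPipelineCostUsd are hypothetical and are not taken from sw-cost.sh or sw-pipeline.sh.

```typescript
// Per-million-token rates (USD) as quoted in the Context section.
type Model = "opus" | "sonnet" | "haiku";

const RATES: Record<Model, { inputPerM: number; outputPerM: number }> = {
  opus: { inputPerM: 15, outputPerM: 75 },
  sonnet: { inputPerM: 3, outputPerM: 15 },
  haiku: { inputPerM: 0.25, outputPerM: 1.25 },
};

// Hypothetical helper: cost of one stage for the given token counts.
function stageCostUsd(model: Model, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  return (inputTokens / 1_000_000) * r.inputPerM + (outputTokens / 1_000_000) * r.outputPerM;
}

// Rough aggregate in the spirit of estimate_pipeline_cost():
// every stage is assumed to use ~8K input and ~4K output tokens on one model.
function roughPipelineCostUsd(model: Model, stageCount: number): number {
  return stageCount * stageCostUsd(model, 8_000, 4_000);
}
```

Under these assumptions a 12-stage run comes to about $5.04 on Opus and about $0.084 on Haiku, which is why a forecast that ignores per-stage model assignments can be off by a large factor.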

Constraints from the codebase:

  • Bash 3.2 compatibility — no associative arrays, no readarray, no ${var,,}
  • All JSON manipulation via jq --arg (no string interpolation)
  • Events written to ~/.shipwright/events.jsonl via emit_event
  • set -euo pipefail in all scripts; grep -c under pipefail produces double output (use || true + ${var:-0})
  • Dashboard is TypeScript/Bun with vitest; shell tests use lib/test-helpers.sh assertions
  • Existing --ignore-budget flag on pipeline start (line ~460 of sw-pipeline.sh)

Decision

Approach B: Per-stage forecast using template model assignments + historical durations.

The forecast engine lives in sw-cost.sh as new functions, called by sw-pipeline.sh before stage execution begins. This keeps cost logic centralized (single responsibility) and reuses the existing cost_calculate() function for per-model pricing.

Data Flow

                     ┌─────────────────────────────┐
                     │   Pipeline Template JSON    │
                     │  (enabled stages + models)  │
                     └──────────────┬──────────────┘
                                    │
                                    ▼
┌──────────────┐     ┌─────────────────────────────┐     ┌──────────────┐
│ events.jsonl │────▶│   cost_forecast() engine    │────▶│ forecast.json│
│  (history)   │     │        in sw-cost.sh        │     │  (artifact)  │
└──────────────┘     └──────────────┬──────────────┘     └──────────────┘
                                    │
                          ┌─────────┴─────────┐
                          ▼                   ▼
                   ┌──────────────┐  ┌─────────────────┐
                   │ Budget Gate  │  │ CLI / Dashboard │
                   │ (block/warn) │  │    (display)    │
                   └──────────────┘  └─────────────────┘
                          │
                          ▼
              ┌────────────────────────┐
              │ Pipeline runs stages   │
              │ ...                    │
              │ On completion:         │
              │ cost_record_variance() │
              └────────────────────────┘

Component Diagram

┌─────────────────────────────────────────────────────────┐
│                       sw-cost.sh                        │
│                                                         │
│ ┌──────────────────┐  ┌───────────────────────────────┐ │
│ │ cost_calculate() │  │ cost_forecast()               │ │
│ │ (existing)       │◀─│ - reads template stages       │ │
│ └──────────────────┘  │ - queries event history       │ │
│                       │ - applies complexity mult     │ │
│ ┌──────────────────┐  │ - computes confidence         │ │
│ │ cost_remaining_  │  └───────────────────────────────┘ │
│ │ budget() (exists)│                                    │
│ └──────────────────┘  ┌───────────────────────────────┐ │
│                       │ cost_forecast_display()       │ │
│                       │ - renders table to stdout     │ │
│                       └───────────────────────────────┘ │
│                       ┌───────────────────────────────┐ │
│                       │ cost_record_variance()        │ │
│                       │ - emits forecast vs actual    │ │
│                       └───────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     sw-pipeline.sh                      │
│                                                         │
│ pipeline_start():                                       │
│   1. load_pipeline_config                               │
│   2. cost_forecast → save artifact                      │
│   3. budget_gate (block | warn | pass)                  │
│   4. run stages...                                      │
│   5. cost_record_variance on completion                 │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ dashboard/server.ts                                     │
│ GET /api/costs/forecast: shells to sw cost forecast     │
│ GET /api/status: enriched with forecast from artifact   │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│ dashboard/src/views/pipelines.ts                        │
│ Queue items display: "Est: $45–$60 (medium confidence)" │
└─────────────────────────────────────────────────────────┘

Interface Contracts

// -- sw-cost.sh outputs (JSON to stdout) --
// cost_forecast(pipeline_config_path, complexity) → stdout
interface CostForecast {
  total_usd: number;              // point estimate
  low_usd: number;                // lower bound (total × 0.8 high, × 0.7 medium, × 0.5 low)
  high_usd: number;               // upper bound (total × 1.2 high, × 1.5 medium, × 2.0 low)
  confidence: "high" | "medium" | "low";
  data_points: number;            // historical runs used
  complexity_multiplier: number;
  stages: Array<{
    id: string;                   // e.g. "build", "review"
    model: string;                // e.g. "sonnet", "opus"
    est_duration_s: number;
    est_cost_usd: number;
  }>;
}

// cost_record_variance(forecast_usd, actual_usd, confidence, template, issue) → event emitted
// No return value; writes to events.jsonl

// -- Budget gate return codes (in pipeline_start context) --
// 0 = proceed (under budget or budget unlimited)
// 1 = warn (forecast 50–100% of remaining; pipeline proceeds with warning)
// 2 = block (forecast.high_usd > remaining AND no --force-start; pipeline exits)

// -- Dashboard API --
// GET /api/costs/forecast?pipeline=standard&complexity=5
// Response 200: CostForecast
// Response 400: { error: { code: string, message: string } }

// -- Dashboard types (additions) --
interface QueueItem {
  issue: number;
  title: string;
  score?: number;
  estimated_cost?: number;        // existing field
  factors?: unknown;              // existing field
  forecast?: CostForecast;        // NEW
}
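Because the dashboard receives the forecast as JSON from a shell-out, a runtime check against the CostForecast contract may be useful before the value is trusted. The following is a minimal sketch of such a type guard; isCostForecast and isRecord are hypothetical helpers, not existing dashboard code, and the interface is repeated only so the block stands alone.

```typescript
interface CostForecast {
  total_usd: number;
  low_usd: number;
  high_usd: number;
  confidence: "high" | "medium" | "low";
  data_points: number;
  complexity_multiplier: number;
  stages: Array<{ id: string; model: string; est_duration_s: number; est_cost_usd: number }>;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Hypothetical guard: true only if `v` has every field of the contract above.
function isCostForecast(v: unknown): v is CostForecast {
  if (!isRecord(v)) return false;
  const numericKeys = ["total_usd", "low_usd", "high_usd", "data_points", "complexity_multiplier"];
  for (const k of numericKeys) {
    const n = v[k];
    if (typeof n !== "number" || !Number.isFinite(n)) return false;
  }
  const c = v["confidence"];
  if (c !== "high" && c !== "medium" && c !== "low") return false;
  const stages = v["stages"];
  if (!Array.isArray(stages)) return false;
  return stages.every(
    (s: unknown) =>
      isRecord(s) &&
      typeof s["id"] === "string" &&
      typeof s["model"] === "string" &&
      typeof s["est_duration_s"] === "number" &&
      typeof s["est_cost_usd"] === "number",
  );
}
```

With a guard like this, the error JSON shape {"error": "..."} listed under Error Boundaries is rejected rather than rendered as a forecast.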

Error Boundaries

| Component | Error | Handling |
| --- | --- | --- |
| cost_forecast() | No events.jsonl or empty | Falls back to default durations; sets confidence="low" |
| cost_forecast() | Invalid template JSON | Returns error JSON {"error": "..."}, pipeline logs warning and skips gate |
| cost_forecast() | jq not available | Detected at script top; forecast skipped with warning |
| Budget gate | cost_remaining_budget returns "unlimited" | Gate skipped entirely |
| Budget gate | Forecast fails | Pipeline proceeds with warning (forecast is advisory, not blocking-critical) |
| cost_record_variance() | Missing forecast data at completion | Skipped silently (no-op if PIPELINE_FORECAST_USD unset) |
| Dashboard endpoint | sw cost forecast shell-out fails | Returns 500 with error message |

Confidence Calibration

| Level | Data Points | Interval Width | Rationale |
| --- | --- | --- | --- |
| High | >= 20 runs | ±20% (×0.8 / ×1.2) | Enough data for stable averages |
| Medium | 5–19 runs | -30% / +50% (×0.7 / ×1.5) | Moderate uncertainty |
| Low | < 5 runs | -50% / +100% (×0.5 / ×2.0) | Cold start, conservative bounds |
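The calibration table maps directly onto a small lookup. The sketch below shows one way to derive the confidence level and the low/high bounds from a point estimate and a run count; the thresholds and multipliers come from the table, while confidenceFor, MULTIPLIERS, and forecastBounds are illustrative names only (the real logic is planned for sw-cost.sh in Bash/awk).

```typescript
type Confidence = "high" | "medium" | "low";

// Thresholds from the table: >= 20 runs high, 5-19 medium, < 5 low.
function confidenceFor(dataPoints: number): Confidence {
  if (dataPoints >= 20) return "high";
  if (dataPoints >= 5) return "medium";
  return "low";
}

// Interval multipliers from the table (lower / upper bound).
const MULTIPLIERS: Record<Confidence, { low: number; high: number }> = {
  high: { low: 0.8, high: 1.2 },
  medium: { low: 0.7, high: 1.5 },
  low: { low: 0.5, high: 2.0 },
};

// Hypothetical helper: widen a point estimate into an interval.
function forecastBounds(totalUsd: number, dataPoints: number) {
  const confidence = confidenceFor(dataPoints);
  const m = MULTIPLIERS[confidence];
  return { confidence, low_usd: totalUsd * m.low, high_usd: totalUsd * m.high };
}
```

For example, a $50 point estimate backed by 12 runs falls in the medium band, giving bounds of roughly $35 to $75. Note that the medium and low intervals are deliberately asymmetric: the upper bound widens faster than the lower one.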

Budget Gate Logic (precise)

remaining = cost_remaining_budget()
if remaining == "unlimited"              → PASS
if FORCE_START || IGNORE_BUDGET          → PASS (with audit event)
if forecast.high_usd > remaining         → BLOCK (exit 2, suggest --force-start)
if forecast.total_usd > remaining × 0.5  → WARN (continue)
else                                     → PASS
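The gate rules above are order-sensitive, so it may help to see them as a single function. This is a TypeScript sketch of the same decision sequence, not the Bash implementation planned for pipeline_start(); GateInput, GateResult, and budgetGate are hypothetical names, and the exit codes follow the return codes listed under Interface Contracts.

```typescript
interface GateInput {
  remaining: number | "unlimited"; // result of cost_remaining_budget()
  totalUsd: number;                // forecast.total_usd
  highUsd: number;                 // forecast.high_usd
  forceStart?: boolean;            // --force-start
  ignoreBudget?: boolean;          // --ignore-budget
}

interface GateResult {
  decision: "pass" | "warn" | "block";
  exitCode: 0 | 1 | 2;
  auditEvent: boolean; // true when the gate was bypassed by a flag
}

function budgetGate(input: GateInput): GateResult {
  // 1. Unlimited budget: nothing to check.
  if (input.remaining === "unlimited") {
    return { decision: "pass", exitCode: 0, auditEvent: false };
  }
  // 2. Explicit bypass: pass, but leave an audit trail.
  if (input.forceStart || input.ignoreBudget) {
    return { decision: "pass", exitCode: 0, auditEvent: true };
  }
  // 3. Pessimistic bound exceeds what is left: block.
  if (input.highUsd > input.remaining) {
    return { decision: "block", exitCode: 2, auditEvent: false };
  }
  // 4. Point estimate uses more than half of what is left: warn and continue.
  if (input.totalUsd > input.remaining * 0.5) {
    return { decision: "warn", exitCode: 1, auditEvent: false };
  }
  return { decision: "pass", exitCode: 0, auditEvent: false };
}
```

Note that blocking is driven by high_usd while the warning is driven by total_usd, so a wide low-confidence interval can block a run whose point estimate alone would only warn.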

Historical Data Query

Scan events.jsonl for stage.completed events, extract duration per stage name, compute running averages. Limit scan to tail -1000 lines for performance. Group by stage ID. This reuses the existing event format — no new data collection needed.
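As an illustration of the query described above, the sketch below averages stage durations from the last 1000 lines of a JSONL string. The event field names (type, stage, duration_s) are assumptions made for this example; the actual stage.completed schema is defined in config/event-schema.json and may differ. averageStageDurations is a hypothetical name.

```typescript
// Hypothetical: average duration per stage from stage.completed events.
// Assumes each event looks like {"type":"stage.completed","stage":"build","duration_s":1200}.
function averageStageDurations(jsonl: string, maxLines = 1000): Map<string, number> {
  // Equivalent of `tail -1000`: keep only the most recent lines.
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "").slice(-maxLines);
  const sums = new Map<string, { total: number; count: number }>();

  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole forecast
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ev = parsed as Record<string, unknown>;
    if (ev["type"] !== "stage.completed") continue;
    const stage = ev["stage"];
    const duration = ev["duration_s"];
    if (typeof stage !== "string" || typeof duration !== "number") continue;

    const acc = sums.get(stage) ?? { total: 0, count: 0 };
    acc.total += duration;
    acc.count += 1;
    sums.set(stage, acc);
  }

  const averages = new Map<string, number>();
  for (const [stage, { total, count }] of sums) {
    averages.set(stage, total / count);
  }
  return averages;
}
```

A stage with no history simply has no entry in the result, which is the point where the forecast would fall back to default durations and low confidence.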

Alternatives Considered

  1. Simple multiplier (stage_count × flat_rate). Pros: trivial, zero dependencies. Cons: ignores model tier differences (at the rates listed under Context, Opus costs 60× as much per token as Haiku), ignores stage duration variance (build averages 20min vs intake at 1min), and produces estimates so inaccurate they erode trust in the gate. Rejected: too coarse for meaningful go/no-go decisions.

  2. ML regression on historical runs. Pros: could capture non-linear relationships (e.g., complexity × model × time-of-day). Cons: requires training infrastructure and a minimum of ~100 runs for stable regression, adds a Python dependency to a shell-native project, and is massive over-engineering for the current data volume (~dozens of runs). Rejected: future enhancement when data justifies it.

Implementation Plan

Files to modify

| File | Lines Changed (est.) | Purpose |
| --- | --- | --- |
| scripts/sw-cost.sh | +200 | cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand |
| scripts/sw-pipeline.sh | +50 | --force-start flag, forecast + budget gate in pipeline_start(), variance at completion |
| config/event-schema.json | +20 | cost.forecast and cost.forecast_variance event type definitions |
| dashboard/server.ts | +30 | /api/costs/forecast endpoint, forecast in queue enrichment |
| dashboard/src/types/api.ts | +15 | CostForecast interface, extend QueueItem |
| dashboard/src/views/pipelines.ts | +15 | Forecast display on queued items |
| scripts/sw-pipeline-test.sh | +60 | Integration tests for budget gate |

Files to create

| File | Purpose |
| --- | --- |
| src/cost-forecast.test.js | Unit tests for forecast math and variance tracking |

Dependencies

  • None new. Uses existing jq, awk, bash, vitest.

Risk areas

  • events.jsonl scan performance: Mitigated by tail -1000 + grep filter. If file exceeds ~100K lines, consider indexed lookup (future).
  • pipeline_start() is already ~300 lines: Adding forecast + gate adds ~50 lines of sequential logic. Inserted as a discrete block after load_pipeline_config, before state file creation — minimal entanglement with existing flow.
  • Bash 3.2 float arithmetic: All cost math uses awk (already the pattern in cost_calculate()). No bc dependency.
  • Race condition on budget check: Between forecast check and actual spend, another pipeline could start. Acceptable — the gate is advisory, not transactional. --force-start exists as escape valve.

Validation Criteria

  • shipwright cost forecast --pipeline standard --json returns valid CostForecast JSON
  • shipwright cost forecast --pipeline standard renders human-readable table with per-stage breakdown
  • Cold start (empty events.jsonl): forecast uses defaults, shows "low" confidence
  • With 25+ historical stage.completed events: shows "high" confidence with narrow interval
  • Pipeline start displays forecast before executing stages
  • Pipeline blocked when forecast.high_usd > remaining_budget (exit code 2, message includes --force-start hint)
  • --force-start bypasses gate with audit event emitted
  • --ignore-budget also bypasses forecast gate (backward compatible)
  • cost.forecast event emitted at pipeline start
  • cost.forecast_variance event emitted at pipeline completion with forecast/actual/variance fields
  • All existing tests pass (npm test)
  • New unit tests cover: forecast calculation, confidence thresholds at boundaries (4/5/19/20 data points), complexity multiplier scaling, variance recording
  • New integration tests cover: gate blocks over budget, gate warns at 50-100%, --force-start override, variance event in events.jsonl
  • No Bash 4+ features used (verified by shellcheck or manual review)
  • Dashboard /api/costs/forecast returns valid JSON for all template types
  • Dashboard queue view shows forecast inline for queued items
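The variance criterion above can be illustrated with a small payload builder. This sketch assumes field names (forecast_usd, actual_usd, variance_usd, variance_pct) that go beyond what this page specifies; the page only says the event carries forecast/actual/variance fields. buildVarianceEvent is a hypothetical stand-in for what cost_record_variance() would pass to emit_event, including the documented no-op when no forecast was recorded.

```typescript
interface VarianceEvent {
  type: "cost.forecast_variance";
  forecast_usd: number;
  actual_usd: number;
  variance_usd: number;        // actual minus forecast (positive = over forecast)
  variance_pct: number | null; // null when the forecast was 0
  confidence: string;
  template: string;
  issue: number;
}

// Hypothetical: returns null when forecast data is missing (the silent no-op case).
function buildVarianceEvent(
  forecastUsd: number | undefined,
  actualUsd: number,
  confidence: string,
  template: string,
  issue: number,
): VarianceEvent | null {
  if (forecastUsd === undefined || Number.isNaN(forecastUsd)) return null;
  const varianceUsd = actualUsd - forecastUsd;
  const variancePct = forecastUsd === 0 ? null : (varianceUsd * 100) / forecastUsd;
  return {
    type: "cost.forecast_variance",
    forecast_usd: forecastUsd,
    actual_usd: actualUsd,
    variance_usd: varianceUsd,
    variance_pct: variancePct,
    confidence,
    template,
    issue,
  };
}
```

A run forecast at $50 that actually cost $62 would record a variance of $12, or 24% over forecast.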

Frontend Sections

Component Hierarchy

pipelines.ts (view)
  └─ renderQueueTable()
       └─ renderQueueRow(item: QueueItem)
            └─ renderForecastBadge(forecast?: CostForecast)  // NEW
                 - "Est: $45–$60 (medium confidence)"
                 - Color-coded: green (under 50% budget), yellow (50-100%), red (over)

State lives in FleetState.queue[].forecast — fetched from server, no local state management needed. The forecast data flows from GET /api/status through to render.

State Management Approach

No new state stores. Forecast data is embedded in the existing FleetState response from /api/status. The queue enrichment in server.ts reads cost-forecast.json from pipeline artifacts when available. Pure props-down data flow.

Accessibility Checklist

  • Forecast badge uses semantic <span> with aria-label="Estimated cost: $45 to $60, medium confidence"
  • Color coding supplemented with text labels (not color-only)
  • Budget warning uses role="alert" for screen reader announcement
  • Table cells use <td> with column headers in <th> (existing pattern)
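One possible shape for renderForecastBadge that satisfies the first two checklist items is sketched below. The CSS class names, the remainingUsd parameter, the choice to compare total_usd against the remaining budget, and the status wording are all assumptions for illustration; the existing pipelines.ts view may render differently.

```typescript
type Confidence = "high" | "medium" | "low";

interface ForecastSummary {
  total_usd: number;
  low_usd: number;
  high_usd: number;
  confidence: Confidence;
}

// Text status paired with each color so the meaning never depends on color alone.
function budgetStatus(totalUsd: number, remainingUsd: number): { color: string; label: string } {
  if (totalUsd > remainingUsd) return { color: "red", label: "over budget" };
  if (totalUsd >= remainingUsd * 0.5) return { color: "yellow", label: "50-100% of budget" };
  return { color: "green", label: "under 50% of budget" };
}

// Hypothetical renderer: returns an HTML string, or "" when there is no forecast.
function renderForecastBadge(forecast: ForecastSummary | undefined, remainingUsd: number): string {
  if (!forecast) return "";
  const lo = `$${Math.round(forecast.low_usd)}`;
  const hi = `$${Math.round(forecast.high_usd)}`;
  const status = budgetStatus(forecast.total_usd, remainingUsd);
  const ariaLabel = `Estimated cost: ${lo} to ${hi}, ${forecast.confidence} confidence`;
  const text = `Est: ${lo}–${hi} (${forecast.confidence} confidence)`;
  return (
    `<span class="forecast-badge forecast-badge--${status.color}" aria-label="${ariaLabel}">` +
    `${text} <span class="forecast-badge__status">${status.label}</span></span>`
  );
}
```

All interpolated values here are numbers or members of a fixed union, so no HTML escaping is needed; that would change if free-form strings were ever added to the badge.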

Responsive Breakpoints

  • 320px: Forecast column hidden; available via row expansion (existing mobile pattern)
  • 768px+: Forecast shown as compact badge: "$45–$60 (M)"
  • 1024px+: Full forecast text: "Est: $45–$60 (medium confidence, 12 runs)"
  • 1440px+: No change from 1024px

The dashboard already uses a responsive table pattern — forecast column follows the same hide/show behavior as existing optional columns.
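The breakpoint rules can be summarized as a pure function from viewport width to label text. The sketch below is illustrative only: forecastLabel is a hypothetical helper, and treating every width below 768px as "hidden" is an assumption, since the list above only names 320px explicitly. In the real dashboard the hide/show behavior is more likely handled in CSS.

```typescript
interface ForecastLabelInput {
  low_usd: number;
  high_usd: number;
  confidence: "high" | "medium" | "low";
  data_points: number;
}

// Hypothetical: returns null when the forecast column is hidden at this width.
function forecastLabel(f: ForecastLabelInput, viewportPx: number): string | null {
  if (viewportPx < 768) return null; // hidden; shown via row expansion instead
  const range = `$${Math.round(f.low_usd)}–$${Math.round(f.high_usd)}`;
  if (viewportPx < 1024) {
    // Compact badge: confidence abbreviated to its first letter.
    return `${range} (${f.confidence.charAt(0).toUpperCase()})`;
  }
  // 1024px and wider (1440px+ is unchanged): full text.
  return `Est: ${range} (${f.confidence} confidence, ${f.data_points} runs)`;
}
```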
