Pipeline Design 178

ezigus edited this page Mar 16, 2026 · 3 revisions

Design: Pipeline Cost Forecast and Budget Gate with Early Warning

Context

Shipwright pipelines can cost $5-$50+ per run depending on template, model routing, and complexity. Today, cost visibility is retrospective only: operators see spend after the pipeline completes via cost_dashboard() and cost.record events. The existing budget gate in sw-pipeline.sh (lines ~1692-1702) checks remaining budget per-stage during execution, but by the time it triggers, significant cost has already been incurred.

Problem: There is no pre-start cost estimate. Operators cannot make informed go/no-go decisions before committing resources. The daemon can silently exhaust budgets by queuing expensive pipelines.

Constraints:

Bash 3.2 compatible (macOS default) — no associative arrays, no readarray, no ${var,,}
All cost logic lives in scripts/sw-cost.sh; pipeline orchestration in scripts/sw-pipeline.sh
Events flow through ~/.shipwright/events.jsonl (newline-delimited JSON)
Dashboard reads from API endpoints backed by events.jsonl queries
Existing cost_calculate(), cost_remaining_budget(), cost_check_budget(), model pricing, and emit_event() infrastructure are reusable
Templates define per-stage model assignments in templates/pipelines/*.json
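
The Bash 3.2 constraint rules out a few common idioms. As a minimal sketch of portable substitutes (the helper names to_lower and count_lines are hypothetical and not part of Shipwright):

```shell
# Hypothetical helpers illustrating Bash 3.2-safe idioms; not Shipwright code.

# Substitute for ${var,,}: lowercase a string with tr.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Substitute for readarray: stream stdin through a while-read loop.
# Here the loop only counts lines; real code would process each one.
count_lines() {
  local n=0 _line
  while IFS= read -r _line; do
    n=$((n + 1))
  done
  echo "$n"
}

to_lower "HAIKU"; echo
printf 'intake\nbuild\ntest\n' | count_lines
```

Structured lookups that would otherwise use an associative array can be expressed as a case statement or delegated to jq, which is the approach the risk notes in this design describe.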

Decision

Approach B: New functions in sw-cost.sh + integration hook in sw-pipeline.sh.

Add three functions to sw-cost.sh:

cost_forecast() — estimates total pipeline cost from template stages x model pricing x historical/default durations
cost_forecast_display() — renders a formatted CLI table
cost_record_variance() — emits forecast-vs-actual event at completion

Integrate into sw-pipeline.sh at two points:

Pre-start (after template load, before run_pipeline): display forecast, enforce budget gate
Post-completion (after pipeline.completed event): record variance

Component Diagram

┌──────────────────────────┐        ┌─────────────────────────────┐
│  sw-pipeline.sh          │        │  sw-cost.sh                 │
│                          │        │                             │
│  pipeline_start()        │        │  cost_forecast()            │
│  ├─ load_pipeline_cfg    │        │  ├─ read template stages    │
│  ├─ ─── NEW ────────────▶│───────▶│  ├─ query history           │
│  │   forecast + gate     │◀───────│  ├─ fallback defaults       │
│  ├─ run_pipeline         │        │  ├─ cost_calculate()        │
│  │   ├─ stage loop       │        │  └─ return JSON             │
│  │   └─ budget checks    │        │                             │
│  └─ ─── NEW ────────────▶│───────▶│  cost_forecast_display()    │
│      record variance     │        │  └─ formatted table         │
│                          │        │                             │
│  --force-start flag      │        │  cost_record_variance()     │
│  └─ bypass gate          │        │  └─ emit_event              │
└──────────┬───────────────┘        └──────────┬──────────────────┘
           │                                   │
           ▼                                   ▼
┌──────────────────────────┐        ┌─────────────────────────────┐
│  events.jsonl            │        │  dashboard/                 │
│  cost.forecast           │        │  api.ts → /api/costs/       │
│  cost.forecast_variance  │        │           forecast          │
│  pipeline.completed      │◀───────│  metrics.ts → forecast      │
│  (existing events)       │        │               card + chart  │
└──────────────────────────┘        └─────────────────────────────┘

Interface Contracts

# cost_forecast <template_config_path> [complexity]
#   Input:  Path to template JSON (e.g., templates/pipelines/standard.json)
#           Optional complexity score 1-10 (default: 5)
#   Output: JSON to stdout
#   {
#     "total_usd": "6.36",
#     "stages": [
#       {"id": "intake", "model": "haiku", "est_duration_s": 60, "est_tokens_in": 1200, "est_tokens_out": 600, "est_cost": "0.02"},
#       ...
#     ],
#     "confidence": "low|medium|high",
#     "data_points": 12,
#     "complexity_multiplier": "1.0"
#   }
#   Exit:   0 on success, 1 on error (caller wraps in || true)
#   Errors: Writes warnings to stderr; never blocks pipeline on failure

# cost_forecast_display <forecast_json>
#   Input:  JSON string (output of cost_forecast)
#   Output: Formatted Unicode table to stdout
#   Exit:   0 always (display-only, non-fatal)

# cost_record_variance <forecast_usd> <actual_usd> <template> [issue]
#   Input:  Forecast total, actual total, template name, optional issue number
#   Output: None (emits event to events.jsonl)
#   Event:  cost.forecast_variance with forecast_usd, actual_usd, variance_usd, variance_pct, template, issue
#   Exit:   0 always (non-fatal)

# CLI: shipwright cost forecast --pipeline <name> [--complexity <1-10>] [--json]
#   Input:  Template name, optional complexity, optional raw JSON flag
#   Output: Formatted table (default) or raw JSON (--json)
#   Exit:   0 success, 1 template not found

# Pipeline flag: --force-start
#   Bypasses forecast budget gate when forecast > remaining budget
#   Emits warning via warn() but does not block
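
The confidence and complexity_multiplier fields in the cost_forecast output follow simple rules stated in the Data Flow section: fewer than 5 data points is low, 5 to 20 is medium, more than 20 is high, and the multiplier is complexity / 5. A minimal sketch of those two rules follows; the function names forecast_confidence and complexity_multiplier are illustrative assumptions, not existing sw-cost.sh functions.

```shell
# Hypothetical helpers; names are assumptions for illustration only.

# Map a historical data point count to a confidence label:
# low (<5), medium (5-20), high (>20).
forecast_confidence() {
  local n="${1:-0}"
  if [ "$n" -lt 5 ]; then
    echo "low"
  elif [ "$n" -le 20 ]; then
    echo "medium"
  else
    echo "high"
  fi
}

# Complexity multiplier: complexity / 5, so the default score of 5 maps to 1.0.
# awk is used because shell arithmetic is integer-only.
complexity_multiplier() {
  awk -v c="${1:-5}" 'BEGIN { printf "%.1f", c / 5 }'
}

echo "$(forecast_confidence 12) $(complexity_multiplier 5)"
```

Keeping the multiplier as a one-decimal string matches the "1.0" shape shown in the contract above.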

// Dashboard API: GET /api/costs/forecast
// Response (200):
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: "low" | "medium" | "high";
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}
// Response (200, no data): { recent_forecasts: [], variance_history: [] }

Data Flow

User runs shipwright pipeline start --issue 178
pipeline_start() loads template config (load_pipeline_config -> templates/pipelines/standard.json)
NEW: cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" called
- Reads template JSON -> extracts enabled stages + per-stage model assignments
- Queries events.jsonl for stage.completed events matching template -> computes average duration_s per stage (limited to last 1000 lines via tail)
- Falls back to _DEFAULT_STAGE_DURATIONS for stages with no history
- Applies token-rate heuristics per stage type (build: 50in+20out/sec, review: 40in+30out/sec, light: 20in+10out/sec, test: 10in+5out/sec)
- Calls existing cost_calculate() per stage with estimated tokens + model
- Applies complexity multiplier (complexity / 5, normalized around 1.0)
- Counts data points -> confidence: low (<5), medium (5-20), high (>20)
- Returns JSON to stdout
NEW: cost_forecast_display "$FORECAST_JSON" renders table
NEW: Budget gate -- compares forecast.total_usd vs cost_remaining_budget()
- If forecast > remaining and no --force-start: error() + exit 1
- If forecast > remaining and --force-start: warn() + continue
- If unlimited budget or forecast <= remaining: continue
NEW: emit_event "cost.forecast" with forecast_usd, template, confidence, issue
Pipeline executes normally (existing flow unchanged)
NEW: At completion (after pipeline.completed event), cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "$ISSUE_NUMBER"
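
The three budget gate outcomes in the flow above can be sketched as a single decision function. This is an illustration under assumptions: forecast_gate_decision is a hypothetical name, and the real integration would call error() or warn() rather than print a keyword.

```shell
# Hypothetical helper: prints proceed | warn | block.
#   $1  forecast total in USD
#   $2  remaining budget in USD, or the literal string "unlimited"
#   $3  "true" when --force-start was passed (default: false)
forecast_gate_decision() {
  local forecast="$1" remaining="$2" force="${3:-false}"

  # Unlimited budget: the gate is skipped entirely.
  if [ "$remaining" = "unlimited" ]; then
    echo "proceed"
    return 0
  fi

  # Shell arithmetic is integer-only, so compare dollar amounts in awk.
  # Adding 0 forces a numeric (not string) comparison.
  if awk -v f="$forecast" -v r="$remaining" 'BEGIN { exit !(f + 0 > r + 0) }'; then
    if [ "$force" = "true" ]; then
      echo "warn"    # over budget, but the operator chose to override
    else
      echo "block"   # over budget, caller would exit 1
    fi
  else
    echo "proceed"
  fi
}

forecast_gate_decision "6.36" "5.00" "false"
```

Returning a keyword instead of exiting keeps the decision testable on its own; the caller decides how to act on it.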

Error Boundaries

Component	Failure Mode	Behavior
`cost_forecast()`	jq parse error, missing template	Returns empty string; caller skips forecast display + gate
`cost_forecast()`	events.jsonl missing	Falls back to default stage durations; confidence reported as "low"
`cost_forecast_display()`	Malformed JSON input	Prints warning, continues
Budget gate	`cost_remaining_budget()` fails	Returns "unlimited", gate skipped
`cost_record_variance()`	`emit_event` fails	Non-fatal, pipeline result unaffected
Historical query	Large events.jsonl (>100MB)	`tail -1000` limits scan; stale data acceptable
Dashboard endpoint	No forecast events exist	Returns empty arrays, UI shows "No forecast data"

Design principle: Fail-open. The forecast system is advisory; it MUST never prevent a pipeline from running due to its own bugs. All new code paths are wrapped in || true or guarded by -n checks.
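
A minimal sketch of the two guards named above, `|| true` and a `-n` check. The cost_forecast defined here is a stand-in that always fails, used only to simulate a bug, and run_forecast_step is a hypothetical wrapper name.

```shell
# Stand-in for the real cost_forecast; it always fails to simulate a bug.
cost_forecast() {
  echo "warning: template not found: $1" >&2
  return 1
}

# Hypothetical wrapper showing the fail-open pattern.
run_forecast_step() {
  local forecast_json
  # `|| true` keeps a non-zero exit from aborting the caller under `set -e`.
  forecast_json="$(cost_forecast "$1" 2>/dev/null || true)"

  # `-n` guard: display and gate only when a forecast was actually produced.
  if [ -n "$forecast_json" ]; then
    echo "forecast available"
  else
    echo "forecast skipped"
  fi
  return 0
}

run_forecast_step "templates/pipelines/missing.json"
```

With this shape, a broken forecast degrades to the pre-existing behavior (no forecast, no pre-start gate) instead of stopping the pipeline.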

Token Estimation Heuristics

Stage categories and estimated tokens per second of active execution:

Category	Stages	Input tok/s	Output tok/s	Rationale
Heavy	build	50	20	Extended code generation, iteration
Review	review, compound_quality, audit	40	30	Code reading + detailed feedback
Light	intake, plan, design, pr, merge, deploy, validate, monitor	20	10	Short prompts, structured output
Minimal	test	10	5	Test execution is mostly compute, not tokens

These are calibration starting points. As cost.forecast_variance events accumulate, heuristics can be tuned (future work, not in scope).
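
The table above maps directly onto a case statement, which avoids associative arrays. The sketch below assumes hypothetical function names (stage_token_rates, estimate_stage_tokens) and, as a further assumption, treats unknown stage ids as light.

```shell
# Hypothetical helpers; names are assumptions for illustration only.

# Print "<input tok/s> <output tok/s>" for a stage id.
stage_token_rates() {
  case "$1" in
    build)                                               echo "50 20" ;;
    review|compound_quality|audit)                       echo "40 30" ;;
    test)                                                echo "10 5"  ;;
    intake|plan|design|pr|merge|deploy|validate|monitor) echo "20 10" ;;
    *)                                                   echo "20 10" ;;  # assumption: unknown stages count as light
  esac
}

# Print "<est_tokens_in> <est_tokens_out>" for a stage id and a duration in seconds.
estimate_stage_tokens() {
  local rates in_rate out_rate
  rates="$(stage_token_rates "$1")"
  in_rate="${rates% *}"    # text before the space
  out_rate="${rates#* }"   # text after the space
  echo "$(( in_rate * $2 )) $(( out_rate * $2 ))"
}

estimate_stage_tokens intake 60
```

For the intake stage at 60 seconds this gives 1200 input and 600 output tokens, which agrees with the sample stage entry in the cost_forecast contract.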

Cold-Start Strategy

When no historical stage.completed events exist for a template:

{
 "intake": 60, "plan": 300, "design": 300, "build": 1200,
 "test": 180, "review": 300, "compound_quality": 600, "audit": 120,
 "pr": 60, "merge": 60, "deploy": 120, "validate": 60, "monitor": 300
}

These defaults are conservative (sum across all 13 stages: 3,660 s, about 61 minutes). Confidence is reported as "low" with data_points: 0, clearly signaling to the operator that the estimate is rough.
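
One Bash 3.2-compatible way to hold these defaults is a case-statement lookup, sketched below. The function names default_stage_duration and total_default_duration are assumptions, as is the choice to have an unknown stage contribute 0 seconds.

```shell
# Hypothetical helpers; names are assumptions for illustration only.

# Print the cold-start default duration (seconds) for a stage id.
default_stage_duration() {
  case "$1" in
    intake|pr|merge|validate)   echo 60   ;;
    audit|deploy)               echo 120  ;;
    test)                       echo 180  ;;
    plan|design|review|monitor) echo 300  ;;
    compound_quality)           echo 600  ;;
    build)                      echo 1200 ;;
    *)                          echo 0    ;;  # assumption: unknown stage adds nothing
  esac
}

# Sum the defaults over the stage ids passed as arguments,
# e.g. only the stages a given template enables.
total_default_duration() {
  local total=0 stage
  for stage in "$@"; do
    total=$(( total + $(default_stage_duration "$stage") ))
  done
  echo "$total"
}

total_default_duration intake plan design build test review compound_quality \
  audit pr merge deploy validate monitor
```

Because a template may enable only a subset of stages, the cold-start total for a real run is the sum over that subset rather than over the full list.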

Alternatives Considered

Inline in sw-pipeline.sh -- Pros: single file change, no cross-file sourcing / Cons: sw-pipeline.sh is already 3000+ lines; mixes cost logic with orchestration; harder to unit test forecast math independently; violates existing separation (cost logic in sw-cost.sh)
Separate sw-forecast.sh script -- Pros: maximum isolation, independent versioning / Cons: creates a new file unnecessarily; would need to duplicate or re-source pricing and cost_calculate(); more complex wiring for a feature that is fundamentally a cost concern
Functions in sw-cost.sh (chosen) -- Pros: follows existing pattern (all cost logic in one file); reuses cost_calculate(), pricing data, cost_remaining_budget() directly; testable via existing sw-cost-test.sh harness; minimal new wiring (sw-pipeline.sh already sources sw-cost.sh for budget checks) / Cons: sw-cost.sh grows, but the functions are cohesive with existing cost concerns

Implementation Plan

Files to modify (with full paths)

scripts/sw-cost.sh -- Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand, default stage duration constants, token-rate heuristics
scripts/sw-pipeline.sh -- Add --force-start flag parsing, pre-start forecast call + budget gate, post-completion variance recording
scripts/sw-cost-test.sh -- Add test cases for forecast functions, variance math, budget gate behavior
dashboard/src/views/metrics.ts -- Add renderCostForecast() section (forecast card + variance chart)
dashboard/src/core/api.ts -- Add fetchForecastData() and /api/costs/forecast endpoint handler

Files to create

None. All logic fits into existing files following established patterns.

Dependencies

None new. Uses existing jq, awk, tail, and Shipwright infrastructure.

Risk areas

Large events.jsonl performance: Mitigated by tail -1000 limit on historical queries
Bash 3.2 compatibility: No associative arrays; use JSON + jq for structured data. Token-rate lookup uses a case statement, not arrays.
Forecast accuracy at low data points: Communicated via the confidence level; an operator who distrusts a low-confidence forecast can bypass the gate with --force-start.
Variable scope in sw-pipeline.sh: FORECAST_USD must be visible at completion time. Declare at function scope alongside existing TOTAL_INPUT_TOKENS etc.
Template model field availability: Some templates may not specify per-stage models. Fall back to template defaults.model (always present).

Dashboard Component Hierarchy

MetricsView
  renderMetrics() (existing -- unchanged)
    Success Rate Donut
    Duration / Throughput
    Cost Breakdown (by model, stage, issue, budget)
  renderCostForecast() (NEW)
    ForecastCard
      Total estimate ($X.XX)
      Confidence badge (low=gray, medium=yellow, high=green)
      Budget status bar (reuses existing budget bar pattern)
    VarianceChart
      Horizontal bars: forecast vs actual per pipeline
      Color: green (|variance| <= 20%), yellow (20-50%), red (>50%)

State management: api.fetchForecastData() -> stored in component-local state -> rendered. Same pattern as fetchCostBreakdown() in existing metrics view.

Accessibility: Semantic <table> for forecast data. Confidence uses color + text label. Variance bars include aria-label with numeric values. Keyboard-navigable chart tooltips.

Responsive breakpoints: Single-column stack at 320-768px. Two-column (forecast card | variance chart) at 1024px+. Follows existing metrics grid CSS.

Data Pipeline: Schema and Flow

Event Schemas (new events)

// cost.forecast -- emitted before pipeline starts
{
 "ts": "2026-03-16T08:00:00Z",
 "ts_epoch": 1773648000,
 "type": "cost.forecast",
 "forecast_usd": "6.36",
 "template": "standard",
 "confidence": "medium",
 "data_points": "12",
 "issue": "178"
}
// cost.forecast_variance -- emitted after pipeline completes
{
 "ts": "2026-03-16T08:45:00Z",
 "ts_epoch": 1773650700,
 "type": "cost.forecast_variance",
 "forecast_usd": "6.36",
 "actual_usd": "7.12",
 "variance_usd": "0.7600",
 "variance_pct": "11.9",
 "template": "standard",
 "issue": "178"
}
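
The variance fields in the second event are consistent with variance_usd = actual_usd - forecast_usd and variance_pct = variance_usd / forecast_usd × 100. A minimal sketch of that arithmetic follows; the function name forecast_variance is hypothetical, and reporting 0.0 when the forecast is zero is an assumption made only to avoid dividing by zero.

```shell
# Hypothetical helper: prints "<variance_usd> <variance_pct>".
#   $1  forecast total in USD
#   $2  actual total in USD
forecast_variance() {
  awk -v f="$1" -v a="$2" 'BEGIN {
    diff = a - f
    # Assumption: report 0.0 percent when the forecast is zero.
    pct = (f + 0 == 0) ? 0 : diff / f * 100
    printf "%.4f %.1f", diff, pct
  }'
}

forecast_variance "6.36" "7.12"
```

A positive result means the run cost more than forecast; a negative one means it came in under.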

Data Flow Diagram

Template JSON ──┐
                ├──> cost_forecast() ──> JSON stdout ──> cost_forecast_display()
events.jsonl ───┘           |                                       |
 (stage.completed           |                                       v
  history query)            v                               CLI table output
                       emit_event                                   |
                     "cost.forecast" ──> events.jsonl               |
                            |                                       v
                            |                               Budget gate check
                            |                            (cost_remaining_budget)
                            |                                       |
                            |                          ┌────────────┴────────────┐
                            |                          | forecast <= budget      | forecast > budget
                            |                          | -> proceed              | -> --force-start?
                            |                          |                         |    yes: warn + proceed
                            |                          |                         |    no: exit 1
                            |                          └────────────┬────────────┘
                            |                                       |
                            |                                run_pipeline()
                            |                                       |
                            |                              pipeline.completed
                            |                                       |
                            |                                       v
                            |                            cost_record_variance()
                            |                                       |
                            └──────── emit_event <──────────────────┘
                               "cost.forecast_variance"
                                           |
                                           v
                                  Dashboard API query
                                  /api/costs/forecast

Failure points: Template not found (-> skip forecast), events.jsonl missing (-> use defaults), jq unavailable (-> skip forecast), budget.json missing (-> "unlimited").

Idempotency Strategy

cost_forecast() is a pure read -- no side effects, safe to call multiple times
emit_event appends to events.jsonl -- duplicate events are tolerable (timestamped, queryable)
cost_record_variance() is fire-and-forget -- duplicate variance events don't corrupt state
No schema migrations needed -- events.jsonl is append-only with no schema versioning

Rollback Plan

Revert the commits touching sw-cost.sh, sw-pipeline.sh, sw-cost-test.sh, dashboard files
No data migration needed -- cost.forecast and cost.forecast_variance events in events.jsonl are inert (nothing reads them after rollback)
No config changes to undo
--force-start flag becomes unrecognized (harmless -- falls through existing parser)

Validation Criteria

cost_forecast returns valid JSON with total_usd, stages[], confidence, data_points for every template in templates/pipelines/
Cold-start (no events.jsonl history) returns confidence "low" with default durations
With seeded stage.completed events, historical averages are used and confidence reflects data point count
cost_forecast_display renders readable table with stage breakdown, total, confidence, and budget status
shipwright cost forecast --pipeline standard works as CLI command
shipwright cost forecast --pipeline standard --json outputs raw JSON
Pipeline start shows forecast table before execution
When forecast > remaining budget: exit 1 with override hint
When forecast > remaining budget + --force-start: warn + proceed
When budget is unlimited: forecast displays, gate skipped
cost.forecast event emitted to events.jsonl with correct fields
cost.forecast_variance event emitted after completion with correct variance math
Dashboard /api/costs/forecast returns forecast and variance data
Dashboard renders forecast card and variance chart
All new functions have tests in sw-cost-test.sh
npm test passes with no regressions
No Bash 3.2 incompatibilities (no associative arrays, no readarray)
Forecast function failure does not block pipeline start (fail-open verified)
