
Pipeline Design 178

ezigus edited this page Mar 16, 2026 · 3 revisions

Design: Pipeline Cost Forecast and Budget Gate with Early Warning

Context

Shipwright pipelines can cost 5ドル-50ドル+ per run depending on template, model routing, and complexity. Today, cost visibility is retrospective only — operators see spend after the pipeline completes via cost_dashboard() and cost.record events. The existing budget gate in sw-pipeline.sh (lines ~1692-1702) checks remaining budget per-stage during execution, but by the time it triggers, significant cost has already been incurred.

Problem: There is no pre-start cost estimate. Operators cannot make informed go/no-go decisions before committing resources. The daemon can silently exhaust budgets by queuing expensive pipelines.

Constraints:

  • Bash 3.2 compatible (macOS default) — no associative arrays, no readarray, no ${var,,}
  • All cost logic lives in scripts/sw-cost.sh; pipeline orchestration in scripts/sw-pipeline.sh
  • Events flow through ~/.shipwright/events.jsonl (newline-delimited JSON)
  • Dashboard reads from API endpoints backed by events.jsonl queries
  • Existing cost_calculate(), cost_remaining_budget(), cost_check_budget(), model pricing, and emit_event() infrastructure are reusable
  • Templates define per-stage model assignments in templates/pipelines/*.json

Decision

Approach B: New functions in sw-cost.sh + integration hook in sw-pipeline.sh.

Add three functions to sw-cost.sh:

  1. cost_forecast() — estimates total pipeline cost from template stages x model pricing x historical/default durations
  2. cost_forecast_display() — renders a formatted CLI table
  3. cost_record_variance() — emits forecast-vs-actual event at completion

Integrate into sw-pipeline.sh at two points:

  • Pre-start (after template load, before run_pipeline): display forecast, enforce budget gate
  • Post-completion (after pipeline.completed event): record variance

Component Diagram

┌──────────────────────────┐        ┌─────────────────────────────────┐
│ sw-pipeline.sh           │        │ sw-cost.sh                      │
│                          │        │                                 │
│ pipeline_start()         │        │ cost_forecast()                 │
│ ├─ load_pipeline_cfg     │        │ ├─ read template stages         │
│ ├─ ─── NEW ─────────────▶│───────▶│ ├─ query history                │
│ │   forecast + gate      │◀───────│ ├─ fallback defaults            │
│ ├─ run_pipeline          │        │ ├─ cost_calculate()             │
│ │  ├─ stage loop         │        │ └─ return JSON                  │
│ │  └─ budget checks      │        │                                 │
│ └─ ─── NEW ─────────────▶│───────▶│ cost_forecast_display()         │
│     record variance      │        │ └─ formatted table              │
│                          │        │                                 │
│ --force-start flag       │        │ cost_record_variance()          │
│ └─ bypass gate           │        │ └─ emit_event                   │
└──────────┬───────────────┘        └──────────┬──────────────────────┘
           │                                   │
           ▼                                   ▼
┌──────────────────────────┐        ┌─────────────────────────────────┐
│ events.jsonl             │        │ dashboard/                      │
│   cost.forecast          │        │   api.ts → /api/costs/forecast  │
│   cost.forecast_variance │        │                                 │
│   pipeline.completed     │◀───────│   metrics.ts → forecast         │
│   (existing events)      │        │                 card + chart    │
└──────────────────────────┘        └─────────────────────────────────┘

Interface Contracts

# cost_forecast <template_config_path> [complexity]
#   Input:  Path to template JSON (e.g., templates/pipelines/standard.json)
#           Optional complexity score 1-10 (default: 5)
#   Output: JSON to stdout
#   {
#     "total_usd": "6.36",
#     "stages": [
#       {"id": "intake", "model": "haiku", "est_duration_s": 60, "est_tokens_in": 1200, "est_tokens_out": 600, "est_cost": "0.02"},
#       ...
#     ],
#     "confidence": "low|medium|high",
#     "data_points": 12,
#     "complexity_multiplier": "1.0"
#   }
#   Exit:   0 on success, 1 on error (caller wraps in || true)
#   Errors: Writes warnings to stderr; never blocks pipeline on failure

# cost_forecast_display <forecast_json>
#   Input:  JSON string (output of cost_forecast)
#   Output: Formatted Unicode table to stdout
#   Exit:   0 always (display-only, non-fatal)

# cost_record_variance <forecast_usd> <actual_usd> <template> [issue]
#   Input:  Forecast total, actual total, template name, optional issue number
#   Output: None (emits event to events.jsonl)
#   Event:  cost.forecast_variance with forecast_usd, actual_usd, variance_usd, variance_pct, template, issue
#   Exit:   0 always (non-fatal)

# CLI: shipwright cost forecast --pipeline <name> [--complexity <1-10>] [--json]
#   Input:  Template name, optional complexity, optional raw JSON flag
#   Output: Formatted table (default) or raw JSON (--json)
#   Exit:   0 success, 1 template not found

# Pipeline flag: --force-start
#   Bypasses forecast budget gate when forecast > remaining budget
#   Emits warning via warn() but does not block

// Dashboard API: GET /api/costs/forecast
// Response (200):
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: "low" | "medium" | "high";
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}
// Response (200, no data): { recent_forecasts: [], variance_history: [] }

Data Flow

  1. User runs shipwright pipeline start --issue 178
  2. pipeline_start() loads template config (load_pipeline_config -> templates/pipelines/standard.json)
  3. NEW: call cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}"
    • Reads template JSON -> extracts enabled stages + per-stage model assignments
    • Queries events.jsonl for stage.completed events matching template -> computes average duration_s per stage (limited to last 1000 lines via tail)
    • Falls back to _DEFAULT_STAGE_DURATIONS for stages with no history
    • Applies token-rate heuristics per stage type (build: 50in+20out/sec, review: 40in+30out/sec, light: 20in+10out/sec, test: 10in+5out/sec)
    • Calls existing cost_calculate() per stage with estimated tokens + model
    • Applies complexity multiplier (complexity / 5, so the default complexity of 5 gives 1.0, 1 gives 0.2, and 10 gives 2.0)
    • Counts data points -> confidence: low (<5), medium (5-20), high (>20)
    • Returns JSON to stdout
  4. NEW: cost_forecast_display "$FORECAST_JSON" renders table
  5. NEW: Budget gate -- compares forecast.total_usd vs cost_remaining_budget()
    • If forecast > remaining and no --force-start: error() + exit 1
    • If forecast > remaining and --force-start: warn() + continue
    • If unlimited budget or forecast <= remaining: continue
  6. NEW: emit_event "cost.forecast" with forecast_usd, template, confidence, issue
  7. Pipeline executes normally (existing flow unchanged)
  8. NEW: At completion (after pipeline.completed event), cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "$ISSUE_NUMBER"

Error Boundaries

Component | Failure Mode | Behavior
--- | --- | ---
cost_forecast() | jq parse error, missing template | Returns empty string; caller skips forecast display + gate
cost_forecast_display() | Malformed JSON input | Prints warning, continues
Budget gate | cost_remaining_budget() fails | Returns "unlimited", gate skipped
cost_record_variance() | emit_event fails | Non-fatal, pipeline result unaffected
Historical query | events.jsonl missing | Falls back to default stage durations; confidence reported as "low"
Historical query | Large events.jsonl (>100MB) | tail -1000 limits scan; stale data acceptable
Dashboard endpoint | No forecast events exist | Returns empty arrays, UI shows "No forecast data"

Design principle: Fail-open. The forecast system is advisory; it MUST never prevent a pipeline from running due to its own bugs. All new code paths are wrapped in || true or guarded by -n checks.

Token Estimation Heuristics

Stage categories and estimated tokens per second of active execution:

Category | Stages | Input tok/s | Output tok/s | Rationale
--- | --- | --- | --- | ---
Heavy | build | 50 | 20 | Extended code generation, iteration
Review | review, compound_quality, audit | 40 | 30 | Code reading + detailed feedback
Light | intake, plan, design, pr, merge, deploy, validate, monitor | 20 | 10 | Short prompts, structured output
Minimal | test | 10 | 5 | Test execution is mostly compute, not tokens

These are calibration starting points. As cost.forecast_variance events accumulate, heuristics can be tuned (future work, not in scope).

Cold-Start Strategy

When no historical stage.completed events exist for a template:

{
 "intake": 60, "plan": 300, "design": 300, "build": 1200,
 "test": 180, "review": 300, "compound_quality": 600, "audit": 120,
 "pr": 60, "merge": 60, "deploy": 120, "validate": 60, "monitor": 300
}

These defaults are conservative (sum across all 13 stages: 3,660 s, about 61 minutes). Confidence is reported as "low" with data_points: 0, clearly signaling to the operator that the estimate is rough.

Alternatives Considered

  1. Inline in sw-pipeline.sh -- Pros: single file change, no cross-file sourcing / Cons: sw-pipeline.sh is already 3000+ lines; mixes cost logic with orchestration; harder to unit test forecast math independently; violates existing separation (cost logic in sw-cost.sh)

  2. Separate sw-forecast.sh script -- Pros: maximum isolation, independent versioning / Cons: creates a new file unnecessarily; would need to duplicate or re-source pricing and cost_calculate(); more complex wiring for a feature that is fundamentally a cost concern

  3. Functions in sw-cost.sh (chosen) -- Pros: follows existing pattern (all cost logic in one file); reuses cost_calculate(), pricing data, cost_remaining_budget() directly; testable via existing sw-cost-test.sh harness; minimal new wiring (sw-pipeline.sh already sources sw-cost.sh for budget checks) / Cons: sw-cost.sh grows, but the functions are cohesive with existing cost concerns

Implementation Plan

Files to modify (with full paths)

  1. scripts/sw-cost.sh -- Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand, default stage duration constants, token-rate heuristics
  2. scripts/sw-pipeline.sh -- Add --force-start flag parsing, pre-start forecast call + budget gate, post-completion variance recording
  3. scripts/sw-cost-test.sh -- Add test cases for forecast functions, variance math, budget gate behavior
  4. dashboard/src/views/metrics.ts -- Add renderCostForecast() section (forecast card + variance chart)
  5. dashboard/src/core/api.ts -- Add fetchForecastData() and /api/costs/forecast endpoint handler

Files to create

None. All logic fits into existing files following established patterns.

Dependencies

None new. Uses existing jq, awk, tail, and Shipwright infrastructure.

Risk areas

  • Large events.jsonl performance: Mitigated by tail -1000 limit on historical queries
  • Bash 3.2 compatibility: No associative arrays; use JSON + jq for structured data. Token-rate lookup uses a case statement, not arrays.
  • Forecast accuracy at low data points: Clearly communicated via confidence level; does not block with --force-start
  • Variable scope in sw-pipeline.sh: FORECAST_USD must be visible at completion time. Declare at function scope alongside existing TOTAL_INPUT_TOKENS etc.
  • Template model field availability: Some templates may not specify per-stage models. Fall back to template defaults.model (always present).

Dashboard Component Hierarchy

MetricsView
├── renderMetrics() (existing -- unchanged)
│   ├── Success Rate Donut
│   ├── Duration / Throughput
│   └── Cost Breakdown (by model, stage, issue, budget)
└── renderCostForecast() (NEW)
    ├── ForecastCard
    │   ├── Total estimate ($X.XX)
    │   ├── Confidence badge (low=gray, medium=yellow, high=green)
    │   └── Budget status bar (reuses existing budget bar pattern)
    └── VarianceChart
        ├── Horizontal bars: forecast vs actual per pipeline
        └── Color: green (|variance| <= 20%), yellow (20-50%), red (>50%)

State management: api.fetchForecastData() -> stored in component-local state -> rendered. Same pattern as fetchCostBreakdown() in existing metrics view.

Accessibility: Semantic <table> for forecast data. Confidence uses color + text label. Variance bars include aria-label with numeric values. Keyboard-navigable chart tooltips.

Responsive breakpoints: Single-column stack at 320-768px. Two-column (forecast card | variance chart) at 1024px+. Follows existing metrics grid CSS.

Data Pipeline: Schema and Flow

Event Schemas (new events)

// cost.forecast -- emitted before pipeline starts
{
  "ts": "2026年03月16日T08:00:00Z",
  "ts_epoch": 1773648000,
  "type": "cost.forecast",
  "forecast_usd": "6.36",
  "template": "standard",
  "confidence": "medium",
  "data_points": "12",
  "issue": "178"
}

// cost.forecast_variance -- emitted after pipeline completes
{
  "ts": "2026年03月16日T08:45:00Z",
  "ts_epoch": 1773650700,
  "type": "cost.forecast_variance",
  "forecast_usd": "6.36",
  "actual_usd": "7.12",
  "variance_usd": "0.7600",
  "variance_pct": "11.9",
  "template": "standard",
  "issue": "178"
}

Data Flow Diagram

Template JSON ──┐
                ├──> cost_forecast() ──> JSON stdout ──> cost_forecast_display()
events.jsonl ───┘           |                                       |
  (stage.completed          |                                       v
   history query)           v                               CLI table output
                       emit_event                                   |
                       "cost.forecast" ──> events.jsonl             |
                            |                                       v
                            |                               Budget gate check
                            |                            (cost_remaining_budget)
                            |                                       |
                            |                          ┌────────────┴────────────┐
                            |                          | forecast <= budget      | forecast > budget
                            |                          | -> proceed              | -> --force-start?
                            |                          |                         |    yes: warn + proceed
                            |                          |                         |    no: exit 1
                            |                          └────────────┬────────────┘
                            |                                       |
                            |                                run_pipeline()
                            |                                       |
                            |                              pipeline.completed
                            |                                       |
                            |                                       v
                            |                            cost_record_variance()
                            |                                       |
                            └──────── emit_event <──────────────────┘
                                      "cost.forecast_variance"
                                           |
                                           v
                                  Dashboard API query
                                  /api/costs/forecast

Failure points: Template not found (-> skip forecast), events.jsonl missing (-> use defaults), jq unavailable (-> skip forecast), budget.json missing (-> "unlimited").

Idempotency Strategy

  • cost_forecast() is a pure read -- no side effects, safe to call multiple times
  • emit_event appends to events.jsonl -- duplicate events are tolerable (timestamped, queryable)
  • cost_record_variance() is fire-and-forget -- duplicate variance events don't corrupt state
  • No schema migrations needed -- events.jsonl is append-only with no schema versioning

Rollback Plan

  1. Revert the commits touching sw-cost.sh, sw-pipeline.sh, sw-cost-test.sh, dashboard files
  2. No data migration needed -- cost.forecast and cost.forecast_variance events in events.jsonl are inert (nothing reads them after rollback)
  3. No config changes to undo
  4. --force-start flag becomes unrecognized (harmless -- falls through existing parser)

Validation Criteria

  • cost_forecast returns valid JSON with total_usd, stages[], confidence, data_points for every template in templates/pipelines/
  • Cold-start (no events.jsonl history) returns confidence "low" with default durations
  • With seeded stage.completed events, historical averages are used and confidence reflects data point count
  • cost_forecast_display renders readable table with stage breakdown, total, confidence, and budget status
  • shipwright cost forecast --pipeline standard works as CLI command
  • shipwright cost forecast --pipeline standard --json outputs raw JSON
  • Pipeline start shows forecast table before execution
  • When forecast > remaining budget: exit 1 with override hint
  • When forecast > remaining budget + --force-start: warn + proceed
  • When budget is unlimited: forecast displays, gate skipped
  • cost.forecast event emitted to events.jsonl with correct fields
  • cost.forecast_variance event emitted after completion with correct variance math
  • Dashboard /api/costs/forecast returns forecast and variance data
  • Dashboard renders forecast card and variance chart
  • All new functions have tests in sw-cost-test.sh
  • npm test passes with no regressions
  • No Bash 3.2 incompatibilities (no associative arrays, no readarray)
  • Forecast function failure does not block pipeline start (fail-open verified)
