Pipeline Design 178
Shipwright pipelines can cost $5-$50+ per run depending on template, model routing, and complexity. Today, cost visibility is retrospective only: operators see spend after the pipeline completes via cost_dashboard() and cost.record events. The existing budget gate in sw-pipeline.sh (lines ~1692-1702) checks remaining budget per-stage during execution, but by the time it triggers, significant cost has already been incurred.
Problem: There is no pre-start cost estimate. Operators cannot make informed go/no-go decisions before committing resources. The daemon can silently exhaust budgets by queuing expensive pipelines.
Constraints:
- Bash 3.2 compatible (macOS default): no associative arrays, no readarray, no ${var,,}
- All cost logic lives in scripts/sw-cost.sh; pipeline orchestration in scripts/sw-pipeline.sh
- Events flow through ~/.shipwright/events.jsonl (newline-delimited JSON)
- Dashboard reads from API endpoints backed by events.jsonl queries
- Existing cost_calculate(), cost_remaining_budget(), cost_check_budget(), model pricing, and emit_event() infrastructure are reusable
- Templates define per-stage model assignments in templates/pipelines/*.json
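The Bash 3.2 constraint rules out a few conveniences that newer shells offer. As a minimal sketch, the substitutes below avoid ${var,,} and readarray; the helper names to_lower and join_lines are hypothetical and are not part of sw-cost.sh.

```shell
# Hypothetical helpers; names are illustrative, not part of sw-cost.sh.

# Replacement for ${var,,}: lowercase through tr.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Replacement for readarray: consume lines with a while-read loop.
# Reads one item per line from stdin and prints them space-separated.
join_lines() {
  local out="" line
  while IFS= read -r line; do
    [ -n "$line" ] || continue          # skip blank lines
    out="${out:+$out }$line"
  done
  printf '%s\n' "$out"
}
```

For example, `printf 'intake\nbuild\n' | join_lines` prints the two stage ids on one line, which a caller can then iterate with a plain `for` loop.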
Approach B: New functions in sw-cost.sh + integration hook in sw-pipeline.sh.
Add three functions to sw-cost.sh:
- cost_forecast() -- estimates total pipeline cost from template stages x model pricing x historical/default durations
- cost_forecast_display() -- renders a formatted CLI table
- cost_record_variance() -- emits forecast-vs-actual event at completion
Integrate into sw-pipeline.sh at two points:
- Pre-start (after template load, before run_pipeline): display forecast, enforce budget gate
- Post-completion (after pipeline.completed event): record variance
┌──────────────────────────┐ ┌─────────────────────────────┐
│ sw-pipeline.sh │ │ sw-cost.sh │
│ │ │ │
│ pipeline_start() │ │ cost_forecast() │
│ ├─ load_pipeline_cfg │ │ ├─ read template stages │
│ ├─ ─── NEW ──────────▶│───────▶│ ├─ query history │
│ │ forecast + gate │◀───────│ ├─ fallback defaults │
│ ├─ run_pipeline │ │ ├─ cost_calculate() │
│ │ ├─ stage loop │ │ └─ return JSON │
│ │ └─ budget checks │ │ │
│ └─ ─── NEW ──────────▶│───────▶│ cost_forecast_display() │
│ record variance │ │ └─ formatted table │
│ │ │ │
│ --force-start flag │ │ cost_record_variance() │
│ └─ bypass gate │ │ └─ emit_event │
└──────────┬───────────────┘ └──────────┬──────────────────┘
│ │
▼ ▼
┌──────────────────────────┐ ┌─────────────────────────────┐
│ events.jsonl │ │ dashboard/ │
│ cost.forecast │ │ api.ts → /api/costs/ │
│ cost.forecast_variance │ │ forecast │
│ pipeline.completed │◀───────│ metrics.ts → forecast │
│ (existing events) │ │ card + chart │
└──────────────────────────┘ └─────────────────────────────┘
# cost_forecast <template_config_path> [complexity]
#   Input:  Path to template JSON (e.g., templates/pipelines/standard.json)
#           Optional complexity score 1-10 (default: 5)
#   Output: JSON to stdout
#     {
#       "total_usd": "6.36",
#       "stages": [
#         {"id": "intake", "model": "haiku", "est_duration_s": 60, "est_tokens_in": 1200, "est_tokens_out": 600, "est_cost": "0.02"},
#         ...
#       ],
#       "confidence": "low|medium|high",
#       "data_points": 12,
#       "complexity_multiplier": "1.0"
#     }
#   Exit:   0 on success, 1 on error (caller wraps in || true)
#   Errors: Writes warnings to stderr; never blocks pipeline on failure

# cost_forecast_display <forecast_json>
#   Input:  JSON string (output of cost_forecast)
#   Output: Formatted Unicode table to stdout
#   Exit:   0 always (display-only, non-fatal)

# cost_record_variance <forecast_usd> <actual_usd> <template> [issue]
#   Input:  Forecast total, actual total, template name, optional issue number
#   Output: None (emits event to events.jsonl)
#   Event:  cost.forecast_variance with forecast_usd, actual_usd, variance_usd, variance_pct, template, issue
#   Exit:   0 always (non-fatal)

# CLI: shipwright cost forecast --pipeline <name> [--complexity <1-10>] [--json]
#   Input:  Template name, optional complexity, optional raw JSON flag
#   Output: Formatted table (default) or raw JSON (--json)
#   Exit:   0 success, 1 template not found

# Pipeline flag: --force-start
#   Bypasses forecast budget gate when forecast > remaining budget
#   Emits warning via warn() but does not block
// Dashboard API: GET /api/costs/forecast
// Response (200):
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: "low" | "medium" | "high";
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}
// Response (200, no data): { recent_forecasts: [], variance_history: [] }
- User runs shipwright pipeline start --issue 178
- pipeline_start() loads template config (load_pipeline_config -> templates/pipelines/standard.json)
- NEW: cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" called
  - Reads template JSON -> extracts enabled stages + per-stage model assignments
  - Queries events.jsonl for stage.completed events matching template -> computes average duration_s per stage (limited to last 1000 lines via tail)
  - Falls back to _DEFAULT_STAGE_DURATIONS for stages with no history
  - Applies token-rate heuristics per stage type (build: 50in+20out/sec, review: 40in+30out/sec, light: 20in+10out/sec)
  - Calls existing cost_calculate() per stage with estimated tokens + model
  - Applies complexity multiplier (complexity / 5, normalized around 1.0)
  - Counts data points -> confidence: low (<5), medium (5-20), high (>20)
  - Returns JSON to stdout
- NEW: cost_forecast_display "$FORECAST_JSON" renders table
- NEW: Budget gate -- compares forecast.total_usd vs cost_remaining_budget()
  - If forecast > remaining and no --force-start: error() + exit 1
  - If forecast > remaining and --force-start: warn() + continue
  - If unlimited budget or forecast <= remaining: continue
- NEW: emit_event "cost.forecast" with forecast_usd, template, confidence, issue
- Pipeline executes normally (existing flow unchanged)
- NEW: At completion (after pipeline.completed event), cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "$ISSUE_NUMBER"
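Two of the forecast steps above, the confidence bucketing and the complexity multiplier, are simple enough to sketch in isolation. The function names forecast_confidence and apply_complexity are hypothetical; the thresholds and the complexity / 5 formula come from the steps listed above.

```shell
# Hypothetical helpers sketching two steps of cost_forecast().

# Map a data-point count to a confidence label:
# low (<5), medium (5-20), high (>20).
forecast_confidence() {
  local n="${1:-0}"
  if [ "$n" -lt 5 ]; then
    echo "low"
  elif [ "$n" -le 20 ]; then
    echo "medium"
  else
    echo "high"
  fi
}

# Scale a base cost by complexity / 5, so the default complexity of 5
# leaves the estimate unchanged. awk handles the floating-point math.
apply_complexity() {
  local base_usd="$1" complexity="${2:-5}"
  awk -v c="$base_usd" -v k="$complexity" 'BEGIN { printf "%.2f\n", c * k / 5 }'
}
```

With these definitions, `apply_complexity 6.36 10` doubles the estimate to 12.72, and `forecast_confidence 12` reports medium.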
| Component | Failure Mode | Behavior |
|---|---|---|
| cost_forecast() | jq parse error, missing template, events.jsonl missing | Returns empty string; caller skips forecast display + gate |
| cost_forecast_display() | Malformed JSON input | Prints warning, continues |
| Budget gate | cost_remaining_budget() fails | Returns "unlimited", gate skipped |
| cost_record_variance() | emit_event fails | Non-fatal, pipeline result unaffected |
| Historical query | Large events.jsonl (>100MB) | tail -1000 limits scan; stale data acceptable |
| Dashboard endpoint | No forecast events exist | Returns empty arrays, UI shows "No forecast data" |
Design principle: Fail-open. The forecast system is advisory; it MUST never prevent a pipeline from running due to its own bugs. All new code paths are wrapped in || true or guarded by -n checks.
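The fail-open rule can be illustrated with a small pre-start hook. This is a sketch under assumptions: cost_forecast is replaced here by a stub that always fails, and forecast_prestart is a hypothetical name for the integration point in sw-pipeline.sh.

```shell
# Stub standing in for the real cost_forecast(); it simulates a failure.
cost_forecast() {
  echo "warning: template not found" >&2
  return 1
}

# Hypothetical pre-start hook: a forecast failure never stops the pipeline.
forecast_prestart() {
  local forecast_json=""
  # "|| true" swallows the non-zero exit; stderr warnings are discarded here.
  forecast_json=$(cost_forecast "$1" "${2:-5}" 2>/dev/null) || true
  if [ -n "$forecast_json" ]; then
    echo "forecast: $forecast_json"
  else
    # Empty output means no forecast: skip display and gate, keep going.
    echo "forecast unavailable; continuing without gate"
  fi
  return 0
}
```

The declaration and the assignment of forecast_json are kept on separate lines so that the `|| true` guards the command substitution itself rather than the `local` builtin.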
Stage categories and estimated tokens per second of active execution:
| Category | Stages | Input tok/s | Output tok/s | Rationale |
|---|---|---|---|---|
| Heavy | build | 50 | 20 | Extended code generation, iteration |
| Review | review, compound_quality, audit | 40 | 30 | Code reading + detailed feedback |
| Light | intake, plan, design, pr, merge, deploy, validate, monitor | 20 | 10 | Short prompts, structured output |
| Minimal | test | 10 | 5 | Test execution is mostly compute, not tokens |
These are calibration starting points. As cost.forecast_variance events accumulate, heuristics can be tuned (future work, not in scope).
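Because associative arrays are unavailable, the rate table above maps naturally onto a case statement. The sketch below uses hypothetical function names, and treating unknown stage ids as light stages is an assumption rather than something the design specifies.

```shell
# Hypothetical lookup: prints "<input tok/s> <output tok/s>" for a stage id.
stage_token_rates() {
  case "$1" in
    build)                         echo "50 20" ;;
    review|compound_quality|audit) echo "40 30" ;;
    test)                          echo "10 5"  ;;
    *)                             echo "20 10" ;;  # light stages; unknown ids too (assumption)
  esac
}

# Estimate tokens for a stage as rate x duration.
# Prints "<tokens_in> <tokens_out>".
estimate_stage_tokens() {
  local stage="$1" duration_s="$2" rates in_rate out_rate
  rates=$(stage_token_rates "$stage")
  in_rate=${rates% *}     # text before the space
  out_rate=${rates#* }    # text after the space
  echo "$((in_rate * duration_s)) $((out_rate * duration_s))"
}
```

A 60-second intake stage yields 1200 input and 600 output tokens, which matches the intake entry in the example forecast JSON.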
When no historical stage.completed events exist for a template:
{
"intake": 60, "plan": 300, "design": 300, "build": 1200,
"test": 180, "review": 300, "compound_quality": 600, "audit": 120,
"pr": 60, "merge": 60, "deploy": 120, "validate": 60, "monitor": 300
}
These defaults are conservative (sum: 3660 s, about 61 minutes across all 13 stages). Confidence is reported as "low" with data_points: 0, clearly signaling to the operator that the estimate is rough.
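The fallback from historical averages to default durations might look like the following sketch. It assumes that each stage.completed line in events.jsonl carries "stage" and "duration_s" fields; that field layout, the function names, and the use of awk instead of jq (chosen only to keep the example dependency-free) are all assumptions.

```shell
# Hypothetical lookup mirroring the default-duration table (seconds).
default_stage_duration() {
  case "$1" in
    build)                      echo 1200 ;;
    compound_quality)           echo 600 ;;
    plan|design|review|monitor) echo 300 ;;
    test)                       echo 180 ;;
    audit|deploy)               echo 120 ;;
    *)                          echo 60 ;;  # intake, pr, merge, validate; unknown ids too (assumption)
  esac
}

# Average duration_s for a stage over the last 1000 event lines,
# falling back to the default when the file or matching events are missing.
# ASSUMPTION: events look like {"type":"stage.completed","stage":"build","duration_s":1200}
avg_stage_duration() {
  local events_file="$1" stage="$2" fallback
  fallback=$(default_stage_duration "$stage")
  if [ ! -f "$events_file" ]; then
    echo "$fallback"
    return 0
  fi
  tail -n 1000 "$events_file" | awk -v stage="$stage" -v fb="$fallback" '
    /"type" *: *"stage\.completed"/ && $0 ~ ("\"stage\" *: *\"" stage "\"") {
      if (match($0, /"duration_s" *: *"?[0-9]+/)) {
        v = substr($0, RSTART, RLENGTH)
        sub(/.*[^0-9]/, "", v)      # keep only the trailing digits
        sum += v; n++
      }
    }
    END { if (n > 0) printf "%d\n", sum / n; else print fb }'
}
```

A production version would more likely parse events with jq, as the rest of sw-cost.sh does; the awk matching here tolerates quoted or unquoted numbers but is not a general JSON parser.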
- Inline in sw-pipeline.sh -- Pros: single file change, no cross-file sourcing / Cons: sw-pipeline.sh is already 3000+ lines; mixes cost logic with orchestration; harder to unit test forecast math independently; violates existing separation (cost logic in sw-cost.sh)
- Separate sw-forecast.sh script -- Pros: maximum isolation, independent versioning / Cons: creates a new file unnecessarily; would need to duplicate or re-source pricing and cost_calculate(); more complex wiring for a feature that is fundamentally a cost concern
- Functions in sw-cost.sh (chosen) -- Pros: follows existing pattern (all cost logic in one file); reuses cost_calculate(), pricing data, cost_remaining_budget() directly; testable via existing sw-cost-test.sh harness; minimal new wiring (sw-pipeline.sh already sources sw-cost.sh for budget checks) / Cons: sw-cost.sh grows, but the functions are cohesive with existing cost concerns
- scripts/sw-cost.sh -- Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand, default stage duration constants, token-rate heuristics
- scripts/sw-pipeline.sh -- Add --force-start flag parsing, pre-start forecast call + budget gate, post-completion variance recording
- scripts/sw-cost-test.sh -- Add test cases for forecast functions, variance math, budget gate behavior
- dashboard/src/views/metrics.ts -- Add renderCostForecast() section (forecast card + variance chart)
- dashboard/src/core/api.ts -- Add fetchForecastData() and /api/costs/forecast endpoint handler
No new files. All logic fits into existing files following established patterns.
No new dependencies. Uses existing jq, awk, tail, and Shipwright infrastructure.
- Large events.jsonl performance: Mitigated by tail -1000 limit on historical queries
- Bash 3.2 compatibility: No associative arrays; use JSON + jq for structured data. Token-rate lookup uses a case statement, not arrays.
- Forecast accuracy at low data points: Clearly communicated via confidence level; does not block with --force-start
- Variable scope in sw-pipeline.sh: FORECAST_USD must be visible at completion time. Declare at function scope alongside existing TOTAL_INPUT_TOKENS etc.
- Template model field availability: Some templates may not specify per-stage models. Fall back to template defaults.model (always present).
MetricsView
  renderMetrics() (existing -- unchanged)
    Success Rate Donut
    Duration / Throughput
    Cost Breakdown (by model, stage, issue, budget)
  renderCostForecast() (NEW)
    ForecastCard
      Total estimate ($X.XX)
      Confidence badge (low=gray, medium=yellow, high=green)
      Budget status bar (reuses existing budget bar pattern)
    VarianceChart
      Horizontal bars: forecast vs actual per pipeline
      Color: green (|variance| <= 20%), yellow (20-50%), red (>50%)
State management: api.fetchForecastData() -> stored in component-local state -> rendered. Same pattern as fetchCostBreakdown() in existing metrics view.
Accessibility: Semantic <table> for forecast data. Confidence uses color + text label. Variance bars include aria-label with numeric values. Keyboard-navigable chart tooltips.
Responsive breakpoints: Single-column stack at 320-768px. Two-column (forecast card | variance chart) at 1024px+. Follows existing metrics grid CSS.
// cost.forecast -- emitted before pipeline starts
{
  "ts": "2026-03-16T08:00:00Z",
  "ts_epoch": 1773648000,
  "type": "cost.forecast",
  "forecast_usd": "6.36",
  "template": "standard",
  "confidence": "medium",
  "data_points": "12",
  "issue": "178"
}
// cost.forecast_variance -- emitted after pipeline completes
{
  "ts": "2026-03-16T08:45:00Z",
  "ts_epoch": 1773650700,
  "type": "cost.forecast_variance",
  "forecast_usd": "6.36",
  "actual_usd": "7.12",
  "variance_usd": "0.7600",
  "variance_pct": "11.9",
  "template": "standard",
  "issue": "178"
}
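The variance fields in the cost.forecast_variance event can be derived with a few lines of awk. This is a sketch: forecast_variance is a hypothetical helper, and taking the percentage relative to the forecast is inferred from the example values (0.76 / 6.36 is roughly 11.9%), not stated explicitly in the design.

```shell
# Hypothetical helper: prints "<variance_usd> <variance_pct>".
# variance_usd = actual - forecast; variance_pct is relative to the forecast.
forecast_variance() {
  local forecast_usd="$1" actual_usd="$2"
  awk -v f="$forecast_usd" -v a="$actual_usd" 'BEGIN {
    v = a - f
    pct = (f > 0) ? (v / f) * 100 : 0   # avoid division by zero on an empty forecast
    printf "%.4f %.1f\n", v, pct
  }'
}
```

Calling `forecast_variance 6.36 7.12` reproduces the 0.7600 and 11.9 values from the example event; a negative result means the pipeline came in under forecast.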
Template JSON ──┐
├──> cost_forecast() ──> JSON stdout ──> cost_forecast_display()
events.jsonl ───┘ | |
(stage.completed | v
history query) v CLI table output
emit_event |
"cost.forecast" ──> events.jsonl |
| v
| Budget gate check
| (cost_remaining_budget)
| |
| ┌────────────┴────────────┐
| | forecast <= budget | forecast > budget
| | -> proceed | -> --force-start?
| | | yes: warn + proceed
| | | no: exit 1
| └────────────┬────────────┘
| |
| run_pipeline()
| |
| pipeline.completed
| |
| v
| cost_record_variance()
| |
└──────── emit_event <─┘
"cost.forecast_variance"
|
v
Dashboard API query
/api/costs/forecast
Failure points: Template not found (-> skip forecast), events.jsonl missing (-> use defaults), jq unavailable (-> skip forecast), budget.json missing (-> "unlimited").
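The gate branch of the diagram can be sketched as a single function. The name forecast_budget_gate and its argument order are hypothetical, and plain echo to stderr stands in for the existing warn() and error() helpers so that the example is self-contained.

```shell
# Hypothetical gate: returns 0 to proceed, 1 to block.
# $1 forecast total (USD), $2 remaining budget (USD or "unlimited"),
# $3 "true" when --force-start was passed.
forecast_budget_gate() {
  local forecast_usd="$1" remaining="$2" force_start="${3:-false}"
  # Fail-open: a missing forecast or an unlimited/unknown budget skips the gate.
  if [ -z "$forecast_usd" ] || [ -z "$remaining" ] || [ "$remaining" = "unlimited" ]; then
    return 0
  fi
  # Bash has no float arithmetic, so the comparison is delegated to awk.
  if awk -v f="$forecast_usd" -v r="$remaining" 'BEGIN { exit !(f > r) }'; then
    if [ "$force_start" = "true" ]; then
      echo "warning: forecast \$$forecast_usd exceeds remaining budget \$$remaining; continuing (--force-start)" >&2
      return 0
    fi
    echo "error: forecast \$$forecast_usd exceeds remaining budget \$$remaining; pass --force-start to override" >&2
    return 1
  fi
  return 0
}
```

A caller in pipeline_start() would exit 1 only when this function returns non-zero, which keeps every other path (no forecast, unlimited budget, override) on the proceed side.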
- cost_forecast() is a pure read -- no side effects, safe to call multiple times
- emit_event appends to events.jsonl -- duplicate events are tolerable (timestamped, queryable)
- cost_record_variance() is fire-and-forget -- duplicate variance events don't corrupt state
- No schema migrations needed -- events.jsonl is append-only with no schema versioning
- Revert the commits touching sw-cost.sh, sw-pipeline.sh, sw-cost-test.sh, dashboard files
- No data migration needed -- cost.forecast and cost.forecast_variance events in events.jsonl are inert (nothing reads them after rollback)
- No config changes to undo
- --force-start flag becomes unrecognized (harmless -- falls through existing parser)
- cost_forecast returns valid JSON with total_usd, stages[], confidence, data_points for every template in templates/pipelines/
- Cold-start (no events.jsonl history) returns confidence "low" with default durations
- With seeded stage.completed events, historical averages are used and confidence reflects data point count
- cost_forecast_display renders readable table with stage breakdown, total, confidence, and budget status
- shipwright cost forecast --pipeline standard works as CLI command
- shipwright cost forecast --pipeline standard --json outputs raw JSON
- Pipeline start shows forecast table before execution
- When forecast > remaining budget: exit 1 with override hint
- When forecast > remaining budget + --force-start: warn + proceed
- When budget is unlimited: forecast displays, gate skipped
- cost.forecast event emitted to events.jsonl with correct fields
- cost.forecast_variance event emitted after completion with correct variance math
- Dashboard /api/costs/forecast returns forecast and variance data
- Dashboard renders forecast card and variance chart
- All new functions have tests in sw-cost-test.sh
- npm test passes with no regressions
- No Bash 3.2 incompatibilities (no associative arrays, no readarray)
- Forecast function failure does not block pipeline start (fail-open verified)