Pipeline Plan 178
Minimum viable change: Add a cost_forecast() function to sw-cost.sh that estimates pipeline cost from template stages × model pricing × historical duration, display the forecast before pipeline_start() proceeds, and gate on budget. Emit variance events at completion.
Implicit requirements: Need sensible defaults when no historical data exists (cold-start). Confidence intervals require enough historical data points to be meaningful. The --force-start override must bypass the gate cleanly.
Acceptance criteria (from issue): forecast display, budget gate with override, variance event, dashboard display, confidence intervals.
Approach A: Inline in sw-pipeline.sh — Add forecast logic directly into pipeline_start(). Simple but mixes concerns and makes testing harder.
Approach B: New functions in sw-cost.sh + integration point in sw-pipeline.sh — Add cost_forecast() and cost_forecast_display() to sw-cost.sh (where all cost logic lives), call from pipeline_start() after template is loaded. Clean separation, testable, follows existing patterns.
Chosen: Approach B — Minimal blast radius, follows existing code organization (cost logic in sw-cost.sh, pipeline orchestration in sw-pipeline.sh). The dashboard integration adds a new API endpoint and view component.
- Cold-start with no history: Mitigated by hardcoded default durations per stage (based on typical pipeline runs)
- Inaccurate forecasts blocking legitimate work: Mitigated by --force-start override and configurable gate
- Breaking existing budget checks: Low risk; new functions only, existing cost_check_budget() unchanged
- Dashboard changes: Isolated to new view components, no existing views modified
- 4 files modified (sw-cost.sh, sw-pipeline.sh, dashboard metrics.ts, dashboard api.ts)
- 1 file modified for tests (sw-cost-test.sh)
- Reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), model pricing, and event emission infrastructure
- No new dependencies
┌─────────────────────┐ ┌──────────────────────┐
│ sw-pipeline.sh │ │ sw-cost.sh │
│ │ │ │
│ pipeline_start() │────▶│ cost_forecast() │
│ ├─ load config │ │ ├─ count stages │
│ ├─ FORECAST ◀────│─────│ ├─ get history │
│ ├─ budget gate │ │ ├─ calc per-stage │
│ └─ run_pipeline │ │ └─ return estimate │
│ │ │ │
│ pipeline complete │────▶│ cost_record_variance │
│ │ │ └─ emit event │
└─────────────────────┘ └──────────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌──────────────────────┐
│ events.jsonl │ │ dashboard/metrics.ts │
│ cost.forecast │ │ renderCostForecast() │
│ cost.forecast_var │ │ forecast card + bar │
└─────────────────────┘ └──────────────────────┘
// cost_forecast <template_config_path> [complexity]
// Returns JSON: { total_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points }
// confidence: "low" (<5 data points), "medium" (5-20), "high" (>20)

// cost_forecast_display <forecast_json>
// Renders formatted forecast to stdout (table of stages, total, confidence)

// cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
// Emits cost.forecast_variance event to events.jsonl

// --force-start flag on pipeline start
// Bypasses budget gate when forecast exceeds remaining budget
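The confidence thresholds in the interface notes can be sketched as a small shell function. The name forecast_confidence is hypothetical and not part of the plan; only the three bands (<5, 5-20, >20) come from the text.

```shell
# Hypothetical helper (name is an assumption): map a count of historical
# data points to the confidence label described in the interface notes.
forecast_confidence() {
    local data_points="${1:-0}"
    if [ "$data_points" -lt 5 ]; then
        echo "low"
    elif [ "$data_points" -le 20 ]; then
        echo "medium"
    else
        echo "high"
    fi
}

forecast_confidence 12
```

Using integer tests only keeps the helper compatible with Bash 3.2.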
- pipeline_start() → loads template config → calls cost_forecast "$PIPELINE_CONFIG" "$INTELLIGENCE_COMPLEXITY"
- cost_forecast() → reads template stages → looks up historical durations from events.jsonl → computes per-stage cost using cost_calculate() → returns JSON
- cost_forecast_display() → renders table to CLI
- Budget gate: compares forecast.total_usd against cost_remaining_budget() → blocks or warns
- On pipeline completion: cost_record_variance "$FORECAST_USD" "$total_cost" → emits cost.forecast_variance event
- cost_forecast() failures are non-fatal: the pipeline proceeds with a warning
- Budget gate respects --force-start: it never hard-blocks when the override is present
- Dashboard forecast display degrades gracefully when no forecast data exists
- scripts/sw-cost.sh: Add cost_forecast(), cost_forecast_display(), cost_record_variance(), CLI subcommand forecast
- scripts/sw-pipeline.sh: Add --force-start flag parsing, call forecast before run_pipeline, record variance at completion
- scripts/sw-cost-test.sh: Add tests for forecast functions
- dashboard/src/views/metrics.ts: Add renderCostForecast() section
- dashboard/src/core/api.ts: Add /api/costs/forecast endpoint handler
All logic fits naturally into existing files following the project's established patterns.
Add hardcoded default durations (seconds) per stage for cold-start scenarios:
# Default stage durations (seconds), used when no historical data exists
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'
# cost_forecast <template_config_path> [complexity]
# Returns JSON with per-stage estimates, total, and confidence level
cost_forecast() {
    local template_config="$1"
    local complexity="${2:-5}"

    # 1. Read enabled stages + their model from template
    # 2. For each stage, look up historical avg duration from events.jsonl
    #    (grep stage.completed events, extract stage timings by template)
    # 3. Fall back to _DEFAULT_STAGE_DURATIONS if no history
    # 4. Estimate tokens from duration (heuristic: ~50 input + ~20 output tokens/sec for active stages)
    # 5. Calculate cost per stage using cost_calculate()
    # 6. Determine confidence from data point count
    # 7. Apply complexity multiplier (complexity/5, normalized around 1.0)
    # 8. Output JSON
}
Historical lookup: query events.jsonl for stage.completed events matching the template, compute average duration_s per stage. Count data points for confidence.
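A minimal sketch of the historical lookup, under stated assumptions: each event is one compact JSON line containing "type", "stage", "duration_s", and "template" keys (these field names are inferred from the plan, not confirmed), and the function name stage_duration_history is hypothetical. The scan is limited to the last 1000 lines, following the mitigation listed in the risk table. awk is used only to keep the sketch dependency-free; a jq query would be sturdier against formatting differences.

```shell
# Hypothetical helper: print "<avg_duration_s> <data_points>" for one stage.
# Assumes compact one-line JSON events; field names are assumptions.
stage_duration_history() {
    local events_file="$1" stage="$2" template="${3:-}"
    # Cold start: no events file means no history.
    [ -f "$events_file" ] || { echo "0 0"; return 0; }
    tail -n 1000 "$events_file" | awk -v stage="$stage" -v tpl="$template" '
        index($0, "\"type\":\"stage.completed\"") &&
        index($0, "\"stage\":\"" stage "\"") &&
        (tpl == "" || index($0, "\"template\":\"" tpl "\"")) &&
        match($0, /"duration_s":"?[0-9]+/) {
            # Strip the key (and an optional opening quote) to get the number.
            s = substr($0, RSTART, RLENGTH)
            sub(/^"duration_s":"?/, "", s)
            sum += s; n++
        }
        END { if (n > 0) printf "%d %d\n", sum / n, n; else print "0 0" }'
}
```

The second output field is the data-point count, which is what the confidence level would be derived from. A result of "0 0" signals that the caller should fall back to the default durations.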
Token estimation heuristic: Based on typical Claude Code usage patterns:
- Active stages (build): ~50 input + ~20 output tokens/second
- Light stages (intake, pr, merge): ~20 input + ~10 output tokens/second
- Review stages (review, compound_quality): ~40 input + ~30 output tokens/second
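The token heuristic above could look roughly like this in shell. The function name estimate_stage_tokens is hypothetical, the fallback rate for stages the plan does not classify (plan, design, test, and so on) is an assumption, and applying the complexity multiplier to tokens rather than to the final cost is a simplification that gives the same result as long as cost is linear in tokens.

```shell
# Hypothetical helper: print "<input_tokens> <output_tokens>" for a stage.
# Rates per second come from the heuristic in the plan.
estimate_stage_tokens() {
    local stage="$1" duration_s="$2" complexity="${3:-5}"
    local in_rate out_rate
    case "$stage" in
        build)                   in_rate=50; out_rate=20 ;;
        intake|pr|merge)         in_rate=20; out_rate=10 ;;
        review|compound_quality) in_rate=40; out_rate=30 ;;
        *)                       in_rate=20; out_rate=10 ;;  # assumption: unlisted stages count as light
    esac
    # complexity/5 is the multiplier from the plan (5 means no scaling).
    awk -v d="$duration_s" -v i="$in_rate" -v o="$out_rate" -v c="$complexity" \
        'BEGIN { printf "%d %d\n", d * i * c / 5, d * o * c / 5 }'
}

estimate_stage_tokens build 1200   # default build duration, default complexity
```

The two numbers would then be passed to cost_calculate() together with the stage's model; that call is left out because its signature is not shown in the plan.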
Render a formatted table:
╔══════════════════════════════════════════════════════╗
║                Pipeline Cost Forecast                ║
╠══════════════════════════════════════════════════════╣
║  Stage              Model     Duration    Est. Cost  ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  intake             haiku     1m 0s       $0.02      ║
║  plan               sonnet    5m 0s       $0.45      ║
║  build              sonnet    20m 0s      $3.60      ║
║  test               haiku     3m 0s       $0.04      ║
║  review             opus      5m 0s       $2.25      ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  TOTAL                        34m 0s      $6.36      ║
║  Confidence: medium (12 historical runs)             ║
║  Budget remaining: $43.64 / $50.00                   ║
╚══════════════════════════════════════════════════════╝
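The Duration column and row layout in the table can be produced with plain arithmetic expansion and printf. Both function names below are hypothetical, and the column widths (17, 8, 10, 9) are read off the separator row of the mock-up.

```shell
# Hypothetical helper: format seconds as "<m>m <s>s", as in the table above.
format_duration() {
    local total_s="${1:-0}"
    echo "$((total_s / 60))m $((total_s % 60))s"
}

# Hypothetical helper: print one left-aligned table row (box borders omitted).
forecast_row() {
    local stage="$1" model="$2" duration_s="$3" cost="$4"
    printf '%-17s  %-8s  %-10s  %-9s\n' \
        "$stage" "$model" "$(format_duration "$duration_s")" "$cost"
}

forecast_row build sonnet 1200 '$3.60'
```

Durations above an hour would still be shown in minutes with this sketch, which matches the "34m 0s" style of the total line.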
# cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
cost_record_variance() {
    local forecast="$1" actual="$2" template="$3" issue="${4:-}"
    local variance pct_variance
    variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
    pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')
    emit_event "cost.forecast_variance" \
        "forecast_usd=$forecast" \
        "actual_usd=$actual" \
        "variance_usd=$variance" \
        "variance_pct=$pct_variance" \
        "template=$template" \
        "issue=$issue"
}
Add to the case statement: forecast) cost_forecast_cli "$@" ;;
This calls cost_forecast and pipes through cost_forecast_display for CLI usage: shipwright cost forecast --pipeline standard
Add --force-start to the argument parser (alongside existing --pipeline, --goal, etc.):
FORCE_START=false
# In the getopts/while loop:
--force-start) FORCE_START=true ;;
After load_pipeline_config and before run_pipeline, insert:
# Cost forecast and budget gate
if [[ -f "$SCRIPT_DIR/sw-cost.sh" ]]; then
    source "$SCRIPT_DIR/sw-cost.sh"
    FORECAST_JSON=$(cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" 2>/dev/null || echo "")
    if [[ -n "$FORECAST_JSON" ]]; then
        FORECAST_USD=$(echo "$FORECAST_JSON" | jq -r '.total_usd // 0')
        cost_forecast_display "$FORECAST_JSON"

        # Budget gate check
        local remaining
        remaining=$(cost_remaining_budget 2>/dev/null || echo "unlimited")
        if [[ "$remaining" != "unlimited" ]]; then
            if awk -v f="$FORECAST_USD" -v r="$remaining" 'BEGIN{exit !(f > r)}'; then
                if [[ "$FORCE_START" == "true" ]]; then
                    warn "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}, proceeding (--force-start)"
                else
                    error "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}"
                    echo -e "  Override: ${DIM}shipwright pipeline start ... --force-start${RESET}"
                    exit 1
                fi
            fi
        fi

        emit_event "cost.forecast" \
            "forecast_usd=$FORECAST_USD" \
            "template=$PIPELINE_NAME" \
            "confidence=$(echo "$FORECAST_JSON" | jq -r '.confidence')" \
            "issue=${ISSUE_NUMBER:-0}"
    fi
fi
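For testing, the gate decision in the snippet above could be isolated into a function that only returns a status. This is a sketch; budget_gate_allows is a hypothetical name, and the plan itself inlines the logic in pipeline_start().

```shell
# Hypothetical helper: return 0 when the pipeline may start, 1 when blocked.
budget_gate_allows() {
    local forecast_usd="$1" remaining="$2" force_start="${3:-false}"
    # No budget configured: nothing to gate on.
    [ "$remaining" = "unlimited" ] && return 0
    # awk does the floating-point comparison; it exits 0 when forecast > remaining.
    if awk -v f="$forecast_usd" -v r="$remaining" 'BEGIN { exit !(f > r) }'; then
        [ "$force_start" = "true" ] && return 0
        return 1
    fi
    return 0
}

if budget_gate_allows 15 10; then echo "start"; else echo "blocked"; fi
```

Keeping the decision free of exit calls and output lets the test script check all four cases (under budget, over budget, forced, unlimited) without spawning a pipeline.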
After the pipeline.completed event emission (around line 2700), add:
if [[ -n "${FORECAST_USD:-}" && "${FORECAST_USD}" != "0" ]]; then
    cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}"
fi
In dashboard/src/core/api.ts, add handler for /api/costs/forecast:
- Read recent cost.forecast and cost.forecast_variance events from events.jsonl
- Return forecast data for display
In dashboard/src/views/metrics.ts, add renderCostForecast():
- Show last forecast with confidence indicator
- Show forecast vs actual variance trend (bar chart)
- Color-code: green (within 20%), yellow (20-50% off), red (>50% off)
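The color thresholds can be written down as a small classifier. It is shown in shell only to stay consistent with the other snippets in this plan; the dashboard itself would implement this in TypeScript. The name variance_color is hypothetical, and treating exactly 20% as green and exactly 50% as yellow is an assumption, since the plan does not say which band the boundaries belong to.

```shell
# Hypothetical helper: classify a variance percentage by its absolute value.
variance_color() {
    awk -v p="$1" 'BEGIN {
        if (p < 0) p = -p              # over- and under-estimates count alike
        if (p <= 20)      print "green"
        else if (p <= 50) print "yellow"
        else              print "red"
    }'
}

variance_color 11.9
```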
Test cases:
- cost_forecast with a template file returns valid JSON with expected fields
- cost_forecast with no historical data uses defaults and returns confidence "low"
- cost_forecast_display produces formatted output
- cost_record_variance emits correct event
- Budget gate blocks when forecast > remaining
- Budget gate allows with --force-start
- Variance calculation correctness (positive/negative)
- Task 1: Add default stage duration constants and token-rate heuristics to sw-cost.sh
- Task 2: Implement cost_forecast() function with historical lookup and cold-start defaults
- Task 3: Implement cost_forecast_display() formatted CLI output
- Task 4: Implement cost_record_variance() with event emission
- Task 5: Add forecast CLI subcommand to sw-cost.sh case statement
- Task 6: Add --force-start flag parsing to sw-pipeline.sh
- Task 7: Integrate forecast display and budget gate into pipeline_start()
- Task 8: Record forecast variance at pipeline completion in sw-pipeline.sh
- Task 9: Add dashboard API endpoint for forecast data
- Task 10: Add dashboard forecast display component
- Task 11: Write tests for all forecast functions in sw-cost-test.sh
- Task 12: Run full test suite and fix any failures
Following existing test patterns in the repo (mock binaries, PASS/FAIL counters):
- cost_forecast basic: Create a minimal template JSON, call cost_forecast, verify JSON output has total_usd, stages, confidence, data_points
- cost_forecast cold-start: No events.jsonl → uses defaults, confidence = "low"
- cost_forecast with history: Seed events.jsonl with stage.completed events, verify durations are averaged
- cost_forecast_display: Verify output contains stage names, costs, total line
- cost_record_variance: Verify event written to events.jsonl with correct fields
- Variance math: forecast=$5.00 actual=$6.50 → variance=$1.50, pct=30.0%
- Budget gate integration: Mock budget at $10, forecast at $15, verify exit code 1; with FORCE_START=true, verify exit code 0
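The variance-math case can be checked in isolation by lifting the two awk expressions out of cost_record_variance(). The wrapper names variance_usd and variance_pct are hypothetical; the awk bodies are the ones from the plan.

```shell
# Same awk expressions as in cost_record_variance(), wrapped for direct testing.
variance_usd() {
    awk -v f="$1" -v a="$2" 'BEGIN { printf "%.4f", a - f }'
}
variance_pct() {
    awk -v f="$1" -v a="$2" \
        'BEGIN { if (f > 0) printf "%.1f", ((a - f) / f) * 100; else print "0" }'
}

variance_usd 5.00 6.50; echo    # prints 1.5000
variance_pct 5.00 6.50; echo    # prints 30.0
```

Note that the emitted variance has four decimal places (1.5000), so a test comparing against the string "1.50" would fail; comparing numerically or against "1.5000" avoids that. A zero forecast yields "0" rather than a division error.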
- npm test: run existing suite to verify no regressions
- Manual: shipwright cost forecast --pipeline cost-aware to verify CLI output
- cost_forecast() returns JSON with total_usd, stages[], confidence, data_points for any template
- Forecast is displayed before pipeline starts in CLI with formatted table
- Pipeline start is blocked when forecast exceeds remaining budget (exit 1)
- --force-start flag overrides the budget gate with a warning
- cost.forecast event emitted to events.jsonl before pipeline runs
- cost.forecast_variance event emitted after pipeline completes with forecast vs actual
- Dashboard shows cost forecast when pipeline is queued/starting
- Confidence interval shown as low/medium/high based on historical data count
- All new functions have tests in sw-cost-test.sh
- npm test passes with no regressions
- Bash 3.2 compatible (no associative arrays, no readarray)
- Input: --pipeline <name> (template name, default: standard), --complexity <1-10> (default: 5)
- Output: Formatted forecast table to stdout; --json flag for raw JSON
- Exit codes: 0 (success), 1 (error loading template)
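A sketch of how the forecast subcommand might parse the inputs listed above. The function name parse_forecast_args and the FORECAST_* variable names are hypothetical; the flags, defaults, and the 1-10 range come from the contract. Returning 1 on bad arguments is an assumption, since the contract only names exit 1 for template-load errors.

```shell
# Hypothetical parser for: shipwright cost forecast [--pipeline N] [--complexity C] [--json]
parse_forecast_args() {
    FORECAST_PIPELINE="standard"
    FORECAST_COMPLEXITY=5
    FORECAST_JSON_OUTPUT=false
    while [ $# -gt 0 ]; do
        case "$1" in
            --pipeline)
                # Guard against a missing value so "shift 2" cannot stall the loop.
                [ $# -ge 2 ] || { echo "--pipeline needs a value" >&2; return 1; }
                FORECAST_PIPELINE="$2"; shift 2 ;;
            --complexity)
                [ $# -ge 2 ] || { echo "--complexity needs a value" >&2; return 1; }
                FORECAST_COMPLEXITY="$2"; shift 2 ;;
            --json)
                FORECAST_JSON_OUTPUT=true; shift ;;
            *)
                echo "Unknown option: $1" >&2; return 1 ;;
        esac
    done
    # Accept only the integers 1 through 10.
    case "$FORECAST_COMPLEXITY" in
        [1-9]|10) ;;
        *) echo "--complexity must be between 1 and 10" >&2; return 1 ;;
    esac
}

parse_forecast_args --pipeline cost-aware --json && echo "$FORECAST_PIPELINE $FORECAST_COMPLEXITY"
```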
// cost.forecast (emitted before pipeline start)
{"ts":"...","type":"cost.forecast","forecast_usd":"6.36","template":"standard","confidence":"medium","issue":"178"}

// cost.forecast_variance (emitted after pipeline completion)
{"ts":"...","type":"cost.forecast_variance","forecast_usd":"6.36","actual_usd":"7.12","variance_usd":"0.76","variance_pct":"11.9","template":"standard","issue":"178"}
- Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast, actual, variance_pct, template, ts}] }
- Error: { error: { code: "NO_DATA", message: "No forecast data available" } }
Rate Limiting: Not applicable (local dashboard, single-user). Versioning: Not applicable (internal API, no external consumers).
Primary: As a pipeline operator, I want to see estimated pipeline cost before it starts, so that I can make informed go/no-go decisions and avoid budget surprises.
Secondary: As a team lead reviewing daemon activity, I want to see forecast accuracy over time in the dashboard, so that I can trust the estimates and tune budgets appropriately.
- Given a budget is set and a pipeline template is selected, when I run shipwright pipeline start, then I see a cost forecast table before execution begins
- Given the forecast exceeds remaining budget, when I run pipeline start without --force-start, then the pipeline is blocked with an error message showing the override command
- Given the forecast exceeds remaining budget, when I run with --force-start, then the pipeline proceeds with a warning
- Given a pipeline completes, when I check events.jsonl, then I see a cost.forecast_variance event with forecast, actual, and percentage difference
- No historical data (cold start): Uses hardcoded defaults, shows confidence as "low" — user knows estimate is rough
- Budget not configured: Forecast displays but gate is skipped — shows "Budget: unlimited"
- Forecast is $0 (all stages disabled): Shows "No billable stages" message, skips gate
MetricsView
├── renderMetrics() (existing)
│ ├── Success Rate Donut
│ ├── Duration / Throughput
│ └── Cost Breakdown
└── renderCostForecast() (new)
├── ForecastCard (last forecast summary)
│ ├── Total estimate
│ ├── Confidence badge (low/medium/high)
│ └── Budget status bar
└── VarianceChart (forecast vs actual history)
└── Bar chart with color-coded variance
State Management: Forecast data fetched via api.fetchForecastData() → stored in store → rendered by renderCostForecast(). Same pattern as existing cost views.
Accessibility: Semantic HTML (table for forecast data), color + text for confidence indicators, keyboard-accessible chart tooltips.
Responsive Breakpoints: Single-column layout at 320px-768px, two-column at 1024px+ (forecast card beside variance chart). Follows existing metrics grid pattern.
- Forecast function doesn't crash or block pipeline start (non-fatal wrapper)
- Budget gate correctly blocks/allows based on comparison
- --force-start override works
- Forecast accuracy: track variance_pct in events, alert if consistently >50%
- No regressions in existing cost tracking
- Forecast model accuracy improves as more historical data accumulates
- Dashboard variance chart shows trend toward tighter estimates
- Forecast variance consistently >100% for 5+ pipelines → heuristic needs recalibration
- cost.forecast events stop appearing → integration broken
- Check for cost_forecast errors in pipeline output
- Monitor for jq parse failures on events.jsonl queries
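The recalibration signal ("variance consistently >100% for 5+ pipelines") could be checked with a short filter. This is a sketch under assumptions: the function name needs_recalibration is hypothetical, the input is one variance_pct value per line (oldest first) already extracted from cost.forecast_variance events, and "consistently" is read as "each of the last five runs".

```shell
# Hypothetical check: succeed (exit 0) when the last five variance percentages
# all exceed 100% in absolute value. Reads one number per line on stdin.
needs_recalibration() {
    tail -n 5 | awk '
        { v = ($1 < 0) ? -$1 : $1; n++; if (v > 100) over++ }
        END { exit !(n == 5 && over == 5) }'
}

printf '120\n150\n-130\n101\n200\n' | needs_recalibration && echo "recalibrate heuristic"
```

With fewer than five recorded runs the check never fires, which avoids alerting during cold start.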
Not applicable — this is a pre-start gate feature, not a deployment. If the forecast function fails, the pipeline proceeds normally (fail-open design).
| Approach | Pros | Cons |
|---|---|---|
| A: Inline in sw-pipeline.sh | Single file change | Mixes concerns, hard to test, bloats already large file (3000+ lines) |
| B: Functions in sw-cost.sh (chosen) | Clean separation, testable, follows existing patterns | Requires sourcing sw-cost.sh in pipeline (already done for some paths) |
| C: Separate sw-forecast.sh script | Maximum isolation | Adds a new file unnecessarily, more complex wiring |
| Risk | Impact | Mitigation |
|---|---|---|
| Forecast crashes pipeline start | High — blocks all pipelines | Wrap in ` |
| Inaccurate estimates frustrate users | Medium — trust erosion | Show confidence level, improve with data |
| Budget gate too aggressive | Medium: blocks legitimate work | --force-start override, configurable gate |
| Historical data query slow on large events.jsonl | Low — one-time read at start | Use tail -1000 to limit scan, cache result |
| Breaking Bash 3.2 compat | Medium — fails on macOS | No associative arrays, test on macOS |