Pipeline Plan 178

ezigus edited this page Mar 16, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate

Socratic Design Refinement

Requirements Clarity

Minimum viable change: Add a cost_forecast() function to sw-cost.sh that estimates pipeline cost as template stages × model pricing × historical duration, display the forecast before pipeline_start() proceeds, and gate on budget. Emit variance events at completion.

Implicit requirements: Need sensible defaults when no historical data exists (cold-start). Confidence intervals require enough historical data points to be meaningful. The --force-start override must bypass the gate cleanly.

Acceptance criteria (from issue): forecast display, budget gate with override, variance event, dashboard display, confidence intervals.

Design Alternatives

Approach A: Inline in sw-pipeline.sh — Add forecast logic directly into pipeline_start(). Simple but mixes concerns and makes testing harder.

Approach B: New functions in sw-cost.sh + integration point in sw-pipeline.sh — Add cost_forecast() and cost_forecast_display() to sw-cost.sh (where all cost logic lives), call from pipeline_start() after template is loaded. Clean separation, testable, follows existing patterns.

Chosen: Approach B — Minimal blast radius, follows existing code organization (cost logic in sw-cost.sh, pipeline orchestration in sw-pipeline.sh). The dashboard integration adds a new API endpoint and view component.

Risk Assessment

  • Cold-start with no history: Mitigated by hardcoded default durations per stage (based on typical pipeline runs)
  • Inaccurate forecasts blocking legitimate work: Mitigated by --force-start override and configurable gate
  • Breaking existing budget checks: Low risk — new functions, existing cost_check_budget() unchanged
  • Dashboard changes: Isolated to new view components, no existing views modified

Simplicity Check

  • 4 files modified (sw-cost.sh, sw-pipeline.sh, dashboard metrics.ts, dashboard api.ts)
  • 1 file modified for tests (sw-cost-test.sh)
  • Reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), model pricing, and event emission infrastructure
  • No new dependencies

Architecture Decision Record

Component Diagram

┌─────────────────────┐ ┌──────────────────────┐
│ sw-pipeline.sh │ │ sw-cost.sh │
│ │ │ │
│ pipeline_start() │────▶│ cost_forecast() │
│ ├─ load config │ │ ├─ count stages │
│ ├─ FORECAST ◀────│─────│ ├─ get history │
│ ├─ budget gate │ │ ├─ calc per-stage │
│ └─ run_pipeline │ │ └─ return estimate │
│ │ │ │
│ pipeline complete │────▶│ cost_record_variance │
│ │ │ └─ emit event │
└─────────────────────┘ └──────────────────────┘
 │ │
 ▼ ▼
┌─────────────────────┐ ┌──────────────────────┐
│ events.jsonl │ │ dashboard/metrics.ts │
│ cost.forecast │ │ renderCostForecast() │
│ cost.forecast_var │ │ forecast card + bar │
└─────────────────────┘ └──────────────────────┘

Interface Contracts

// cost_forecast <template_config_path> [complexity]
// Returns JSON: { total_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points }
// confidence: "low" (<5 data points), "medium" (5-20), "high" (>20)
// cost_forecast_display <forecast_json>
// Renders formatted forecast to stdout (table of stages, total, confidence)
// cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
// Emits cost.forecast_variance event to events.jsonl
// --force-start flag on pipeline start
// Bypasses budget gate when forecast exceeds remaining budget
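
The confidence thresholds in the contract can be sketched as a small shell helper. The function name confidence_for_count is hypothetical and only illustrates the mapping from data-point count to label; it is not an existing function in sw-cost.sh.

```shell
# Hypothetical helper: map a historical data-point count to a confidence label.
# Thresholds follow the contract above: <5 low, 5-20 medium, >20 high.
confidence_for_count() {
  local n="${1:-0}"
  if [ "$n" -lt 5 ]; then
    echo "low"
  elif [ "$n" -le 20 ]; then
    echo "medium"
  else
    echo "high"
  fi
}
```

For example, confidence_for_count 12 prints medium, which matches the "12 historical runs" case in the sample forecast table.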

Data Flow

  1. pipeline_start() → loads template config → calls cost_forecast "$PIPELINE_CONFIG" "$INTELLIGENCE_COMPLEXITY"
  2. cost_forecast() → reads template stages → looks up historical durations from events.jsonl → computes per-stage cost using cost_calculate() → returns JSON
  3. cost_forecast_display() → renders table to CLI
  4. Budget gate: compares forecast.total_usd against cost_remaining_budget() → blocks or warns
  5. On pipeline completion: cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}" → emits cost.forecast_variance event

Error Boundaries

  • cost_forecast() failures are non-fatal — pipeline proceeds with a warning
  • Budget gate respects --force-start — never hard-blocks when override is present
  • Dashboard forecast display degrades gracefully when no forecast data exists

Files to Modify

Modified Files

  1. scripts/sw-cost.sh — Add cost_forecast(), cost_forecast_display(), cost_record_variance(), CLI subcommand forecast
  2. scripts/sw-pipeline.sh — Add --force-start flag parsing, call forecast before run_pipeline, record variance at completion
  3. scripts/sw-cost-test.sh — Add tests for forecast functions
  4. dashboard/src/views/metrics.ts — Add renderCostForecast() section
  5. dashboard/src/core/api.ts — Add /api/costs/forecast endpoint handler

No New Files Created

All logic fits naturally into existing files following the project's established patterns.


Implementation Steps

Step 1: Add default stage duration constants to sw-cost.sh

Add hardcoded default durations (seconds) per stage for cold-start scenarios:

# Default stage durations (seconds) — used when no historical data
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'

Step 2: Add cost_forecast() function to sw-cost.sh

# cost_forecast <template_config_path> [complexity]
# Returns JSON with per-stage estimates, total, and confidence level
cost_forecast() {
  local template_config="$1"
  local complexity="${2:-5}"
  # 1. Read enabled stages + their model from template
  # 2. For each stage, look up historical avg duration from events.jsonl
  #    (select stage.completed events, extract stage timings by template)
  # 3. Fall back to _DEFAULT_STAGE_DURATIONS if no history
  # 4. Estimate tokens from duration (heuristic: ~50 input + ~20 output tokens/sec for active stages)
  # 5. Calculate cost per stage using cost_calculate()
  # 6. Determine confidence from data point count
  # 7. Apply complexity multiplier (complexity/5, normalized around 1.0)
  # 8. Output JSON
  :  # placeholder so the outline parses; the steps above replace it
}

Historical lookup: query events.jsonl for stage.completed events matching the template, compute average duration_s per stage. Count data points for confidence.
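
A minimal sketch of the historical lookup with cold-start fallback, assuming jq is available (the plan already relies on it) and assuming stage.completed events carry stage, template, and duration_s fields. The plan does not spell out that event schema, and the helper name stage_duration_estimate is hypothetical.

```shell
# Default stage durations (seconds), copied from Step 1.
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'

# Hypothetical helper: print "<avg_duration_s> <data_points>" for one stage.
# ASSUMPTION: stage.completed events have "stage", "template", "duration_s".
stage_duration_estimate() {
  local events_file="$1" template="$2" stage="$3" stats=""
  if [ -f "$events_file" ]; then
    # Scan only the most recent 1000 events to bound the cost of the lookup.
    stats=$(tail -n 1000 "$events_file" \
      | jq -r --arg t "$template" --arg s "$stage" \
          'select(.type == "stage.completed" and .template == $t and .stage == $s) | .duration_s' 2>/dev/null \
      | awk '{ sum += $1; n++ } END { if (n > 0) printf "%d %d", sum / n, n }')
  fi
  if [ -n "$stats" ]; then
    echo "$stats"
  else
    # Cold start: use the hardcoded default and report zero data points.
    printf '%s 0\n' "$(printf '%s' "$_DEFAULT_STAGE_DURATIONS" | jq -r --arg s "$stage" '.[$s] // 0')"
  fi
}
```

The second field is the data-point count for that stage. How per-stage counts combine into the single data_points value in the forecast JSON is left open by the plan.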

Token estimation heuristic: Based on typical Claude Code usage patterns:

  • Active stages (build): ~50 input + ~20 output tokens/second
  • Light stages (intake, pr, merge): ~20 input + ~10 output tokens/second
  • Review stages (review, compound_quality): ~40 input + ~30 output tokens/second
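
The per-second rates above and the complexity multiplier from step 7 can be sketched as two helpers. Both names are hypothetical. Stages the list does not name (plan, design, test, and so on) are assumed here to use the active-stage rate, which the plan does not state explicitly.

```shell
# Hypothetical helper: print "<input_tokens> <output_tokens>" for a stage run.
# ASSUMPTION: stages not named in the heuristic list use the active rate.
estimate_stage_tokens() {
  local stage="$1" duration_s="$2" in_rate out_rate
  case "$stage" in
    intake|pr|merge)         in_rate=20; out_rate=10 ;;  # light stages
    review|compound_quality) in_rate=40; out_rate=30 ;;  # review stages
    *)                       in_rate=50; out_rate=20 ;;  # active stages (build)
  esac
  echo "$((duration_s * in_rate)) $((duration_s * out_rate))"
}

# Hypothetical helper: scale a cost by complexity/5, so complexity 5 is neutral.
apply_complexity() {
  awk -v c="$1" -v k="${2:-5}" 'BEGIN { printf "%.2f\n", c * k / 5 }'
}
```

With the default 1200-second build, estimate_stage_tokens build 1200 yields 60000 input and 24000 output tokens; those counts would then be priced by cost_calculate().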

Step 3: Add cost_forecast_display() to sw-cost.sh

Render a formatted table:

╔══════════════════════════════════════════════════════╗
║                Pipeline Cost Forecast                ║
╠══════════════════════════════════════════════════════╣
║  Stage              Model     Duration    Est. Cost  ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  intake             haiku     1m 0s       $0.02      ║
║  plan               sonnet    5m 0s       $0.45      ║
║  build              sonnet    20m 0s      $3.60      ║
║  test               haiku     3m 0s       $0.04      ║
║  review             opus      5m 0s       $2.25      ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  TOTAL                        34m 0s      $6.36      ║
║  Confidence: medium (12 historical runs)             ║
║  Budget remaining: $43.64 / $50.00                   ║
╚══════════════════════════════════════════════════════╝
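
One way to produce the rows of this table is a pair of formatting helpers. Both function names are hypothetical, and the column widths are taken from the separator row of the sample table rather than from existing code.

```shell
# Hypothetical helper: render seconds as "<m>m <s>s", as in the Duration column.
format_duration() {
  local total_s="$1"
  echo "$((total_s / 60))m $((total_s % 60))s"
}

# Hypothetical helper: print one table row with fixed-width columns.
# Arguments: stage model duration_s cost_usd
forecast_row() {
  printf '║  %-17s  %-8s  %-10s  $%-8.2f  ║\n' \
    "$1" "$2" "$(format_duration "$3")" "$4"
}
```

Calling forecast_row build sonnet 1200 3.6 renders the build line with a duration of 20m 0s and a cost of $3.60.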

Step 4: Add cost_record_variance() to sw-cost.sh

# cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
cost_record_variance() {
  local forecast="$1" actual="$2" template="$3" issue="${4:-}"
  local variance pct_variance
  variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
  pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')
  emit_event "cost.forecast_variance" \
    "forecast_usd=$forecast" \
    "actual_usd=$actual" \
    "variance_usd=$variance" \
    "variance_pct=$pct_variance" \
    "template=$template" \
    "issue=$issue"
}

Step 5: Add forecast CLI subcommand to sw-cost.sh

Add to the case statement:

forecast) cost_forecast_cli "$@" ;;

This calls cost_forecast and pipes the result through cost_forecast_display. CLI usage:

shipwright cost forecast --pipeline standard

Step 6: Integrate forecast into sw-pipeline.sh — flag parsing

Add --force-start to the argument parser (alongside existing --pipeline, --goal, etc.):

FORCE_START=false
# In the getopts/while loop:
--force-start) FORCE_START=true ;;
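
For context, a self-contained sketch of how the flag could sit in a while/case loop. The function name parse_start_args, the PIPELINE_NAME default, and the handling of unknown flags are assumptions for illustration; the real parser in sw-pipeline.sh handles more options.

```shell
# Sketch of the flag loop. Only FORCE_START comes from the plan; the function
# name and the "standard" default are ASSUMPTIONS for illustration.
parse_start_args() {
  FORCE_START=false
  PIPELINE_NAME="standard"
  while [ $# -gt 0 ]; do
    case "$1" in
      --pipeline)
        # Guard against a missing value so "shift 2" cannot fail.
        if [ $# -lt 2 ]; then
          echo "error: --pipeline requires a value" >&2
          return 1
        fi
        PIPELINE_NAME="$2"
        shift 2 ;;
      --force-start)
        FORCE_START=true
        shift ;;
      *)
        shift ;;  # other flags are ignored in this sketch
    esac
  done
}
```

Because --force-start takes no value, it can appear anywhere among the other flags.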

Step 7: Integrate forecast into sw-pipeline.sh — pre-start gate

After load_pipeline_config and before run_pipeline, insert:

# Cost forecast and budget gate
if [[ -f "$SCRIPT_DIR/sw-cost.sh" ]]; then
  source "$SCRIPT_DIR/sw-cost.sh"
  FORECAST_JSON=$(cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" 2>/dev/null || echo "")
  if [[ -n "$FORECAST_JSON" ]]; then
    FORECAST_USD=$(echo "$FORECAST_JSON" | jq -r '.total_usd // 0')
    cost_forecast_display "$FORECAST_JSON"
    # Budget gate check
    local remaining
    remaining=$(cost_remaining_budget 2>/dev/null || echo "unlimited")
    if [[ "$remaining" != "unlimited" ]]; then
      if awk -v f="$FORECAST_USD" -v r="$remaining" 'BEGIN{exit !(f > r)}'; then
        if [[ "$FORCE_START" == "true" ]]; then
          warn "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}; proceeding (--force-start)"
        else
          error "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}"
          echo -e " Override: ${DIM}shipwright pipeline start ... --force-start${RESET}"
          exit 1
        fi
      fi
    fi
    emit_event "cost.forecast" \
      "forecast_usd=$FORECAST_USD" \
      "template=$PIPELINE_NAME" \
      "confidence=$(echo "$FORECAST_JSON" | jq -r '.confidence')" \
      "issue=${ISSUE_NUMBER:-0}"
  fi
fi
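
The gate decision itself can be isolated into a small function so it can be exercised without starting a pipeline. This is a sketch under assumptions: budget_gate is a hypothetical name, and it writes plain messages to stderr instead of using the project's warn and error helpers.

```shell
# Hypothetical helper: return 0 if the pipeline may start, 1 if the gate blocks.
# Arguments: forecast_usd remaining_usd_or_"unlimited" force_start(true|false)
budget_gate() {
  local forecast="$1" remaining="$2" force="${3:-false}"
  # No budget configured: nothing to compare against, so the gate is skipped.
  [ "$remaining" = "unlimited" ] && return 0
  # awk does the floating-point comparison; it exits 0 when forecast > remaining.
  if awk -v f="$forecast" -v r="$remaining" 'BEGIN { exit !(f + 0 > r + 0) }'; then
    if [ "$force" = "true" ]; then
      echo "warning: forecast \$$forecast exceeds remaining budget \$$remaining (--force-start)" >&2
      return 0
    fi
    echo "error: forecast \$$forecast exceeds remaining budget \$$remaining" >&2
    return 1
  fi
  return 0
}
```

Adding 0 to both operands forces a numeric comparison in awk, so a forecast of 9 is not treated as larger than a budget of 10 by string ordering.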

Step 8: Record variance at pipeline completion

After the pipeline.completed event emission (around line 2700), add:

if [[ -n "${FORECAST_USD:-}" && "${FORECAST_USD}" != "0" ]]; then
 cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}"
fi

Step 9: Add dashboard API endpoint

In dashboard/src/core/api.ts, add handler for /api/costs/forecast:

  • Read recent cost.forecast and cost.forecast_variance events from events.jsonl
  • Return forecast data for display

Step 10: Add dashboard forecast display

In dashboard/src/views/metrics.ts, add renderCostForecast():

  • Show last forecast with confidence indicator
  • Show forecast vs actual variance trend (bar chart)
  • Color-code: green (within 20%), yellow (20-50% off), red (>50% off)

Step 11: Add tests to sw-cost-test.sh

Test cases:

  • cost_forecast with a template file returns valid JSON with expected fields
  • cost_forecast with no historical data uses defaults and returns confidence "low"
  • cost_forecast_display produces formatted output
  • cost_record_variance emits correct event
  • Budget gate blocks when forecast > remaining
  • Budget gate allows with --force-start
  • Variance calculation correctness (positive/negative)

Task Checklist

  • Task 1: Add default stage duration constants and token-rate heuristics to sw-cost.sh
  • Task 2: Implement cost_forecast() function with historical lookup and cold-start defaults
  • Task 3: Implement cost_forecast_display() formatted CLI output
  • Task 4: Implement cost_record_variance() with event emission
  • Task 5: Add forecast CLI subcommand to sw-cost.sh case statement
  • Task 6: Add --force-start flag parsing to sw-pipeline.sh
  • Task 7: Integrate forecast display and budget gate into pipeline_start()
  • Task 8: Record forecast variance at pipeline completion in sw-pipeline.sh
  • Task 9: Add dashboard API endpoint for forecast data
  • Task 10: Add dashboard forecast display component
  • Task 11: Write tests for all forecast functions in sw-cost-test.sh
  • Task 12: Run full test suite and fix any failures

Testing Approach

Unit Tests (sw-cost-test.sh)

Following existing test patterns in the repo (mock binaries, PASS/FAIL counters):

  1. cost_forecast basic: Create a minimal template JSON, call cost_forecast, verify JSON output has total_usd, stages, confidence, data_points
  2. cost_forecast cold-start: No events.jsonl → uses defaults, confidence = "low"
  3. cost_forecast with history: Seed events.jsonl with stage.completed events, verify durations are averaged
  4. cost_forecast_display: Verify output contains stage names, costs, total line
  5. cost_record_variance: Verify event written to events.jsonl with correct fields
  6. Variance math: forecast=$5.00 actual=$6.50 → variance=$1.50, pct=30.0%
  7. Budget gate integration: Mock budget at $10, forecast at $15, verify exit code 1; with FORCE_START=true, verify exit code 0
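
As an illustration of the PASS/FAIL-counter style, the variance-math cases might look like the sketch below. The assert_eq helper and the two wrapper functions are hypothetical; the awk expressions are the ones from cost_record_variance in Step 4, which prints the dollar variance with four decimals.

```shell
PASS=0
FAIL=0

# Hypothetical assertion helper in the PASS/FAIL-counter style.
assert_eq() {
  local label="$1" expected="$2" actual="$3"
  if [ "$expected" = "$actual" ]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    echo "FAIL: $label (expected '$expected', got '$actual')" >&2
  fi
}

# Same awk expressions as cost_record_variance in Step 4.
variance_usd() { awk -v f="$1" -v a="$2" 'BEGIN { printf "%.4f", a - f }'; }
variance_pct() {
  awk -v f="$1" -v a="$2" \
    'BEGIN { if (f > 0) printf "%.1f", ((a - f) / f) * 100; else print "0" }'
}

assert_eq "variance over forecast"  "1.5000"  "$(variance_usd 5.00 6.50)"
assert_eq "percent over forecast"   "30.0"    "$(variance_pct 5.00 6.50)"
assert_eq "variance under forecast" "-1.0000" "$(variance_usd 5.00 4.00)"
assert_eq "percent under forecast"  "-20.0"   "$(variance_pct 5.00 4.00)"
assert_eq "zero forecast"           "0"       "$(variance_pct 0 4.00)"

echo "passed=$PASS failed=$FAIL"
```

The zero-forecast case covers the guard that avoids dividing by zero when every stage is disabled.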

Integration Tests

  • npm test — run existing suite to verify no regressions
  • Manual: shipwright cost forecast --pipeline cost-aware to verify CLI output

Definition of Done

  • cost_forecast() returns JSON with total_usd, stages[], confidence, data_points for any template
  • Forecast is displayed before pipeline starts in CLI with formatted table
  • Pipeline start is blocked when forecast exceeds remaining budget (exit 1)
  • --force-start flag overrides the budget gate with a warning
  • cost.forecast event emitted to events.jsonl before pipeline runs
  • cost.forecast_variance event emitted after pipeline completes with forecast vs actual
  • Dashboard shows cost forecast when pipeline is queued/starting
  • Confidence level shown as low/medium/high based on historical data count
  • All new functions have tests in sw-cost-test.sh
  • npm test passes with no regressions
  • Bash 3.2 compatible (no associative arrays, no readarray)

Endpoint Specification

CLI Endpoint: shipwright cost forecast

  • Input: --pipeline <name> (template name, default: standard), --complexity <1-10> (default: 5)
  • Output: Formatted forecast table to stdout; --json flag for raw JSON
  • Exit codes: 0 (success), 1 (error loading template)

Event Schemas

// cost.forecast (emitted before pipeline start)
{"ts":"...","type":"cost.forecast","forecast_usd":"6.36","template":"standard","confidence":"medium","issue":"178"}
// cost.forecast_variance (emitted after pipeline completion)
{"ts":"...","type":"cost.forecast_variance","forecast_usd":"6.36","actual_usd":"7.12","variance_usd":"0.76","variance_pct":"11.9","template":"standard","issue":"178"}

Dashboard API: GET /api/costs/forecast

  • Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast, actual, variance_pct, template, ts}] }
  • Error: { error: { code: "NO_DATA", message: "No forecast data available" } }

Rate Limiting: Not applicable (local dashboard, single-user).

Versioning: Not applicable (internal API, no external consumers).


User Stories

Primary: As a pipeline operator, I want to see estimated pipeline cost before it starts, so that I can make informed go/no-go decisions and avoid budget surprises.

Secondary: As a team lead reviewing daemon activity, I want to see forecast accuracy over time in the dashboard, so that I can trust the estimates and tune budgets appropriately.

Acceptance Criteria (Given/When/Then)

  • Given a budget is set and a pipeline template is selected, when I run shipwright pipeline start, then I see a cost forecast table before execution begins
  • Given the forecast exceeds remaining budget, when I run pipeline start without --force-start, then the pipeline is blocked with an error message showing the override command
  • Given the forecast exceeds remaining budget, when I run with --force-start, then the pipeline proceeds with a warning
  • Given a pipeline completes, when I check events.jsonl, then I see a cost.forecast_variance event with forecast, actual, and percentage difference

Edge Cases from User Perspective

  1. No historical data (cold start): Uses hardcoded defaults, shows confidence as "low" — user knows estimate is rough
  2. Budget not configured: Forecast displays but gate is skipped — shows "Budget: unlimited"
  3. Forecast is $0 (all stages disabled): Shows "No billable stages" message, skips gate

Component Hierarchy (Dashboard)

MetricsView
├── renderMetrics() (existing)
│ ├── Success Rate Donut
│ ├── Duration / Throughput
│ └── Cost Breakdown
└── renderCostForecast() (new)
 ├── ForecastCard (last forecast summary)
 │ ├── Total estimate
 │ ├── Confidence badge (low/medium/high)
 │ └── Budget status bar
 └── VarianceChart (forecast vs actual history)
 └── Bar chart with color-coded variance

State Management: Forecast data fetched via api.fetchForecastData() → stored in store → rendered by renderCostForecast(). Same pattern as existing cost views.

Accessibility: Semantic HTML (table for forecast data), color + text for confidence indicators, keyboard-accessible chart tooltips.

Responsive Breakpoints: Single-column layout at 320px-768px, two-column at 1024px+ (forecast card beside variance chart). Follows existing metrics grid pattern.


Monitoring Checklist

P0 — Immediate

  • Forecast function doesn't crash or block pipeline start (non-fatal wrapper)
  • Budget gate correctly blocks/allows based on comparison
  • --force-start override works

P1 — Short-term

  • Forecast accuracy: track variance_pct in events, alert if consistently >50%
  • No regressions in existing cost tracking

P2 — Medium-term

  • Forecast model accuracy improves as more historical data accumulates
  • Dashboard variance chart shows trend toward tighter estimates

Anomaly Detection Triggers

  • Forecast variance consistently >100% for 5+ pipelines → heuristic needs recalibration
  • cost.forecast events stop appearing → integration broken
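
The first trigger could be checked with a short script over events.jsonl. This sketch assumes jq is available, reads "consistently" as the five most recent cost.forecast_variance events, and compares the absolute value of variance_pct; all three are interpretations rather than decisions stated in the plan. The function name needs_recalibration is hypothetical.

```shell
# Hypothetical check: succeed (exit 0) when the 5 most recent variance events
# all have |variance_pct| > 100. ASSUMPTION: "consistently" means the last 5.
needs_recalibration() {
  local events_file="$1"
  [ -f "$events_file" ] || return 1
  jq -r 'select(.type == "cost.forecast_variance") | .variance_pct' \
      "$events_file" 2>/dev/null \
    | tail -n 5 \
    | awk '{ v = $1 + 0; if (v < 0) v = -v; n++; if (v > 100) over++ }
           END { exit !(n == 5 && over == 5) }'
}
```

With fewer than five variance events the check reports no recalibration need, since there is not yet enough data to call the deviation consistent.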

Log Analysis

  • Check for cost_forecast errors in pipeline output
  • Monitor for jq parse failures on events.jsonl queries

Auto-Rollback Decision Criteria

Not applicable — this is a pre-start gate feature, not a deployment. If the forecast function fails, the pipeline proceeds normally (fail-open design).


Alternatives Considered

Approach | Pros | Cons
A: Inline in sw-pipeline.sh | Single file change | Mixes concerns, hard to test, bloats already large file (3000+ lines)
B: Functions in sw-cost.sh (chosen) | Clean separation, testable, follows existing patterns | Requires sourcing sw-cost.sh in pipeline (already done for some paths)
C: Separate sw-forecast.sh script | Maximum isolation | Adds a new file unnecessarily, more complex wiring

Risk Analysis

Risk | Impact | Mitigation
Forecast crashes pipeline start | High: blocks all pipelines | Wrap in `
Inaccurate estimates frustrate users | Medium: trust erosion | Show confidence level, improve with data
Budget gate too aggressive | Medium: blocks legitimate work | --force-start override, configurable gate
Historical data query slow on large events.jsonl | Low: one-time read at start | Use tail -1000 to limit scan, cache result
Breaking Bash 3.2 compat | Medium: fails on macOS | No associative arrays, test on macOS
