Pipeline Plan 178

ezigus edited this page Mar 16, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate

Socratic Design Refinement

Requirements Clarity

Minimum viable change: Add a cost_forecast() function to sw-cost.sh that estimates pipeline cost from template stages × model pricing × historical duration, display the forecast before pipeline_start() proceeds, and gate on budget. Emit variance events at completion.

Implicit requirements: Need sensible defaults when no historical data exists (cold-start). Confidence intervals require enough historical data points to be meaningful. The --force-start override must bypass the gate cleanly.

Acceptance criteria (from issue): forecast display, budget gate with override, variance event, dashboard display, confidence intervals.

Design Alternatives

Approach A: Inline in sw-pipeline.sh — Add forecast logic directly into pipeline_start(). Simple but mixes concerns and makes testing harder.

Approach B: New functions in sw-cost.sh + integration point in sw-pipeline.sh — Add cost_forecast() and cost_forecast_display() to sw-cost.sh (where all cost logic lives), call from pipeline_start() after template is loaded. Clean separation, testable, follows existing patterns.

Chosen: Approach B — Minimal blast radius, follows existing code organization (cost logic in sw-cost.sh, pipeline orchestration in sw-pipeline.sh). The dashboard integration adds a new API endpoint and view component.

Risk Assessment

Cold-start with no history: Mitigated by hardcoded default durations per stage (based on typical pipeline runs)
Inaccurate forecasts blocking legitimate work: Mitigated by --force-start override and configurable gate
Breaking existing budget checks: Low risk — new functions, existing cost_check_budget() unchanged
Dashboard changes: Isolated to new view components, no existing views modified

Simplicity Check

4 files modified (sw-cost.sh, sw-pipeline.sh, dashboard metrics.ts and api.ts)
1 file modified for tests (sw-cost-test.sh)
Reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), model pricing, and event emission infrastructure
No new dependencies

Architecture Decision Record

Component Diagram

┌─────────────────────┐     ┌──────────────────────┐
│ sw-pipeline.sh      │     │ sw-cost.sh           │
│                     │     │                      │
│ pipeline_start()    │────▶│ cost_forecast()      │
│   ├─ load config    │     │   ├─ count stages    │
│   ├─ FORECAST ◀─────│─────│   ├─ get history     │
│   ├─ budget gate    │     │   ├─ calc per-stage  │
│   └─ run_pipeline   │     │   └─ return estimate │
│                     │     │                      │
│ pipeline complete   │────▶│ cost_record_variance │
│                     │     │   └─ emit event      │
└─────────────────────┘     └──────────────────────┘
           │                           │
           ▼                           ▼
┌─────────────────────┐     ┌──────────────────────┐
│ events.jsonl        │     │ dashboard/metrics.ts │
│ cost.forecast       │     │ renderCostForecast() │
│ cost.forecast_var   │     │ forecast card + bar  │
└─────────────────────┘     └──────────────────────┘

Interface Contracts

// cost_forecast <template_config_path> [complexity]
// Returns JSON: { total_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points }
// confidence: "low" (<5 data points), "medium" (5-20), "high" (>20)
// cost_forecast_display <forecast_json>
// Renders formatted forecast to stdout (table of stages, total, confidence)
// cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
// Emits cost.forecast_variance event to events.jsonl
// --force-start flag on pipeline start
// Bypasses budget gate when forecast exceeds remaining budget
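
The confidence thresholds in the contract above can be sketched as a small Bash 3.2-compatible helper. The function name confidence_for_count is hypothetical and is not part of sw-cost.sh; only the thresholds (<5, 5-20, >20) come from the contract.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): map the number of
# historical data points to the confidence label from the contract.
confidence_for_count() {
  local n="${1:-0}"
  if [ "$n" -lt 5 ]; then
    echo "low"       # fewer than 5 data points
  elif [ "$n" -le 20 ]; then
    echo "medium"    # 5 to 20 data points
  else
    echo "high"      # more than 20 data points
  fi
}

confidence_for_count 12   # prints: medium
```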

Data Flow

pipeline_start() → loads template config → calls cost_forecast "$PIPELINE_CONFIG" "$INTELLIGENCE_COMPLEXITY"
cost_forecast() → reads template stages → looks up historical durations from events.jsonl → computes per-stage cost using cost_calculate() → returns JSON
cost_forecast_display() → renders table to CLI
Budget gate: compares forecast.total_usd against cost_remaining_budget() → blocks or warns
On pipeline completion: cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}" → emits cost.forecast_variance event

Error Boundaries

cost_forecast() failures are non-fatal — pipeline proceeds with a warning
Budget gate respects --force-start — never hard-blocks when override is present
Dashboard forecast display degrades gracefully when no forecast data exists

Files to Modify

Modified Files

scripts/sw-cost.sh — Add cost_forecast(), cost_forecast_display(), cost_record_variance(), CLI subcommand forecast
scripts/sw-pipeline.sh — Add --force-start flag parsing, call forecast before run_pipeline, record variance at completion
scripts/sw-cost-test.sh — Add tests for forecast functions
dashboard/src/views/metrics.ts — Add renderCostForecast() section
dashboard/src/core/api.ts — Add /api/costs/forecast endpoint handler

No New Files Created

All logic fits naturally into existing files following the project's established patterns.

Implementation Steps

Step 1: Add default stage duration constants to sw-cost.sh

Add hardcoded default durations (seconds) per stage for cold-start scenarios:

# Default stage durations (seconds) — used when no historical data
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'
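
One way to read a value out of this constant is a jq lookup with a fallback. This is a minimal sketch: the helper name _default_stage_duration and the 300-second fallback for unlisted stages are assumptions, not part of the plan.

```shell
#!/usr/bin/env bash
# Same constant as in Step 1, repeated so the sketch is self-contained.
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'

# Hypothetical helper: print the default duration (seconds) for a stage.
# ASSUMPTION: stages missing from the table fall back to 300 seconds.
_default_stage_duration() {
  local stage="$1" fallback="${2:-300}"
  echo "$_DEFAULT_STAGE_DURATIONS" |
    jq -r --arg s "$stage" --arg f "$fallback" '.[$s] // ($f | tonumber)'
}

# Usage: _default_stage_duration build
```

Keeping the table as a JSON string rather than an associative array stays within the Bash 3.2 constraint listed in the Definition of Done.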

Step 2: Add `cost_forecast()` function to sw-cost.sh

# cost_forecast <template_config_path> [complexity]
# Returns JSON with per-stage estimates, total, and confidence level
cost_forecast() {
 local template_config="$1"
 local complexity="${2:-5}"
 # 1. Read enabled stages + their model from template
 # 2. For each stage, look up historical avg duration from events.jsonl
 # (grep stage.completed events, extract stage timings by template)
 # 3. Fall back to _DEFAULT_STAGE_DURATIONS if no history
 # 4. Estimate tokens from duration (heuristic: ~50 input + ~20 output tokens/sec for active stages)
 # 5. Calculate cost per stage using cost_calculate()
 # 6. Determine confidence from data point count
 # 7. Apply complexity multiplier (complexity/5 — normalized around 1.0)
 # 8. Output JSON
}
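
The complexity multiplier from item 7 of the outline can be illustrated on its own. This is a sketch only: apply_complexity is a hypothetical name, and rounding to two decimals is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale a base cost by complexity/5, so complexity 5
# leaves the estimate unchanged and complexity 10 doubles it.
# awk is used because Bash 3.2 has no floating-point arithmetic.
apply_complexity() {
  local base_usd="$1" complexity="${2:-5}"
  awk -v b="$base_usd" -v c="$complexity" 'BEGIN { printf "%.2f\n", b * c / 5 }'
}

apply_complexity 6.36 5    # prints: 6.36
apply_complexity 6.36 10   # prints: 12.72
```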

Historical lookup: query events.jsonl for stage.completed events matching the template, compute average duration_s per stage. Count data points for confidence.
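
A possible shape for this lookup, using jq in slurp mode over events.jsonl. The helper name stage_history and the event field names (stage, template, duration_s on stage.completed events) are assumptions; the plan does not spell out the stage.completed schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average duration for one stage of one template.
# Prints "<avg_seconds> <data_points>", or nothing when there is no history.
# ASSUMPTION: stage.completed events carry "stage", "template" and
# "duration_s" fields; duration_s may be a number or a numeric string.
stage_history() {
  local events_file="$1" template="$2" stage="$3"
  [ -f "$events_file" ] || return 0   # cold start: no file, no output
  jq -r -s --arg t "$template" --arg s "$stage" '
    [ .[]
      | select(.type == "stage.completed" and .template == $t and .stage == $s)
      | .duration_s | tonumber ]
    | if length == 0 then empty
      else "\(add / length | floor) \(length)" end
  ' "$events_file"
}

# Usage: stage_history "$EVENTS_FILE" standard build
```

An empty result signals the caller to fall back to the defaults from Step 1, and the second field feeds the confidence level.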

Token estimation heuristic: Based on typical Claude Code usage patterns:

Active stages (build): ~50 input + ~20 output tokens/second
Light stages (intake, pr, merge): ~20 input + ~10 output tokens/second
Review stages (review, compound_quality): ~40 input + ~30 output tokens/second
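
The three rate tiers can be expressed as a case statement. This is a sketch: estimate_tokens is a hypothetical name, and applying the active-stage rate to stages the list does not mention (plan, design, test and so on) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<input_tokens> <output_tokens>" for a stage
# that runs for the given number of seconds.
estimate_tokens() {
  local stage="$1" duration_s="$2"
  local in_rate out_rate
  case "$stage" in
    intake|pr|merge)         in_rate=20; out_rate=10 ;;  # light stages
    review|compound_quality) in_rate=40; out_rate=30 ;;  # review stages
    *)                       in_rate=50; out_rate=20 ;;  # build; ASSUMED for unlisted stages
  esac
  echo "$(( duration_s * in_rate )) $(( duration_s * out_rate ))"
}

estimate_tokens build 1200   # prints: 60000 24000
```

The two token counts are what would then be handed to cost_calculate() together with the stage's model.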

Step 3: Add `cost_forecast_display()` to sw-cost.sh

Render a formatted table:

╔══════════════════════════════════════════════════════╗
║                Pipeline Cost Forecast                ║
╠══════════════════════════════════════════════════════╣
║  Stage              Model     Duration    Est. Cost  ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  intake             haiku     1m 0s           $0.02  ║
║  plan               sonnet    5m 0s           $0.45  ║
║  build              sonnet    20m 0s          $3.60  ║
║  test               haiku     3m 0s           $0.04  ║
║  review             opus      5m 0s           $2.25  ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  TOTAL                        34m 0s          $6.36  ║
║  Confidence: medium (12 historical runs)             ║
║  Budget remaining: $43.64 / $50.00                   ║
╚══════════════════════════════════════════════════════╝
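
The duration column and row layout of the mock-up could be produced with printf alone. This is a sketch: format_duration and forecast_row are hypothetical names, and the column widths (17, 8, 10, 9) are read off the separator row above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render seconds as "<m>m <s>s", as in the mock-up.
format_duration() {
  local secs="$1"
  printf '%dm %ds' $(( secs / 60 )) $(( secs % 60 ))
}

# Hypothetical helper: print one table row (without the box borders).
# Arguments: stage, model, duration in seconds, preformatted cost string.
forecast_row() {
  printf '  %-17s  %-8s  %-10s  %9s\n' "$1" "$2" "$(format_duration "$3")" "$4"
}

forecast_row build sonnet 1200 '$3.60'
```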

Step 4: Add `cost_record_variance()` to sw-cost.sh

# cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
cost_record_variance() {
  local forecast="$1" actual="$2" template="$3" issue="${4:-}"
  local variance pct_variance
  variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
  pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')
  emit_event "cost.forecast_variance" \
    "forecast_usd=$forecast" \
    "actual_usd=$actual" \
    "variance_usd=$variance" \
    "variance_pct=$pct_variance" \
    "template=$template" \
    "issue=$issue"
}

Step 5: Add `forecast` CLI subcommand to sw-cost.sh

Add forecast) cost_forecast_cli "$@" ;; to the case statement. cost_forecast_cli calls cost_forecast and passes the result to cost_forecast_display for CLI usage, e.g. shipwright cost forecast --pipeline standard

Step 6: Integrate forecast into sw-pipeline.sh — flag parsing

Add --force-start to the argument parser (alongside existing --pipeline, --goal, etc.):

FORCE_START=false
# In the getopts/while loop:
--force-start) FORCE_START=true ;;
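
For context, a trimmed-down argument loop showing where the new flag sits. This is only a sketch: parse_start_args is a hypothetical wrapper, the real parser in sw-pipeline.sh handles many more flags, and the "standard" default is taken from the CLI spec rather than from the existing code.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified parser; only two flags are handled here.
parse_start_args() {
  FORCE_START=false
  PIPELINE_NAME="standard"
  while [ $# -gt 0 ]; do
    case "$1" in
      --pipeline)
        PIPELINE_NAME="${2:-standard}"
        shift
        # Consume the value only if one was actually supplied.
        if [ $# -gt 0 ]; then shift; fi
        ;;
      --force-start)
        FORCE_START=true
        shift
        ;;
      *)
        shift   # ignore everything else in this sketch
        ;;
    esac
  done
}

parse_start_args --pipeline cost-aware --force-start
echo "$PIPELINE_NAME $FORCE_START"   # prints: cost-aware true
```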

Step 7: Integrate forecast into sw-pipeline.sh — pre-start gate

After load_pipeline_config and before run_pipeline, insert:

# Cost forecast and budget gate
if [[ -f "$SCRIPT_DIR/sw-cost.sh" ]]; then
  source "$SCRIPT_DIR/sw-cost.sh"
  FORECAST_JSON=$(cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" 2>/dev/null || echo "")
  if [[ -n "$FORECAST_JSON" ]]; then
    FORECAST_USD=$(echo "$FORECAST_JSON" | jq -r '.total_usd // 0')
    cost_forecast_display "$FORECAST_JSON"
    # Budget gate check
    local remaining
    remaining=$(cost_remaining_budget 2>/dev/null || echo "unlimited")
    if [[ "$remaining" != "unlimited" ]]; then
      # awk exits 0 only when the forecast exceeds the remaining budget
      if awk -v f="$FORECAST_USD" -v r="$remaining" 'BEGIN{exit !(f > r)}'; then
        if [[ "$FORCE_START" == "true" ]]; then
          warn "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}; proceeding (--force-start)"
        else
          error "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}"
          echo -e " Override: ${DIM}shipwright pipeline start ... --force-start${RESET}"
          exit 1
        fi
      fi
    fi
    emit_event "cost.forecast" \
      "forecast_usd=$FORECAST_USD" \
      "template=$PIPELINE_NAME" \
      "confidence=$(echo "$FORECAST_JSON" | jq -r '.confidence')" \
      "issue=${ISSUE_NUMBER:-0}"
  else
    # Fail open: a forecast failure must not block the pipeline
    warn "Cost forecast unavailable; continuing without budget gate"
  fi
fi
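
To make the gate testable without running a pipeline, the decision could be isolated in a function that returns a status instead of calling exit. This is a sketch under that assumption; budget_gate is a hypothetical name and the messages are simplified stand-ins for warn and error.

```shell
#!/usr/bin/env bash
# Hypothetical helper isolating the gate decision.
# Returns 0 when the pipeline may start, 1 when it should be blocked.
budget_gate() {
  local forecast_usd="$1" remaining="$2" force_start="${3:-false}"
  # No budget configured: nothing to gate on.
  if [ "$remaining" = "unlimited" ]; then
    return 0
  fi
  # awk exits 0 only when forecast > remaining (floating-point compare).
  if awk -v f="$forecast_usd" -v r="$remaining" 'BEGIN { exit !(f > r) }'; then
    if [ "$force_start" = "true" ]; then
      echo "warning: forecast \$${forecast_usd} exceeds remaining budget \$${remaining}; proceeding (--force-start)" >&2
      return 0
    fi
    echo "error: forecast \$${forecast_usd} exceeds remaining budget \$${remaining}" >&2
    return 1
  fi
  return 0
}

# Usage: budget_gate "$FORECAST_USD" "$remaining" "$FORCE_START" || exit 1
```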

Step 8: Record variance at pipeline completion

After the pipeline.completed event emission (around line 2700), add:

if [[ -n "${FORECAST_USD:-}" && "${FORECAST_USD}" != "0" ]]; then
 cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}"
fi

Step 9: Add dashboard API endpoint

In dashboard/src/core/api.ts, add handler for /api/costs/forecast:

Read recent cost.forecast and cost.forecast_variance events from events.jsonl
Return forecast data for display

Step 10: Add dashboard forecast display

In dashboard/src/views/metrics.ts, add renderCostForecast():

Show last forecast with confidence indicator
Show forecast vs actual variance trend (bar chart)
Color-code: green (within 20%), yellow (20-50% off), red (>50% off)
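
The color thresholds are shown here in shell for consistency with the other sketches in this plan; the dashboard itself would implement the same rule in TypeScript. The name variance_color is hypothetical, and treating exactly 20% as green and exactly 50% as yellow is an assumption, since the list above leaves the boundaries open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a variance percentage by magnitude.
# ASSUMPTION: boundaries are inclusive on the lower band (20 -> green, 50 -> yellow).
variance_color() {
  awk -v p="$1" 'BEGIN {
    if (p < 0) p = -p            # over- and under-estimates are treated alike
    if (p <= 20)      print "green"
    else if (p <= 50) print "yellow"
    else              print "red"
  }'
}

variance_color 11.9   # prints: green
variance_color -30    # prints: yellow
```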

Step 11: Add tests to sw-cost-test.sh

Test cases:

cost_forecast with a template file returns valid JSON with expected fields
cost_forecast with no historical data uses defaults and returns confidence "low"
cost_forecast_display produces formatted output
cost_record_variance emits correct event
Budget gate blocks when forecast > remaining
Budget gate allows with --force-start
Variance calculation correctness (positive/negative)

Task Checklist

Task 1: Add default stage duration constants and token-rate heuristics to sw-cost.sh
Task 2: Implement cost_forecast() function with historical lookup and cold-start defaults
Task 3: Implement cost_forecast_display() formatted CLI output
Task 4: Implement cost_record_variance() with event emission
Task 5: Add forecast CLI subcommand to sw-cost.sh case statement
Task 6: Add --force-start flag parsing to sw-pipeline.sh
Task 7: Integrate forecast display and budget gate into pipeline_start()
Task 8: Record forecast variance at pipeline completion in sw-pipeline.sh
Task 9: Add dashboard API endpoint for forecast data
Task 10: Add dashboard forecast display component
Task 11: Write tests for all forecast functions in sw-cost-test.sh
Task 12: Run full test suite and fix any failures

Testing Approach

Unit Tests (sw-cost-test.sh)

Following existing test patterns in the repo (mock binaries, PASS/FAIL counters):

cost_forecast basic: Create a minimal template JSON, call cost_forecast, verify JSON output has total_usd, stages, confidence, data_points
cost_forecast cold-start: No events.jsonl → uses defaults, confidence = "low"
cost_forecast with history: Seed events.jsonl with stage.completed events, verify durations are averaged
cost_forecast_display: Verify output contains stage names, costs, total line
cost_record_variance: Verify event written to events.jsonl with correct fields
Variance math: forecast=$5.00 actual=$6.50 → variance=$1.50, pct=30.0%
Budget gate integration: Mock budget at $10, forecast at $15, verify exit code 1; with FORCE_START=true, verify exit code 0
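
The variance cases above might look like this in the PASS/FAIL counter style. This is a standalone sketch: emit_event is stubbed to capture its arguments, assert_contains is a hypothetical helper, and cost_record_variance is copied from Step 4 so the snippet runs on its own.

```shell
#!/usr/bin/env bash
PASS=0; FAIL=0
EVENTS=""

# Stub: capture event fields instead of appending to events.jsonl.
emit_event() { EVENTS="$*"; }

# Copy of cost_record_variance from Step 4.
cost_record_variance() {
  local forecast="$1" actual="$2" template="$3" issue="${4:-}"
  local variance pct_variance
  variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
  pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')
  emit_event "cost.forecast_variance" "forecast_usd=$forecast" "actual_usd=$actual" \
    "variance_usd=$variance" "variance_pct=$pct_variance" "template=$template" "issue=$issue"
}

# Hypothetical assertion helper: check the captured event for a substring.
assert_contains() {
  case "$EVENTS" in
    *"$1"*) PASS=$((PASS + 1)) ;;
    *)      FAIL=$((FAIL + 1)); echo "FAIL: missing $1" ;;
  esac
}

cost_record_variance 5.00 6.50 standard 178   # actual above forecast
assert_contains "variance_usd=1.5000"
assert_contains "variance_pct=30.0"
cost_record_variance 6.36 5.00 standard 178   # actual below forecast
assert_contains "variance_usd=-1.3600"
assert_contains "variance_pct=-21.4"
echo "PASS=$PASS FAIL=$FAIL"   # prints: PASS=4 FAIL=0
```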

Integration Tests

npm test — run existing suite to verify no regressions
Manual: shipwright cost forecast --pipeline cost-aware to verify CLI output

Definition of Done

cost_forecast() returns JSON with total_usd, stages[], confidence, data_points for any template
Forecast is displayed before pipeline starts in CLI with formatted table
Pipeline start is blocked when forecast exceeds remaining budget (exit 1)
--force-start flag overrides the budget gate with a warning
cost.forecast event emitted to events.jsonl before pipeline runs
cost.forecast_variance event emitted after pipeline completes with forecast vs actual
Dashboard shows cost forecast when pipeline is queued/starting
Confidence level shown as low/medium/high based on historical data count
All new functions have tests in sw-cost-test.sh
npm test passes with no regressions
Bash 3.2 compatible (no associative arrays, no readarray)

Endpoint Specification

CLI Endpoint: `shipwright cost forecast`

Input: --pipeline <name> (template name, default: standard), --complexity <1-10> (default: 5)
Output: Formatted forecast table to stdout; --json flag for raw JSON
Exit codes: 0 (success), 1 (error loading template)

Event Schemas

// cost.forecast (emitted before pipeline start)
{"ts":"...","type":"cost.forecast","forecast_usd":"6.36","template":"standard","confidence":"medium","issue":"178"}
// cost.forecast_variance (emitted after pipeline completion)
{"ts":"...","type":"cost.forecast_variance","forecast_usd":"6.36","actual_usd":"7.12","variance_usd":"0.76","variance_pct":"11.9","template":"standard","issue":"178"}

Dashboard API: GET /api/costs/forecast

Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast, actual, variance_pct, template, ts}] }
Error: { error: { code: "NO_DATA", message: "No forecast data available" } }

Rate Limiting: Not applicable (local dashboard, single-user). Versioning: Not applicable (internal API, no external consumers).

User Stories

Primary: As a pipeline operator, I want to see estimated pipeline cost before it starts, so that I can make informed go/no-go decisions and avoid budget surprises.

Secondary: As a team lead reviewing daemon activity, I want to see forecast accuracy over time in the dashboard, so that I can trust the estimates and tune budgets appropriately.

Acceptance Criteria (Given/When/Then)

Given a budget is set and a pipeline template is selected, when I run shipwright pipeline start, then I see a cost forecast table before execution begins
Given the forecast exceeds remaining budget, when I run pipeline start without --force-start, then the pipeline is blocked with an error message showing the override command
Given the forecast exceeds remaining budget, when I run with --force-start, then the pipeline proceeds with a warning
Given a pipeline completes, when I check events.jsonl, then I see a cost.forecast_variance event with forecast, actual, and percentage difference

Edge Cases from User Perspective

No historical data (cold start): Uses hardcoded defaults, shows confidence as "low" — user knows estimate is rough
Budget not configured: Forecast displays but gate is skipped — shows "Budget: unlimited"
Forecast is $0 (all stages disabled): Shows "No billable stages" message, skips gate

Component Hierarchy (Dashboard)

MetricsView
├── renderMetrics() (existing)
│   ├── Success Rate Donut
│   ├── Duration / Throughput
│   └── Cost Breakdown
└── renderCostForecast() (new)
    ├── ForecastCard (last forecast summary)
    │   ├── Total estimate
    │   ├── Confidence badge (low/medium/high)
    │   └── Budget status bar
    └── VarianceChart (forecast vs actual history)
        └── Bar chart with color-coded variance

State Management: Forecast data fetched via api.fetchForecastData() → stored in store → rendered by renderCostForecast(). Same pattern as existing cost views.

Accessibility: Semantic HTML (table for forecast data), color + text for confidence indicators, keyboard-accessible chart tooltips.

Responsive Breakpoints: Single-column layout at 320px-768px, two-column at 1024px+ (forecast card beside variance chart). Follows existing metrics grid pattern.

Monitoring Checklist

P0 — Immediate

Forecast function doesn't crash or block pipeline start (non-fatal wrapper)
Budget gate correctly blocks/allows based on comparison
--force-start override works

P1 — Short-term

Forecast accuracy: track variance_pct in events, alert if consistently >50%
No regressions in existing cost tracking

P2 — Medium-term

Forecast model accuracy improves as more historical data accumulates
Dashboard variance chart shows trend toward tighter estimates

Anomaly Detection Triggers

Forecast variance consistently >100% for 5+ pipelines → heuristic needs recalibration
cost.forecast events stop appearing → integration broken
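
The first trigger could be checked with a short awk filter over the variance_pct values. This is a sketch: needs_recalibration is a hypothetical name, and reading "consistently >100% for 5+ pipelines" as five consecutive runs whose variance magnitude exceeds 100% is an interpretation, not something the plan specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one variance_pct per line (oldest first) on stdin.
# Exits 0 when at least <streak> consecutive values exceed <threshold> in
# magnitude, 1 otherwise. ASSUMPTION: "consistently" means consecutive runs.
needs_recalibration() {
  local threshold="${1:-100}" streak="${2:-5}"
  awk -v t="$threshold" -v s="$streak" '
    {
      v = ($1 < 0) ? -$1 : $1          # compare by magnitude
      run = (v > t) ? run + 1 : 0      # reset the streak on any in-range value
      if (run >= s) found = 1
    }
    END { exit !found }
  '
}

printf '%s\n' 120 150 -130 110 105 | needs_recalibration && echo "recalibrate"
```

The input list would come from the variance_pct field of cost.forecast_variance events.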

Log Analysis

Check for cost_forecast errors in pipeline output
Monitor for jq parse failures on events.jsonl queries

Auto-Rollback Decision Criteria

Not applicable — this is a pre-start gate feature, not a deployment. If the forecast function fails, the pipeline proceeds normally (fail-open design).

Alternatives Considered

Approach	Pros	Cons
A: Inline in sw-pipeline.sh	Single file change	Mixes concerns, hard to test, bloats already large file (3000+ lines)
B: Functions in sw-cost.sh (chosen)	Clean separation, testable, follows existing patterns	Requires sourcing sw-cost.sh in pipeline (already done for some paths)
C: Separate sw-forecast.sh script	Maximum isolation	Adds a new file unnecessarily, more complex wiring

Risk Analysis

Risk	Impact	Mitigation
Forecast crashes pipeline start	High — blocks all pipelines	Wrap in `
Inaccurate estimates frustrate users	Medium — trust erosion	Show confidence level, improve with data
Budget gate too aggressive	Medium — blocks legitimate work	`--force-start` override, configurable gate
Historical data query slow on large events.jsonl	Low — one-time read at start	Use `tail -1000` to limit scan, cache result
Breaking Bash 3.2 compat	Medium — fails on macOS	No associative arrays, test on macOS
