Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate with Early Warning

Issue: #178

Pipeline: full | Branch: feat/pipeline-cost-forecast-and-budget-gate-w-178

Socratic Design Refinement

Requirements Clarity

Minimum viable change: Estimate pipeline cost pre-start, display it, block if over budget, emit variance post-completion. The acceptance criteria are explicit and well-scoped.

Implicit requirements:

Cold-start behavior when no historical data exists (use conservative defaults)
The forecast CLI subcommand (shipwright cost forecast) for standalone use
Dashboard integration for queued pipeline visibility

Acceptance criteria (from issue):

Estimate pipeline cost using: template stage count x avg duration x model tier cost
Display forecast before pipeline start in CLI and dashboard
Block start if forecast exceeds remaining budget (configurable, with --force-start override)
Emit forecast vs actual cost variance to events.jsonl after pipeline completes
Show cost forecast in dashboard when pipeline is queued
Include confidence interval (low/medium/high) based on historical data quality
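
The estimation formula in the first criterion can be illustrated with a minimal sketch. It assumes a simplified per-stage input of average minutes and a flat cost per minute; the function name, the input format, and all numbers are hypothetical and do not reflect the token-rate logic in sw-cost.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: forecast = sum over stages of (avg minutes x USD per minute).
# Input lines: "<stage> <avg_minutes> <usd_per_minute>"; values are invented.
estimate_pipeline_cost() {
  awk '{ total += $2 * $3 } END { printf "%.4f\n", total }'
}

# Example: three stages at different (assumed) model tier rates.
printf '%s\n' \
  "plan 4 0.05" \
  "build 12 0.02" \
  "review 6 0.05" | estimate_pipeline_cost
# prints 0.7400
```

Summing in awk rather than bash arithmetic keeps the fractional cents, which is the same reason the plan leans on awk for cost math.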

Design Alternatives

Approach A: Shell-native in sw-cost.sh (chosen)

Add cost_forecast(), cost_forecast_display(), cost_record_variance() to existing sw-cost.sh
Integrate budget gate directly into pipeline_start() in sw-pipeline.sh
Tradeoffs: +consistent with codebase patterns, +reuses existing pricing/budget infra, +no new dependencies, -bash arithmetic limitations (mitigated by awk)

Approach B: Node.js forecast module

Implement in TypeScript alongside dashboard
Tradeoffs: +better floating point, +easier testing, -breaks shell-first convention, -requires Node.js at pipeline start, -two systems to maintain

Decision: Approach A -- minimal blast radius, reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), pricing system, and event emission. Dashboard gets a read-only API endpoint.

Risk Assessment

Risk: Inaccurate forecasts with no history -> Mitigated by conservative default stage durations and "low" confidence label
Risk: Budget gate blocks legitimate work -> Mitigated by --force-start and --ignore-budget overrides
Risk: awk floating-point drift -> Acceptable for cost estimates (4 decimal places)
Risk: events.jsonl query slowness -> Mitigated by reading only last 1000 lines
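
As an illustration of the last mitigation, the sketch below scans only the tail of the event log. The function name and the `"type"` field layout are assumptions made for this example; the actual event schema in events.jsonl may differ.

```bash
#!/usr/bin/env bash
# Sketch: look for forecast events in the last N lines of the log only.
# The '"type":"cost.forecast"' pattern is an assumed event layout.
recent_forecast_events() {
  local events_file="$1" max_lines="${2:-1000}"
  [ -f "$events_file" ] || return 0   # cold start: no log yet, so no events
  # The closing quote in the pattern excludes cost.forecast_variance events.
  tail -n "$max_lines" "$events_file" | grep -F '"type":"cost.forecast"' || true
}
```

Bounding the read with `tail` keeps lookup time roughly constant as the log grows, at the price of ignoring older history.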

Simplicity Check

All changes stay within three existing areas (sw-cost.sh, sw-pipeline.sh, and the dashboard) plus tests. No new files are needed for core logic, which reuses the existing pricing, budget, and event infrastructure entirely.

Current Implementation Status

The feature was implemented in a prior pipeline iteration (commits abd44a8 through 055448a). All 12 tasks were marked complete. The implementation includes:

What Exists (verified by code review)

| Component | File | Status |
| --- | --- | --- |
| `cost_forecast()` | sw-cost.sh:941-1058 | Complete -- historical lookup, cold-start defaults, complexity multiplier, confidence levels |
| `cost_forecast_display()` | sw-cost.sh:1062-1121 | Complete -- formatted table with stage breakdown, confidence indicator, budget status |
| `cost_record_variance()` | sw-cost.sh:1123-1138 | Complete -- emits `cost.forecast_variance` event with USD and % variance |
| `_forecast_token_rates()` | sw-cost.sh:925-937 | Complete -- per-stage-category token rate heuristics |
| Default stage durations | sw-cost.sh:921 | Complete -- JSON constant for cold-start fallback |
| `forecast` CLI subcommand | sw-cost.sh (cost_forecast_cli) | Complete -- `shipwright cost forecast --pipeline <name>` |
| `--force-start` flag | sw-pipeline.sh:300,463 | Complete -- parsed and stored as `FORCE_START` variable |
| Budget gate in pipeline_start | sw-pipeline.sh:2597-2627 | Complete -- blocks if forecast > remaining, supports override |
| Forecast event emission | sw-pipeline.sh:2620-2624 | Complete -- `cost.forecast` event with USD, template, confidence, issue |
| Variance recording at completion | sw-pipeline.sh:2957-2962 | Complete -- calls `cost_record_variance` with forecast vs actual |
| Dashboard API `/api/costs/forecast` | dashboard/server.ts:3721-3769 | Complete -- returns recent_forecasts + variance_history |
| Dashboard `renderCostForecast()` | dashboard/src/views/metrics.ts:479-542 | Complete -- forecast card + variance accuracy table |
| Dashboard API client | dashboard/src/core/api.ts | Complete -- `fetchCostForecast()` method |
| Tests | sw-cost-test.sh:230-400 | Complete -- forecast JSON, display, variance, --force-start, pipeline integration |

Gaps Identified

No early warning at 50-100% budget consumption -- The cost intelligence design guidance says "Warn (don't block) if forecast is 50-100% of remaining budget." Current code only blocks when forecast > remaining. No warning for high-consumption scenarios.
No confidence range display -- Forecast shows a point estimate ($X.XX), but the issue requests confidence intervals (e.g., "$50-$70 (medium confidence)"). Currently the confidence level appears only as a label.
No estimated_cost on queued dashboard items -- The QueueItem type in dashboard has an estimated_cost? field, but the daemon dispatch doesn't populate it when queuing issues. Dashboard shows forecast in the metrics view but not inline on queued items.
Forecast event missing data_points field -- The cost.forecast event should include data_points for observability completeness.

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| `scripts/sw-cost.sh` | Modify | Add confidence range (low/high bounds) to forecast JSON output |
| `scripts/sw-cost.sh` | Modify | Update `cost_forecast_display()` to show cost range |
| `scripts/sw-pipeline.sh` | Modify | Add early warning when forecast consumes 50-100% of remaining budget |
| `scripts/sw-pipeline.sh` | Modify | Add `data_points` field to `cost.forecast` event |
| `scripts/lib/daemon-dispatch.sh` | Modify | Attach forecast_usd to queued job metadata |
| `dashboard/server.ts` | Modify | Include `estimated_cost` in queue item response |
| `scripts/sw-cost-test.sh` | Modify | Add tests for confidence range and early warning |

Implementation Steps

Step 1: Add confidence range bounds to `cost_forecast()`

In sw-cost.sh, enhance the forecast JSON output to include low_usd and high_usd bounds:

High confidence: +/-15% range (tight)
Medium confidence: +/-30% range
Low confidence: +/-50% range (wide, conservative)

Modify the final jq -n call in cost_forecast() (line 1052) to compute and add low_usd and high_usd fields based on confidence level and total_usd.
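
A minimal sketch of the bound calculation, using the percentages listed above. The helper name `confidence_bounds` is hypothetical, and the real change would live inside the existing jq call rather than a separate function.

```bash
#!/usr/bin/env bash
# Sketch: derive low/high bounds from a point estimate and a confidence label.
# confidence_bounds is a hypothetical helper, not an existing sw-cost.sh function.
confidence_bounds() {
  local total_usd="$1" confidence="$2" pct
  case "$confidence" in
    high)   pct=15 ;;
    medium) pct=30 ;;
    *)      pct=50 ;;   # low or unknown label: widest, most conservative range
  esac
  # Prints "<low_usd> <high_usd>" with four decimal places.
  awk -v t="$total_usd" -v p="$pct" \
    'BEGIN { printf "%.4f %.4f\n", t * (1 - p / 100), t * (1 + p / 100) }'
}

confidence_bounds 60 medium
# prints 42.0000 78.0000
```

Treating any unrecognized label as low confidence is a design choice in this sketch: it fails toward the wider range instead of erroring out before a pipeline starts.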

Step 2: Update `cost_forecast_display()` to show range

Change the TOTAL line from $X.XX to $LOW - $HIGH format when confidence is not "high". For high confidence, keep the point estimate. Update the Confidence line, e.g.: medium (5 historical runs) -- est. $45-$75.
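
The display rule can be sketched as a small formatter. `format_total` is a hypothetical name, and the exact layout of the existing table in `cost_forecast_display()` is not reproduced here.

```bash
#!/usr/bin/env bash
# Sketch: choose between point estimate and range for the TOTAL line.
# format_total is a hypothetical helper used only for illustration.
format_total() {
  local total="$1" low="$2" high="$3" confidence="$4"
  if [ "$confidence" = "high" ]; then
    printf '$%.2f\n' "$total"                 # tight range: show the point estimate
  else
    printf '$%.2f - $%.2f\n' "$low" "$high"   # medium/low: show the bounds
  fi
}

format_total 60 42 78 medium
# prints $42.00 - $78.00
```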

Step 3: Add early warning in pipeline budget gate

In sw-pipeline.sh lines 2604-2627, after the "exceeds budget" check, add a warning check:

If forecast consumes >50% of remaining budget but doesn't exceed it, emit warn with message like "Forecast $X will consume Y% of remaining budget $Z"
Emit cost.budget_high_usage event for tracking
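
The graduated gate described here might look like the following sketch. `budget_gate_status` is a hypothetical helper; the real check would sit inline in `pipeline_start()` and also handle the `--force-start` override, which is omitted.

```bash
#!/usr/bin/env bash
# Sketch: classify a forecast against the remaining budget.
# Prints "block" if forecast > remaining, "warn" if forecast > 50% of remaining,
# and "ok" otherwise. budget_gate_status is a hypothetical name.
budget_gate_status() {
  local forecast="$1" remaining="$2"
  awk -v f="$forecast" -v r="$remaining" 'BEGIN {
    if (f > r)            print "block"
    else if (f > r * 0.5) print "warn"
    else                  print "ok"
  }'
}

budget_gate_status 7.5 10
# prints warn
```

Checking the blocking condition first means a forecast exactly equal to the remaining budget warns rather than blocks, which matches the existing "forecast > remaining" rule.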

Step 4: Add `data_points` to forecast event

In sw-pipeline.sh line 2620-2624, add data_points= to the emit_event "cost.forecast" call.

Step 5: Populate `estimated_cost` for queued dashboard items

In daemon-dispatch.sh, after computing the forecast for pre-spawn budget check, attach forecast_usd to the job metadata so the dashboard /api/status endpoint can include it in queue items.

Step 6: Include `estimated_cost` in dashboard queue response

In dashboard/server.ts, when building queue items for /api/status, read the forecast_usd from job metadata and include it as estimated_cost in the response.

Step 7: Update tests

Add to sw-cost-test.sh:

Test that forecast JSON includes low_usd and high_usd fields
Test that confidence range width varies by confidence level
Verify early warning behavior via grep in sw-pipeline.sh

Step 8: Run full test suite

Execute npm test and fix any failures.

Task Checklist

Task 1: Add low_usd and high_usd confidence range fields to cost_forecast() JSON output in sw-cost.sh
Task 2: Update cost_forecast_display() to show a cost range instead of a point estimate when confidence is not high
Task 3: Add early warning in sw-pipeline.sh when forecast consumes 50-100% of remaining budget
Task 4: Add data_points field to cost.forecast event emission in sw-pipeline.sh
Task 5: Attach forecast_usd to queued job metadata in daemon-dispatch.sh for dashboard
Task 6: Include estimated_cost in dashboard queue item response in server.ts
Task 7: Add tests for confidence range fields and early warning in sw-cost-test.sh
Task 8: Run full test suite and verify all tests pass

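Dependencies: Task 1 blocks Tasks 2 and 7. Task 3 also blocks Task 7, whose early-warning test greps sw-pipeline.sh. Task 5 blocks Task 6. Task 8 runs last, after all other tasks. Task 4 has no prerequisites.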

Testing Approach

Unit tests (sw-cost-test.sh):
- Verify cost_forecast() JSON output includes low_usd, high_usd fields
- Verify range width varies: low confidence has wider range than high
- Verify cost_forecast_display() output includes range format
- Grep-based verification that early warning code exists in sw-pipeline.sh
Integration verification:
- shipwright cost forecast --pipeline standard --json returns complete JSON with new fields
- shipwright cost forecast --pipeline standard displays range in formatted output
Regression:
- npm test -- full test suite must pass
- Existing cost tests must continue to pass unchanged

Definition of Done

cost_forecast() returns JSON with total_usd, low_usd, high_usd, stages[], confidence, data_points, complexity_multiplier
Confidence range displayed in CLI forecast table (e.g., "$45 - $75 (medium confidence)")
Pipeline warns (but doesn't block) when forecast is 50-100% of remaining budget
cost.forecast event includes data_points field
Dashboard queue items show estimated_cost when available
All existing tests continue to pass
New tests cover confidence range and early warning

Alternatives Considered

| Approach | Pros | Cons | Decision |
| --- | --- | --- | --- |
| A: Enhance existing implementation | Minimal changes, builds on working code, low risk | Incremental, not a rewrite | Chosen -- feature is 90% done, only gaps remain |
| B: Statistical confidence intervals | More rigorous (std dev based), adapts automatically | Requires sufficient historical data, complex bash math, overkill for current data volume | Deferred -- can iterate later when more data exists |
| C: Do nothing (accept current state) | Zero risk | Misses early warning and range display from acceptance criteria | Rejected -- gaps are small but meaningful for user experience |

Risk Analysis

| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| Confidence range math error in awk | Low -- cosmetic only, doesn't affect blocking | Low | Unit test verifies bounds |
| Early warning spam when budget is tight | Medium -- annoying repeated warnings | Low | Only warn once per pipeline start |
| daemon-dispatch.sh changes break spawning | High -- daemon stops working | Low | Minimal change (add metadata field), test daemon start |
| Dashboard type mismatch | Low -- TypeScript catches at build | Very low | Existing QueueItem type already has `estimated_cost?` optional field |

User Stories

Primary: As a pipeline operator, I want to see the estimated cost range before a pipeline starts, so that I can make informed go/no-go decisions based on my budget.

Secondary: As a team lead monitoring multiple daemon pipelines, I want early warning when a pipeline will consume a large portion of my remaining budget, so that I can intervene before the budget is exhausted.

Acceptance Criteria (Given/When/Then)

Given a configured budget and a pipeline about to start, When the forecast exceeds remaining budget, Then the pipeline is blocked with an error and a --force-start override hint
Given a configured budget, When the forecast consumes >50% of remaining budget, Then a warning is displayed (but pipeline proceeds)
Given historical pipeline data, When a forecast is generated, Then confidence range bounds reflect data quality (tight for high confidence, wide for low)
Given a completed pipeline with a pre-start forecast, When the pipeline finishes, Then a cost.forecast_variance event is emitted with forecast, actual, and percentage variance
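
For the last criterion, the variance figures could be computed along these lines. This is a sketch only: the sign convention (actual minus forecast) and the helper name `cost_variance` are assumptions, not taken from `cost_record_variance()`.

```bash
#!/usr/bin/env bash
# Sketch: compute USD and percentage variance between forecast and actual cost.
# Assumed convention: positive values mean the pipeline cost more than forecast.
cost_variance() {
  local forecast="$1" actual="$2"
  awk -v f="$forecast" -v a="$actual" 'BEGIN {
    diff = a - f
    pct = (f > 0) ? diff / f * 100 : 0   # avoid division by zero on a zero forecast
    printf "%.4f %.1f\n", diff, pct
  }'
}

cost_variance 60 66
# prints 6.0000 10.0
```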

Edge Cases from User Perspective

First run (no history): Shows "low confidence" with wide range ($X - $Y), uses conservative defaults -- user sees worst-case estimate
Budget nearly exhausted: Warns at 50%+ consumption, blocks at 100%+ -- user gets graduated feedback
Force override: --force-start bypasses budget gate with explicit warning -- user maintains control

Endpoint Specification

GET /api/costs/forecast

Request: ?period=30 (days, optional, default 30)
Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast_usd, actual_usd, variance_pct, template, ts}] }
Status codes: 200 OK, 500 Internal Server Error
No auth required (local dashboard)

Error Codes

200: Success with data
200 with empty arrays: No forecast data available yet
500: Internal error reading events.jsonl

Rate Limiting

Not applicable -- local CLI tool and dashboard, not a public API.

Versioning

Not applicable -- internal API, no external consumers.

Component Hierarchy (Dashboard)

MetricsView
 |-- renderCostBreakdown() -- existing
 |-- renderCostTrend() -- existing
 |-- renderCostForecast() -- enhanced (shows range)
 | |-- Forecast Card -- latest forecast with confidence range
 | +-- Variance Table -- historical accuracy
 +-- ... other metrics

State Management Approach

Dashboard fetches forecast data via api.fetchCostForecast() -> REST endpoint
Data flows: server.ts reads events.jsonl -> filters forecast/variance events -> returns JSON -> metrics.ts renders
No local state needed -- pure render from server data

Accessibility Checklist

Semantic HTML tables for variance data (existing)
Color-coded confidence uses text labels alongside color (existing: "high"/"medium"/"low" text)
Keyboard navigable (standard HTML elements)

Responsive Breakpoints

Not applicable -- dashboard is a developer tool used on desktop monitors. Existing responsive behavior from dashboard framework applies.
