Pipeline Plan 178
Minimum viable change: Estimate pipeline cost pre-start, display it, block if over budget, emit variance post-completion. The acceptance criteria are explicit and well-scoped.
Implicit requirements:
- Cold-start behavior when no historical data exists (use conservative defaults)
- The forecast CLI subcommand (shipwright cost forecast) for standalone use
- Dashboard integration for queued pipeline visibility
Acceptance criteria (from issue):
- Estimate pipeline cost using: template stage count x avg duration x model tier cost
- Display forecast before pipeline start in CLI and dashboard
- Block start if forecast exceeds remaining budget (configurable, with --force-start override)
- Emit forecast vs actual cost variance to events.jsonl after pipeline completes
- Show cost forecast in dashboard when pipeline is queued
- Include confidence interval (low/medium/high) based on historical data quality
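As a worked illustration of the first criterion, the estimate is a plain product of three factors. The sketch below is not the code in sw-cost.sh: the function name estimate_pipeline_cost and the units (minutes per stage, USD per minute for the model tier) are assumptions made only so the arithmetic is concrete.

```shell
# Hypothetical helper (not part of sw-cost.sh): stage count x avg duration x tier cost.
# Assumed units: avg duration in minutes, tier cost in USD per minute.
estimate_pipeline_cost() {
  local stage_count="$1" avg_minutes="$2" usd_per_minute="$3"
  awk -v n="$stage_count" -v d="$avg_minutes" -v c="$usd_per_minute" 'BEGIN {
    # awk handles the floating-point math that bash arithmetic cannot.
    printf "%.4f\n", n * d * c
  }'
}

estimate_pipeline_cost 8 5 0.25   # prints 10.0000
```

Eight stages averaging five minutes each at $0.25 per minute gives a $10 forecast; the four decimal places match the precision the plan accepts for cost estimates.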
Approach A: Shell-native in sw-cost.sh (chosen)
- Add cost_forecast(), cost_forecast_display(), cost_record_variance() to existing sw-cost.sh
- Integrate budget gate directly into pipeline_start() in sw-pipeline.sh
- Tradeoffs: +consistent with codebase patterns, +reuses existing pricing/budget infra, +no new dependencies, -bash arithmetic limitations (mitigated by awk)
Approach B: Node.js forecast module
- Implement in TypeScript alongside dashboard
- Tradeoffs: +better floating point, +easier testing, -breaks shell-first convention, -requires Node.js at pipeline start, -two systems to maintain
Decision: Approach A -- minimal blast radius, reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), pricing system, and event emission. Dashboard gets a read-only API endpoint.
- Risk: Inaccurate forecasts with no history -> Mitigated by conservative default stage durations and "low" confidence label
- Risk: Budget gate blocks legitimate work -> Mitigated by --force-start and --ignore-budget overrides
- Risk: awk floating-point drift -> Acceptable for cost estimates (4 decimal places)
- Risk: events.jsonl query slowness -> Mitigated by reading only last 1000 lines
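The last mitigation, bounding the read to the most recent 1000 lines, could look roughly like the sketch below. The helper name recent_events is hypothetical, and the match assumes each event is a single-line JSON object with a compact "type":"..." field; the actual event schema is not shown in this plan.

```shell
# Hypothetical helper: print recent events of one type from an events file.
# Assumes one compact JSON object per line containing "type":"<event_type>".
recent_events() {
  local events_file="$1" event_type="$2" max_lines="${3:-1000}"
  # A missing file means no history yet (cold start), not an error.
  [ -f "$events_file" ] || return 0
  # tail bounds the work regardless of how large the file has grown.
  tail -n "$max_lines" "$events_file" | grep -F "\"type\":\"$event_type\"" || true
}
```

Because the pattern includes the closing quote, a query for cost.forecast does not also match cost.forecast_variance events.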
All changes stay within 3 existing files (sw-cost.sh, sw-pipeline.sh, dashboard) plus tests. No new files needed for core logic. Reuses existing pricing, budget, and event infrastructure entirely.
The feature was implemented in a prior pipeline iteration (commits abd44a8 through 055448a). All 12 tasks were marked complete. The implementation includes:
| Component | File | Status |
|---|---|---|
| cost_forecast() | sw-cost.sh:941-1058 | Complete -- historical lookup, cold-start defaults, complexity multiplier, confidence levels |
| cost_forecast_display() | sw-cost.sh:1062-1121 | Complete -- formatted table with stage breakdown, confidence indicator, budget status |
| cost_record_variance() | sw-cost.sh:1123-1138 | Complete -- emits cost.forecast_variance event with USD and % variance |
| _forecast_token_rates() | sw-cost.sh:925-937 | Complete -- per-stage-category token rate heuristics |
| Default stage durations | sw-cost.sh:921 | Complete -- JSON constant for cold-start fallback |
| forecast CLI subcommand | sw-cost.sh (cost_forecast_cli) | Complete -- shipwright cost forecast --pipeline <name> |
| --force-start flag | sw-pipeline.sh:300,463 | Complete -- parsed and stored as FORCE_START variable |
| Budget gate in pipeline_start | sw-pipeline.sh:2597-2627 | Complete -- blocks if forecast > remaining, supports override |
| Forecast event emission | sw-pipeline.sh:2620-2624 | Complete -- cost.forecast event with USD, template, confidence, issue |
| Variance recording at completion | sw-pipeline.sh:2957-2962 | Complete -- calls cost_record_variance with forecast vs actual |
| Dashboard API /api/costs/forecast | dashboard/server.ts:3721-3769 | Complete -- returns recent_forecasts + variance_history |
| Dashboard renderCostForecast() | dashboard/src/views/metrics.ts:479-542 | Complete -- forecast card + variance accuracy table |
| Dashboard API client | dashboard/src/core/api.ts | Complete -- fetchCostForecast() method |
| Tests | sw-cost-test.sh:230-400 | Complete -- forecast JSON, display, variance, --force-start, pipeline integration |
- No early warning at 50-100% budget consumption -- The cost intelligence design guidance says "Warn (don't block) if forecast is 50-100% of remaining budget." Current code only blocks when forecast > remaining. No warning for high-consumption scenarios.
- No confidence range display -- Forecast shows point estimate ($X.XX) but the issue requests confidence intervals (e.g., "$50-$70 (medium confidence)"). Currently only shows confidence level as a label.
- No estimated_cost on queued dashboard items -- The QueueItem type in dashboard has an estimated_cost? field, but the daemon dispatch doesn't populate it when queuing issues. Dashboard shows forecast in the metrics view but not inline on queued items.
- Forecast event missing data_points field -- The cost.forecast event should include data_points for observability completeness.
| File | Action | Purpose |
|---|---|---|
| scripts/sw-cost.sh | Modify | Add confidence range (low/high bounds) to forecast JSON output |
| scripts/sw-cost.sh | Modify | Update cost_forecast_display() to show cost range |
| scripts/sw-pipeline.sh | Modify | Add early warning when forecast consumes 50-100% of remaining budget |
| scripts/sw-pipeline.sh | Modify | Add data_points field to cost.forecast event |
| scripts/lib/daemon-dispatch.sh | Modify | Attach forecast_usd to queued job metadata |
| dashboard/server.ts | Modify | Include estimated_cost in queue item response |
| scripts/sw-cost-test.sh | Modify | Add tests for confidence range and early warning |
In sw-cost.sh, enhance the forecast JSON output to include low_usd and high_usd bounds:
- High confidence: +/-15% range (tight)
- Medium confidence: +/-30% range
- Low confidence: +/-50% range (wide, conservative)
Modify the final jq -n call in cost_forecast() (line 1052) to compute and add low_usd and high_usd fields based on confidence level and total_usd.
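The bounds computation described above can be sketched in awk, in keeping with how the plan handles floating-point math. The function name forecast_bounds is hypothetical, and treating any unrecognized label as low confidence is an assumption rather than something the plan specifies.

```shell
# Hypothetical helper: print "<low_usd> <high_usd>" for a total and a confidence label.
# Range widths follow the plan: high +/-15%, medium +/-30%, low +/-50%.
forecast_bounds() {
  local total_usd="$1" confidence="$2"
  awk -v total="$total_usd" -v conf="$confidence" 'BEGIN {
    pct = 0.50                        # assumption: unknown labels get the widest range
    if (conf == "high")   pct = 0.15
    if (conf == "medium") pct = 0.30
    printf "%.4f %.4f\n", total * (1 - pct), total * (1 + pct)
  }'
}

forecast_bounds 60 medium   # prints "42.0000 78.0000"
```

The two values could then be passed into the final jq -n call as the low_usd and high_usd fields.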
Change the TOTAL line from $X.XX to $LOW - $HIGH format when confidence is not "high". For high confidence, keep the point estimate. Update the Confidence line to include the range, e.g. "medium (5 historical runs) -- est. $45-$75".
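A minimal sketch of that display rule follows. The function name format_total and the exact "TOTAL:" label layout are illustrative assumptions; the real table formatting in cost_forecast_display() is not shown in this plan.

```shell
# Hypothetical helper: render the TOTAL line as a point estimate or a range.
format_total() {
  local total_usd="$1" low_usd="$2" high_usd="$3" confidence="$4"
  if [ "$confidence" = "high" ]; then
    # High confidence keeps the single point estimate.
    printf 'TOTAL: $%.2f\n' "$total_usd"
  else
    # Medium and low confidence show the low/high bounds instead.
    printf 'TOTAL: $%.2f - $%.2f\n' "$low_usd" "$high_usd"
  fi
}
```

For example, format_total 60 42 78 medium renders a range, while the same call with high as the last argument renders only the $60.00 point estimate.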
In sw-pipeline.sh lines 2604-2627, after the "exceeds budget" check, add a warning check:
- If forecast consumes >50% of remaining budget but doesn't exceed it, emit warn with message like "Forecast $X will consume Y% of remaining budget $Z"
- Emit cost.budget_high_usage event for tracking
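The graduated gate (block above 100%, warn above 50%, otherwise proceed) can be sketched as a small classifier. The name budget_gate_status is hypothetical, and the sketch assumes a finite, numeric remaining budget; how an unlimited budget is represented is not covered here.

```shell
# Hypothetical helper: classify a forecast against the remaining budget.
# Prints "block" if forecast > remaining, "warn" if it uses more than 50%, else "ok".
budget_gate_status() {
  local forecast_usd="$1" remaining_usd="$2"
  awk -v f="$forecast_usd" -v r="$remaining_usd" 'BEGIN {
    f += 0; r += 0                      # force numeric comparison
    if (f > r)       { print "block"; exit }
    if (f > 0.5 * r) { print "warn";  exit }
    print "ok"
  }'
}
```

A caller would branch on the result: honor --force-start for "block", emit the warning and the cost.budget_high_usage event for "warn", and continue silently for "ok". A forecast exactly equal to the remaining budget warns rather than blocks, matching the existing "forecast > remaining" check.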
In sw-pipeline.sh lines 2620-2624, add a data_points= field to the emit_event "cost.forecast" call.
In daemon-dispatch.sh, after computing the forecast for pre-spawn budget check, attach forecast_usd to the job metadata so the dashboard /api/status endpoint can include it in queue items.
In dashboard/server.ts, when building queue items for /api/status, read the forecast_usd from job metadata and include it as estimated_cost in the response.
Add to sw-cost-test.sh:
- Test that forecast JSON includes low_usd and high_usd fields
- Test that confidence range width varies by confidence level
- Verify early warning behavior via grep in sw-pipeline.sh
Execute npm test and fix any failures.
- Task 1: Add low_usd and high_usd confidence range fields to cost_forecast() JSON output in sw-cost.sh
- Task 2: Update cost_forecast_display() to show cost range instead of point estimate
- Task 3: Add early warning in sw-pipeline.sh when forecast consumes 50-100% of remaining budget
- Task 4: Add data_points field to cost.forecast event emission in sw-pipeline.sh
- Task 5: Attach forecast_usd to queued job metadata in daemon-dispatch.sh for dashboard
- Task 6: Include estimated_cost in dashboard queue item response in server.ts
- Task 7: Add tests for confidence range fields and early warning in sw-cost-test.sh
- Task 8: Run full test suite and verify all tests pass
Dependencies: Task 1 blocks Tasks 2 and 7. Task 5 blocks Task 6. All other tasks are independent.
- Unit tests (sw-cost-test.sh):
  - Verify cost_forecast() JSON output includes low_usd, high_usd fields
  - Verify range width varies: low confidence has wider range than high
  - Verify cost_forecast_display() output includes range format
  - Grep-based verification that early warning code exists in sw-pipeline.sh
- Integration verification:
  - shipwright cost forecast --pipeline standard --json returns complete JSON with new fields
  - shipwright cost forecast --pipeline standard displays range in formatted output
- Regression:
  - npm test -- full test suite must pass
  - Existing cost tests must continue to pass unchanged
- cost_forecast() returns JSON with total_usd, low_usd, high_usd, stages[], confidence, data_points, complexity_multiplier
- Confidence range displayed in CLI forecast table (e.g., "$45 - $75 (medium confidence)")
- Pipeline warns (but doesn't block) when forecast is 50-100% of remaining budget
- cost.forecast event includes data_points field
- Dashboard queue items show estimated_cost when available
- All existing tests continue to pass
- New tests cover confidence range and early warning
| Approach | Pros | Cons | Decision |
|---|---|---|---|
| A: Enhance existing implementation | Minimal changes, builds on working code, low risk | Incremental, not a rewrite | Chosen -- feature is 90% done, only gaps remain |
| B: Statistical confidence intervals | More rigorous (std dev based), adapts automatically | Requires sufficient historical data, complex bash math, overkill for current data volume | Deferred -- can iterate later when more data exists |
| C: Do nothing (accept current state) | Zero risk | Misses early warning and range display from acceptance criteria | Rejected -- gaps are small but meaningful for user experience |
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Confidence range math error in awk | Low -- cosmetic only, doesn't affect blocking | Low | Unit test verifies bounds |
| Early warning spam when budget is tight | Medium -- annoying repeated warnings | Low | Only warn once per pipeline start |
| daemon-dispatch.sh changes break spawning | High -- daemon stops working | Low | Minimal change (add metadata field), test daemon start |
| Dashboard type mismatch | Low -- TypeScript catches at build | Very low | Existing QueueItem type already has estimated_cost? optional field |
Primary: As a pipeline operator, I want to see the estimated cost range before a pipeline starts, so that I can make informed go/no-go decisions based on my budget.
Secondary: As a team lead monitoring multiple daemon pipelines, I want early warning when a pipeline will consume a large portion of my remaining budget, so that I can intervene before the budget is exhausted.
- Given a configured budget and a pipeline about to start, When the forecast exceeds remaining budget, Then the pipeline is blocked with an error and a --force-start override hint
- Given a configured budget, When the forecast consumes >50% of remaining budget, Then a warning is displayed (but pipeline proceeds)
- Given historical pipeline data, When a forecast is generated, Then confidence range bounds reflect data quality (tight for high confidence, wide for low)
- Given a completed pipeline with a pre-start forecast, When the pipeline finishes, Then a cost.forecast_variance event is emitted with forecast, actual, and percentage variance
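The variance in the last criterion reduces to a difference and a percentage. The sketch below assumes variance is measured as actual minus forecast, relative to the forecast; that sign convention and the name forecast_variance are assumptions, since the plan only states that USD and percentage variance are emitted.

```shell
# Hypothetical helper: print "<variance_usd> <variance_pct>" for forecast vs actual.
# Assumed convention: positive values mean the pipeline cost more than forecast.
forecast_variance() {
  local forecast_usd="$1" actual_usd="$2"
  awk -v f="$forecast_usd" -v a="$actual_usd" 'BEGIN {
    f += 0; a += 0
    diff = a - f
    pct = (f > 0) ? (diff / f) * 100 : 0   # guard against a zero forecast
    printf "%.4f %.1f\n", diff, pct
  }'
}

forecast_variance 50 60   # prints "10.0000 20.0"
```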
- First run (no history): Shows "low confidence" with wide range ($X - $Y), uses conservative defaults -- user sees worst-case estimate
- Budget nearly exhausted: Warns at 50%+ consumption, blocks at 100%+ -- user gets graduated feedback
- Force override: --force-start bypasses budget gate with explicit warning -- user maintains control
- Request: ?period=30 (days, optional, default 30)
- Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast_usd, actual_usd, variance_pct, template, ts}] }
- Status codes: 200 OK, 500 Internal Server Error
- No auth required (local dashboard)
- 200: Success with data
- 200 with empty arrays: No forecast data available yet
- 500: Internal error reading events.jsonl
Not applicable -- local CLI tool and dashboard, not a public API.
Not applicable -- internal API, no external consumers.
MetricsView
|-- renderCostBreakdown() -- existing
|-- renderCostTrend() -- existing
|-- renderCostForecast() -- enhanced (shows range)
| |-- Forecast Card -- latest forecast with confidence range
| +-- Variance Table -- historical accuracy
+-- ... other metrics
- Dashboard fetches forecast data via api.fetchCostForecast() -> REST endpoint
- Data flows: server.ts reads events.jsonl -> filters forecast/variance events -> returns JSON -> metrics.ts renders
- No local state needed -- pure render from server data
- Semantic HTML tables for variance data (existing)
- Color-coded confidence uses text labels alongside color (existing: "high"/"medium"/"low" text)
- Keyboard navigable (standard HTML elements)
Not applicable -- dashboard is a developer tool used on desktop monitors. Existing responsive behavior from dashboard framework applies.