Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate with Early Warning

Issue: #178

Pipeline: full | Branch: feat/pipeline-cost-forecast-and-budget-gate-w-178


Socratic Design Refinement

Requirements Clarity

Minimum viable change: Estimate pipeline cost pre-start, display it, block if over budget, emit variance post-completion. The acceptance criteria are explicit and well-scoped.

Implicit requirements:

  • Cold-start behavior when no historical data exists (use conservative defaults)
  • The forecast CLI subcommand (shipwright cost forecast) for standalone use
  • Dashboard integration for queued pipeline visibility

Acceptance criteria (from issue):

  1. Estimate pipeline cost using: template stage count x avg duration x model tier cost
  2. Display forecast before pipeline start in CLI and dashboard
  3. Block start if forecast exceeds remaining budget (configurable, with --force-start override)
  4. Emit forecast vs actual cost variance to events.jsonl after pipeline completes
  5. Show cost forecast in dashboard when pipeline is queued
  6. Include confidence interval (low/medium/high) based on historical data quality
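The formula in criterion 1 can be sketched as a small shell helper. This is only an illustration: estimate_cost and its per-minute rate argument are hypothetical names, and the plan does not state the unit of "model tier cost", so a USD-per-minute rate is assumed here. The real cost_forecast() works from historical data and per-stage token rates.

```shell
# Hypothetical helper (not part of sw-cost.sh): estimate cost as
# stage count x average stage duration (minutes) x assumed USD per minute.
estimate_cost() {
  local stage_count="$1" avg_minutes="$2" usd_per_minute="$3"
  # awk handles the floating-point math that bash arithmetic cannot.
  awk -v n="$stage_count" -v d="$avg_minutes" -v r="$usd_per_minute" \
    'BEGIN { printf "%.4f\n", n * d * r }'
}

# 8 stages, 5 minutes each, at an assumed $0.12 per minute
estimate_cost 8 5 0.12   # prints 4.8000
```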

Design Alternatives

Approach A: Shell-native in sw-cost.sh (chosen)

  • Add cost_forecast(), cost_forecast_display(), cost_record_variance() to existing sw-cost.sh
  • Integrate budget gate directly into pipeline_start() in sw-pipeline.sh
  • Tradeoffs: +consistent with codebase patterns, +reuses existing pricing/budget infra, +no new dependencies, -bash arithmetic limitations (mitigated by awk)

Approach B: Node.js forecast module

  • Implement in TypeScript alongside dashboard
  • Tradeoffs: +better floating point, +easier testing, -breaks shell-first convention, -requires Node.js at pipeline start, -two systems to maintain

Decision: Approach A -- minimal blast radius, reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), pricing system, and event emission. Dashboard gets a read-only API endpoint.

Risk Assessment

  • Risk: Inaccurate forecasts with no history -> Mitigated by conservative default stage durations and "low" confidence label
  • Risk: Budget gate blocks legitimate work -> Mitigated by --force-start and --ignore-budget overrides
  • Risk: awk floating-point drift -> Acceptable for cost estimates (4 decimal places)
  • Risk: events.jsonl query slowness -> Mitigated by reading only last 1000 lines
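The last mitigation could look roughly like the sketch below, which bounds the scan to the last 1000 lines of events.jsonl. The helper name and the "type" field used to identify forecast events are assumptions for illustration; the actual event schema is not shown in this plan.

```shell
# Hypothetical helper: count cost.forecast events in the last 1000 lines only,
# so lookup time stays bounded as events.jsonl grows.
count_recent_forecasts() {
  local events_file="$1"
  # -F matches the literal string; the closing quote excludes cost.forecast_variance.
  tail -n 1000 "$events_file" | grep -cF '"type":"cost.forecast"'
}
```

A caller would pass the path to events.jsonl, for example count_recent_forecasts "$EVENTS_FILE", where EVENTS_FILE is also a placeholder name.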

Simplicity Check

All changes stay within 3 existing files (sw-cost.sh, sw-pipeline.sh, dashboard) plus tests. No new files needed for core logic. Reuses existing pricing, budget, and event infrastructure entirely.


Current Implementation Status

The feature was implemented in a prior pipeline iteration (commits abd44a8 through 055448a). All 12 tasks were marked complete. The implementation includes:

What Exists (verified by code review)

| Component | File | Status |
| --- | --- | --- |
| cost_forecast() | sw-cost.sh:941-1058 | Complete -- historical lookup, cold-start defaults, complexity multiplier, confidence levels |
| cost_forecast_display() | sw-cost.sh:1062-1121 | Complete -- formatted table with stage breakdown, confidence indicator, budget status |
| cost_record_variance() | sw-cost.sh:1123-1138 | Complete -- emits cost.forecast_variance event with USD and % variance |
| _forecast_token_rates() | sw-cost.sh:925-937 | Complete -- per-stage-category token rate heuristics |
| Default stage durations | sw-cost.sh:921 | Complete -- JSON constant for cold-start fallback |
| forecast CLI subcommand | sw-cost.sh (cost_forecast_cli) | Complete -- shipwright cost forecast --pipeline <name> |
| --force-start flag | sw-pipeline.sh:300,463 | Complete -- parsed and stored as FORCE_START variable |
| Budget gate in pipeline_start | sw-pipeline.sh:2597-2627 | Complete -- blocks if forecast > remaining, supports override |
| Forecast event emission | sw-pipeline.sh:2620-2624 | Complete -- cost.forecast event with USD, template, confidence, issue |
| Variance recording at completion | sw-pipeline.sh:2957-2962 | Complete -- calls cost_record_variance with forecast vs actual |
| Dashboard API /api/costs/forecast | dashboard/server.ts:3721-3769 | Complete -- returns recent_forecasts + variance_history |
| Dashboard renderCostForecast() | dashboard/src/views/metrics.ts:479-542 | Complete -- forecast card + variance accuracy table |
| Dashboard API client | dashboard/src/core/api.ts | Complete -- fetchCostForecast() method |
| Tests | sw-cost-test.sh:230-400 | Complete -- forecast JSON, display, variance, --force-start, pipeline integration |

Gaps Identified

  1. No early warning at 50-100% budget consumption -- The cost intelligence design guidance says "Warn (don't block) if forecast is 50-100% of remaining budget." Current code only blocks when forecast > remaining. No warning for high-consumption scenarios.

  2. No confidence range display -- Forecast shows point estimate ($X.XX) but the issue requests confidence intervals (e.g., "$50-$70 (medium confidence)"). Currently only shows confidence level as a label.

  3. No estimated_cost on queued dashboard items -- The QueueItem type in dashboard has an estimated_cost? field, but the daemon dispatch doesn't populate it when queuing issues. Dashboard shows forecast in the metrics view but not inline on queued items.

  4. Forecast event missing data_points field -- The cost.forecast event should include data_points for observability completeness.


Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| scripts/sw-cost.sh | Modify | Add confidence range (low/high bounds) to forecast JSON output |
| scripts/sw-cost.sh | Modify | Update cost_forecast_display() to show cost range |
| scripts/sw-pipeline.sh | Modify | Add early warning when forecast consumes 50-100% of remaining budget |
| scripts/sw-pipeline.sh | Modify | Add data_points field to cost.forecast event |
| scripts/lib/daemon-dispatch.sh | Modify | Attach forecast_usd to queued job metadata |
| dashboard/server.ts | Modify | Include estimated_cost in queue item response |
| scripts/sw-cost-test.sh | Modify | Add tests for confidence range and early warning |

Implementation Steps

Step 1: Add confidence range bounds to cost_forecast()

In sw-cost.sh, enhance the forecast JSON output to include low_usd and high_usd bounds:

  • High confidence: +/-15% range (tight)
  • Medium confidence: +/-30% range
  • Low confidence: +/-50% range (wide, conservative)

Modify the final jq -n call in cost_forecast() (line 1052) to compute and add low_usd and high_usd fields based on confidence level and total_usd.
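A minimal sketch of the bound calculation, using the percentages listed above. forecast_range is a hypothetical helper name, and it uses awk rather than the jq -n call the step refers to, so treat it as an illustration of the arithmetic and not the actual change.

```shell
# Hypothetical helper: print "low_usd high_usd" for a total and a confidence level.
forecast_range() {
  local total_usd="$1" confidence="$2" pct
  case "$confidence" in
    high)   pct=15 ;;   # tight range
    medium) pct=30 ;;
    *)      pct=50 ;;   # low (or unknown) confidence: widest range
  esac
  awk -v t="$total_usd" -v p="$pct" \
    'BEGIN { printf "%.4f %.4f\n", t * (100 - p) / 100, t * (100 + p) / 100 }'
}

forecast_range 60 high     # prints 51.0000 69.0000
forecast_range 60 medium   # prints 42.0000 78.0000
forecast_range 60 low      # prints 30.0000 90.0000
```

Treating an unrecognized confidence value as "low" is an assumption made here so the range errs on the conservative side.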

Step 2: Update cost_forecast_display() to show range

Change the TOTAL line from $X.XX to $LOW - $HIGH format when confidence is not "high". For high confidence, keep the point estimate. Update the Confidence line to include the range, e.g. "medium (5 historical runs) -- est. $45-$75".
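The display rule could be sketched as follows. format_total_line is a hypothetical helper, and the exact label and spacing of the real TOTAL row in cost_forecast_display() are assumptions.

```shell
# Hypothetical helper: render the TOTAL row as a range unless confidence is "high".
format_total_line() {
  local total_usd="$1" low_usd="$2" high_usd="$3" confidence="$4"
  awk -v t="$total_usd" -v l="$low_usd" -v h="$high_usd" -v c="$confidence" 'BEGIN {
    if (c == "high") printf "TOTAL  $%.2f\n", t
    else             printf "TOTAL  $%.2f - $%.2f (%s confidence)\n", l, h, c
  }'
}

format_total_line 60 42 78 medium   # prints TOTAL  $42.00 - $78.00 (medium confidence)
format_total_line 60 51 69 high     # prints TOTAL  $60.00
```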

Step 3: Add early warning in pipeline budget gate

In sw-pipeline.sh lines 2604-2627, after the "exceeds budget" check, add a warning check:

  • If forecast consumes >50% of remaining budget but doesn't exceed it, emit warn with message like "Forecast $X will consume Y% of remaining budget $Z"
  • Emit cost.budget_high_usage event for tracking
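The graduated gate can be sketched as a three-way classification. budget_gate_status is a hypothetical helper, and the messages are placeholders; the real check lives inline in pipeline_start() and would also emit the cost.budget_high_usage event, which is omitted here.

```shell
# Hypothetical helper: classify a forecast against the remaining budget.
# Prints "block", "warn <pct>", or "ok <pct>".
budget_gate_status() {
  local forecast_usd="$1" remaining_usd="$2"
  awk -v f="$forecast_usd" -v r="$remaining_usd" 'BEGIN {
    if (f > r) { print "block"; exit }          # existing behavior: forecast exceeds budget
    pct = (r > 0) ? f / r * 100 : 0             # guard against division by zero
    if (pct > 50) printf "warn %.0f\n", pct     # new: high consumption, do not block
    else          printf "ok %.0f\n", pct
  }'
}

status=$(budget_gate_status 7.5 10)
case "$status" in
  block) echo "ERROR: forecast exceeds remaining budget (use --force-start to override)" ;;
  warn*) echo "WARN: forecast will consume ${status#warn }% of remaining budget" ;;
  *)     echo "OK" ;;
esac
# prints WARN: forecast will consume 75% of remaining budget
```

Note that a forecast equal to the remaining budget warns instead of blocking, which matches the existing "blocks if forecast > remaining" rule.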

Step 4: Add data_points to forecast event

In sw-pipeline.sh lines 2620-2624, add data_points= to the emit_event "cost.forecast" call.

Step 5: Populate estimated_cost for queued dashboard items

In daemon-dispatch.sh, after computing the forecast for pre-spawn budget check, attach forecast_usd to the job metadata so the dashboard /api/status endpoint can include it in queue items.

Step 6: Include estimated_cost in dashboard queue response

In dashboard/server.ts, when building queue items for /api/status, read the forecast_usd from job metadata and include it as estimated_cost in the response.

Step 7: Update tests

Add to sw-cost-test.sh:

  • Test that forecast JSON includes low_usd and high_usd fields
  • Test that confidence range width varies by confidence level
  • Verify early warning behavior via grep in sw-pipeline.sh

Step 8: Run full test suite

Execute npm test and fix any failures.


Task Checklist

  • Task 1: Add low_usd and high_usd confidence range fields to cost_forecast() JSON output in sw-cost.sh
  • Task 2: Update cost_forecast_display() to show cost range instead of point estimate when confidence is not "high"
  • Task 3: Add early warning in sw-pipeline.sh when forecast consumes 50-100% of remaining budget
  • Task 4: Add data_points field to cost.forecast event emission in sw-pipeline.sh
  • Task 5: Attach forecast_usd to queued job metadata in daemon-dispatch.sh for dashboard
  • Task 6: Include estimated_cost in dashboard queue item response in server.ts
  • Task 7: Add tests for confidence range fields and early warning in sw-cost-test.sh
  • Task 8: Run full test suite and verify all tests pass

Dependencies: Task 1 blocks Tasks 2 and 7. Task 3 blocks the early-warning checks in Task 7. Task 5 blocks Task 6. Task 8 runs last. All other tasks are independent.


Testing Approach

  1. Unit tests (sw-cost-test.sh):

    • Verify cost_forecast() JSON output includes low_usd, high_usd fields
    • Verify range width varies: low confidence has wider range than high
    • Verify cost_forecast_display() output includes range format
    • Grep-based verification that early warning code exists in sw-pipeline.sh
  2. Integration verification:

    • shipwright cost forecast --pipeline standard --json returns complete JSON with new fields
    • shipwright cost forecast --pipeline standard displays range in formatted output
  3. Regression:

    • npm test -- full test suite must pass
    • Existing cost tests must continue to pass unchanged

Definition of Done

  • cost_forecast() returns JSON with total_usd, low_usd, high_usd, stages[], confidence, data_points, complexity_multiplier
  • Confidence range displayed in CLI forecast table (e.g., "$45 - $75 (medium confidence)")
  • Pipeline warns (but doesn't block) when forecast is 50-100% of remaining budget
  • cost.forecast event includes data_points field
  • Dashboard queue items show estimated_cost when available
  • All existing tests continue to pass
  • New tests cover confidence range and early warning

Alternatives Considered

| Approach | Pros | Cons | Decision |
| --- | --- | --- | --- |
| A: Enhance existing implementation | Minimal changes, builds on working code, low risk | Incremental, not a rewrite | Chosen -- feature is 90% done, only gaps remain |
| B: Statistical confidence intervals | More rigorous (std dev based), adapts automatically | Requires sufficient historical data, complex bash math, overkill for current data volume | Deferred -- can iterate later when more data exists |
| C: Do nothing (accept current state) | Zero risk | Misses early warning and range display from acceptance criteria | Rejected -- gaps are small but meaningful for user experience |

Risk Analysis

| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| Confidence range math error in awk | Low -- cosmetic only, doesn't affect blocking | Low | Unit test verifies bounds |
| Early warning spam when budget is tight | Medium -- annoying repeated warnings | Low | Only warn once per pipeline start |
| daemon-dispatch.sh changes break spawning | High -- daemon stops working | Low | Minimal change (add metadata field), test daemon start |
| Dashboard type mismatch | Low -- TypeScript catches at build | Very low | Existing QueueItem type already has estimated_cost? optional field |

User Stories

Primary: As a pipeline operator, I want to see the estimated cost range before a pipeline starts, so that I can make informed go/no-go decisions based on my budget.

Secondary: As a team lead monitoring multiple daemon pipelines, I want early warning when a pipeline will consume a large portion of my remaining budget, so that I can intervene before the budget is exhausted.

Acceptance Criteria (Given/When/Then)

  • Given a configured budget and a pipeline about to start, When the forecast exceeds remaining budget, Then the pipeline is blocked with an error and a --force-start override hint
  • Given a configured budget, When the forecast consumes more than 50% of the remaining budget without exceeding it, Then a warning is displayed (but the pipeline proceeds)
  • Given historical pipeline data, When a forecast is generated, Then confidence range bounds reflect data quality (tight for high confidence, wide for low)
  • Given a completed pipeline with a pre-start forecast, When the pipeline finishes, Then a cost.forecast_variance event is emitted with forecast, actual, and percentage variance
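The variance in the last criterion can be sketched as below. forecast_variance is a hypothetical helper, and the sign convention (actual minus forecast, so a positive value means the pipeline cost more than predicted) is an assumption; the plan does not state which direction cost_record_variance() uses.

```shell
# Hypothetical helper: print "variance_usd variance_pct" for forecast vs actual cost.
forecast_variance() {
  local forecast_usd="$1" actual_usd="$2"
  awk -v f="$forecast_usd" -v a="$actual_usd" 'BEGIN {
    diff = a - f                           # assumed convention: positive = over forecast
    pct = (f > 0) ? diff / f * 100 : 0     # avoid dividing by a zero forecast
    printf "%.4f %.1f\n", diff, pct
  }'
}

forecast_variance 60 72   # prints 12.0000 20.0
forecast_variance 60 45   # prints -15.0000 -25.0
```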

Edge Cases from User Perspective

  1. First run (no history): Shows "low confidence" with wide range ($X - $Y), uses conservative defaults -- user sees worst-case estimate
  2. Budget nearly exhausted: Warns when the forecast is 50-100% of remaining budget, blocks when it exceeds 100% -- user gets graduated feedback
  3. Force override: --force-start bypasses budget gate with explicit warning -- user maintains control

Endpoint Specification

GET /api/costs/forecast

  • Request: ?period=30 (days, optional, default 30)
  • Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast_usd, actual_usd, variance_pct, template, ts}] }
  • Status codes: 200 OK, 500 Internal Server Error
  • No auth required (local dashboard)

Error Codes

  • 200: Success with data
  • 200 with empty arrays: No forecast data available yet
  • 500: Internal error reading events.jsonl

Rate Limiting

Not applicable -- local CLI tool and dashboard, not a public API.

Versioning

Not applicable -- internal API, no external consumers.


Component Hierarchy (Dashboard)

MetricsView
 |-- renderCostBreakdown() -- existing
 |-- renderCostTrend() -- existing
 |-- renderCostForecast() -- enhanced (shows range)
 | |-- Forecast Card -- latest forecast with confidence range
 | +-- Variance Table -- historical accuracy
 +-- ... other metrics

State Management Approach

  • Dashboard fetches forecast data via api.fetchCostForecast() -> REST endpoint
  • Data flows: server.ts reads events.jsonl -> filters forecast/variance events -> returns JSON -> metrics.ts renders
  • No local state needed -- pure render from server data

Accessibility Checklist

  • Semantic HTML tables for variance data (existing)
  • Color-coded confidence uses text labels alongside color (existing: "high"/"medium"/"low" text)
  • Keyboard navigable (standard HTML elements)

Responsive Breakpoints

Not applicable -- dashboard is a developer tool used on desktop monitors. Existing responsive behavior from dashboard framework applies.
