Pipeline Plan 178
Now I have a complete picture. Here's the implementation plan:
Approach A: Simple multiplier (stage_count × flat_rate)
- Pros: Trivial to implement, no historical data needed
- Cons: Inaccurate — ignores model tiers, stage duration variance, complexity
- Verdict: Too simplistic for useful go/no-go decisions
Approach B: Per-stage forecast using template model assignments + historical durations (CHOSEN)
- Pros: Leverages existing per-stage model config from templates, uses real event history, provides confidence intervals, minimal new infrastructure
- Cons: Requires parsing events.jsonl for history; cold-start needs defaults
- Verdict: Best accuracy/complexity tradeoff. Builds on the existing `estimate_pipeline_cost()` pattern but makes it per-stage aware
Approach C: ML regression model trained on historical runs
- Pros: Most accurate long-term
- Cons: Massive over-engineering for current data volume; requires training pipeline
- Verdict: Future enhancement when data volume justifies it
- `cost_forecast()` in `sw-cost.sh` → per-stage JSON output with confidence
- Budget gate in `pipeline_start()` → block or warn before stages run
- `--force-start` override flag
- Variance event emission at pipeline completion
- `shipwright cost forecast` CLI command
- Dashboard: show forecast on queued items
- Breaking existing pipelines: Low risk. The forecast is advisory and runs before any stage starts; the budget gate respects `--ignore-budget` and the new `--force-start`.
- Cold start (no history): Handled by default stage durations plus a "low" confidence label.
- Bash 3.2 compatibility: Must avoid associative arrays; use `jq` for all JSON manipulation.
- Performance: Scanning `events.jsonl` could be slow with large files → bound the scan with `tail -1000` plus a `grep` filter.
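The history query the forecast depends on works like this: keep only the last 1,000 lines, pre-filter on `stage.completed`, then group by stage to get an average duration and a count. The TypeScript sketch below illustrates that aggregation only. It is not the Bash/`jq` implementation the plan calls for. The event field names `type`, `stage`, and `duration_s` are assumptions, because the plan does not give the `events.jsonl` schema for `stage.completed` events.

```typescript
// Hypothetical event shape: the plan does not specify these field names.
interface StageCompletedEvent { type?: string; stage?: string; duration_s?: number }

interface StageHistory { avgDurationS: number; count: number }

function stageHistoryFromEvents(jsonl: string, maxLines = 1000): Record<string, StageHistory> {
  // Equivalent of `tail -1000`: keep only the last maxLines non-empty lines.
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "").slice(-maxLines);
  const sums: Record<string, { total: number; count: number }> = {};
  for (const line of lines) {
    // Equivalent of the grep pre-filter: a cheap substring test before parsing.
    if (!line.includes("stage.completed")) continue;
    let ev: StageCompletedEvent;
    try {
      ev = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole forecast
    }
    if (ev.type !== "stage.completed" || typeof ev.stage !== "string" || typeof ev.duration_s !== "number") {
      continue;
    }
    const acc = sums[ev.stage] ?? (sums[ev.stage] = { total: 0, count: 0 });
    acc.total += ev.duration_s;
    acc.count += 1;
  }
  // Group by stage: average duration plus the number of samples behind it.
  const out: Record<string, StageHistory> = {};
  for (const stage of Object.keys(sums)) {
    out[stage] = { avgDurationS: sums[stage].total / sums[stage].count, count: sums[stage].count };
  }
  return out;
}
```

The per-stage `count` is returned alongside the average so the caller can also derive the confidence level from the same scan.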
| File | Action | Purpose |
|---|---|---|
| `scripts/sw-cost.sh` | Modify | Add `cost_forecast()`, `cost_forecast_display()`, `cost_record_variance()`, `forecast` CLI subcommand |
| `scripts/sw-pipeline.sh` | Modify | Add `--force-start` flag, hook forecast + budget gate into `pipeline_start()`, emit variance at completion |
| `config/event-schema.json` | Modify | Add `cost.forecast` and `cost.forecast_variance` event types |
| `dashboard/src/types/api.ts` | Modify | Add `CostForecast` interface, extend `QueueItem` with forecast fields |
| `dashboard/server.ts` | Modify | Add `/api/costs/forecast` endpoint, include forecast in queue data |
| `dashboard/src/views/pipelines.ts` | Modify | Display forecast for queued pipelines |
| `src/cost-forecast.test.js` | Create | Unit tests for forecast logic |
| `scripts/sw-pipeline-test.sh` | Modify | Add forecast + budget gate integration tests |
In `scripts/sw-cost.sh`, add after `cost_remaining_budget()` (~line 310):
- Default stage durations JSON constant (seconds): intake=60, plan=300, design=300, build=1200, test=180, review=300, compound_quality=600, audit=120, pr=60, merge=60, deploy=120, validate=60, monitor=300
- Token rate constants per stage type (tokens/second): build=50 in/20 out, review/compound_quality=40 in/30 out, test=10 in/5 out, default=20 in/10 out
- `cost_forecast(pipeline_config_path, complexity)` function:
  - Read the template JSON → extract enabled stages with their model assignments
  - Query historical durations from `events.jsonl`: `grep "stage.completed"` → group by stage → compute average duration and count
  - For each enabled stage: use the historical average duration (or the default), apply the complexity multiplier (`complexity / 5.0`), estimate tokens from duration × token rates, and calculate cost via `cost_calculate()`
  - Sum the total cost and determine the confidence level (≥20 data points = high, 5-19 = medium, <5 = low)
  - Output JSON: `{total_usd, low_usd, high_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points, complexity_multiplier}`
  - Confidence interval: low = total × 0.7, high = total × 1.5 (medium confidence); narrower for high confidence
- `cost_forecast_display(forecast_json)` function: Pretty-print a forecast table with per-stage breakdown, total, confidence level, and budget comparison
- `cost_record_variance(forecast_usd, actual_usd, confidence, template, issue)` function: Emit a `cost.forecast_variance` event with forecast, actual, variance USD, variance %, template, issue
- CLI subcommand `forecast`: `shipwright cost forecast [--pipeline standard] [--complexity 5] [--json]`
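The per-stage arithmetic in `cost_forecast()` is easiest to check with the numbers written out. The TypeScript sketch below shows it: default or historical duration, times `complexity / 5.0`, times token rates, priced per model. The durations, token rates, confidence thresholds, and medium-confidence band come from the plan. The following are hypothetical stand-ins: the `ModelPrice` shape (USD per million tokens) for whatever `cost_calculate()` uses, the fallback duration for unknown stages, the high- and low-confidence bands (the plan only says "narrower" for high), and summing history counts into `data_points`.

```typescript
// Default stage durations in seconds, from the plan.
const DEFAULT_STAGE_DURATIONS: Record<string, number> = {
  intake: 60, plan: 300, design: 300, build: 1200, test: 180, review: 300,
  compound_quality: 600, audit: 120, pr: 60, merge: 60, deploy: 120,
  validate: 60, monitor: 300,
};
const FALLBACK_DURATION_S = 60; // hypothetical: stage id missing from the table

type Confidence = "low" | "medium" | "high";
interface StageConfig { id: string; model: string; enabled: boolean }
interface StageHistory { avgDurationS: number; count: number }
interface ModelPrice { inputPerMTok: number; outputPerMTok: number } // hypothetical pricing shape
interface StageEstimate { id: string; model: string; est_duration_s: number; est_cost: number }
interface Forecast {
  total_usd: number; low_usd: number; high_usd: number; stages: StageEstimate[];
  confidence: Confidence; data_points: number; complexity_multiplier: number;
}

// Token rates (input, output tokens/second) per stage type, from the plan.
function tokenRate(stageId: string): [number, number] {
  switch (stageId) {
    case "build": return [50, 20];
    case "review":
    case "compound_quality": return [40, 30];
    case "test": return [10, 5];
    default: return [20, 10];
  }
}

// Plan thresholds: >=20 high, 5-19 medium, <5 low.
function confidenceFor(dataPoints: number): Confidence {
  if (dataPoints >= 20) return "high";
  if (dataPoints >= 5) return "medium";
  return "low";
}

// Only the medium band (0.7 / 1.5) is from the plan; high and low are placeholders.
const BANDS: Record<Confidence, [number, number]> = {
  high: [0.85, 1.25],
  medium: [0.7, 1.5],
  low: [0.5, 2.0],
};

function forecastPipeline(
  stages: StageConfig[],
  history: Record<string, StageHistory>,
  complexity: number,
  prices: Record<string, ModelPrice>,
): Forecast {
  const multiplier = complexity / 5.0;
  const estimates: StageEstimate[] = [];
  let total = 0;
  let dataPoints = 0; // assumption: sum of history counts over enabled stages
  for (const stage of stages) {
    if (!stage.enabled) continue;
    const hist = history[stage.id];
    if (hist) dataPoints += hist.count;
    const base = hist ? hist.avgDurationS : (DEFAULT_STAGE_DURATIONS[stage.id] ?? FALLBACK_DURATION_S);
    const duration = base * multiplier;
    const [inRate, outRate] = tokenRate(stage.id);
    const price = prices[stage.model];
    if (!price) throw new Error(`No pricing for model ${stage.model}`);
    const cost = (duration * inRate * price.inputPerMTok + duration * outRate * price.outputPerMTok) / 1e6;
    estimates.push({ id: stage.id, model: stage.model, est_duration_s: duration, est_cost: cost });
    total += cost;
  }
  const confidence = confidenceFor(dataPoints);
  const [lowFactor, highFactor] = BANDS[confidence];
  return {
    total_usd: total, low_usd: total * lowFactor, high_usd: total * highFactor,
    stages: estimates, confidence, data_points: dataPoints, complexity_multiplier: multiplier,
  };
}
```

Worked example with a made-up price of $3 input and $15 output per million tokens, at complexity 5 with no history: `build` runs 1,200 s, producing 60,000 input and 24,000 output tokens, so it costs $0.54. `test` runs 180 s and costs $0.0189. The total is $0.5589 at low confidence. Disabled stages contribute nothing, which is how the all-stages-disabled edge case yields $0.00.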
In `scripts/sw-pipeline.sh`:
- Add `FORCE_START=false` to defaults (~line 299)
- Add `--force-start) FORCE_START=true; shift ;;` to the argument parser (~line 460)
- Add help text for `--force-start`
After `load_pipeline_config` (~line 2438), before state file creation:
- Call `cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}"`
- Save the forecast JSON to `$ARTIFACTS_DIR/cost-forecast.json`
- Display the forecast via `cost_forecast_display`
- Get the remaining budget via `cost_remaining_budget`
- Budget gate logic:
  - If the budget is "unlimited" → skip the gate
  - If forecast `high_usd` > remaining budget AND `FORCE_START` != true AND `IGNORE_BUDGET` != true → block with an error message showing forecast vs. budget, and suggest `--force-start`
  - If forecast `total_usd` > 50% of remaining budget → warn (don't block)
- Emit a `cost.forecast` event with forecast_usd, confidence, template, issue
- Store `PIPELINE_FORECAST_USD` (and `FORECAST_CONFIDENCE`) for variance tracking at completion
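The gate has four outcomes: skip, block, warn, and pass. It uses two different thresholds: blocking compares `high_usd` against the remaining budget, while warning compares `total_usd` against 50% of it. The TypeScript sketch below makes that ordering explicit. It is an illustration of the decision table, not the Bash code. The `GateInput` shape is hypothetical. So is treating an overridden block as a warning; that follows the "proceeds with warning logged" edge case, and applying it to `--ignore-budget` as well is an assumption.

```typescript
interface GateInput {
  remainingBudget: number | "unlimited"; // what cost_remaining_budget reports
  totalUsd: number; // forecast total_usd
  highUsd: number; // forecast high_usd
  forceStart: boolean; // FORCE_START
  ignoreBudget: boolean; // IGNORE_BUDGET
}
type GateAction = "skip" | "pass" | "warn" | "block";
interface GateDecision { action: GateAction; message: string }

function budgetGate(g: GateInput): GateDecision {
  if (g.remainingBudget === "unlimited") {
    return { action: "skip", message: "No budget configured; gate skipped" };
  }
  const budget = g.remainingBudget;
  // Block on the pessimistic end of the interval, not the point estimate.
  if (g.highUsd > budget) {
    if (!g.forceStart && !g.ignoreBudget) {
      return {
        action: "block",
        message: `Forecast up to $${g.highUsd.toFixed(2)} exceeds remaining budget $${budget.toFixed(2)}; use --force-start to override`,
      };
    }
    // Override path: proceed, but keep a warning for the audit trail.
    return { action: "warn", message: "Forecast exceeds remaining budget; proceeding because of override" };
  }
  if (g.totalUsd > 0.5 * budget) {
    return { action: "warn", message: `Forecast $${g.totalUsd.toFixed(2)} is over 50% of remaining budget $${budget.toFixed(2)}` };
  }
  return { action: "pass", message: "Forecast within budget" };
}
```

Checking the override flags only after the over-budget test means `--force-start` never silences the 50% warning path. It only turns a block into a logged warning.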
At pipeline completion (~line 2700 for success, ~line 2752 for failure):
- If `PIPELINE_FORECAST_USD` is set, call `cost_record_variance "$PIPELINE_FORECAST_USD" "$total_cost" "$FORECAST_CONFIDENCE" "$PIPELINE_NAME" "${ISSUE_NUMBER:-0}"`
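The variance fields that `cost_record_variance` emits can be derived as in the TypeScript sketch below, which builds the event payload. The plan does not fix two details, so both are assumptions here. First, the sign convention: positive means the run cost more than forecast. Second, the handling of a $0.00 forecast (all stages disabled): the percentage is reported as `null` rather than dividing by zero.

```typescript
interface ForecastVarianceEvent {
  type: "cost.forecast_variance";
  forecast_usd: number;
  actual_usd: number;
  variance_usd: number;
  variance_pct: number | null;
  confidence: string;
  template: string;
  issue: number;
}

function buildVarianceEvent(
  forecastUsd: number,
  actualUsd: number,
  confidence: string,
  template: string,
  issue: number,
): ForecastVarianceEvent {
  // Assumed sign convention: positive = actual cost exceeded the forecast.
  const varianceUsd = actualUsd - forecastUsd;
  // Assumption: no meaningful percentage against a zero forecast.
  const variancePct = forecastUsd > 0 ? (varianceUsd * 100) / forecastUsd : null;
  return {
    type: "cost.forecast_variance",
    forecast_usd: forecastUsd,
    actual_usd: actualUsd,
    variance_usd: varianceUsd,
    variance_pct: variancePct,
    confidence,
    template,
    issue,
  };
}
```

For example, a $50 forecast against a $60 actual yields a variance of $10, or +20%.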
Add to `config/event-schema.json`:
- `cost.forecast`: fields = forecast_usd, low_usd, high_usd, confidence, template, issue, complexity, data_points
- `cost.forecast_variance`: fields = forecast_usd, actual_usd, variance_usd, variance_pct, confidence, template, issue
In `dashboard/server.ts`, add a `/api/costs/forecast` endpoint:
- Accept query params `?pipeline=standard&complexity=5`
- Shell out to `shipwright cost forecast --pipeline X --complexity Y --json`
- Return the JSON response

Extend the `/api/state` queue items to include forecast data when available (read from pipeline artifacts).
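Before shelling out, the endpoint should turn query parameters into an argument array and reject bad input with the 400 shape from the API contract. The TypeScript sketch below does that step. The array can then go to Node's `child_process.execFile("shipwright", args)`, so query values are never interpreted by a shell. Several details are hypothetical. `KNOWN_TEMPLATES` is a hardcoded set using only the two template names that appear in this plan; a real server would presumably discover templates. The defaults (`standard`, `5`) are taken from the CLI usage line. The 1-10 complexity range and the `invalid_complexity` error code are assumptions; the plan only names `invalid_template`.

```typescript
// Hypothetical: a real server would list the available templates instead.
const KNOWN_TEMPLATES = new Set(["standard", "cost-aware"]);

interface ApiError { error: { code: string; message: string } }
type ForecastArgsResult =
  | { ok: true; args: string[] }
  | { ok: false; status: 400; body: ApiError };

function forecastArgsFromQuery(query: Record<string, string | undefined>): ForecastArgsResult {
  const pipeline = query.pipeline ?? "standard";
  if (!KNOWN_TEMPLATES.has(pipeline)) {
    return {
      ok: false,
      status: 400,
      body: { error: { code: "invalid_template", message: `Unknown pipeline template: ${pipeline}` } },
    };
  }
  const complexity = Number(query.complexity ?? "5");
  // Assumed valid range; the plan does not bound complexity.
  if (!Number.isInteger(complexity) || complexity < 1 || complexity > 10) {
    return {
      ok: false,
      status: 400,
      body: { error: { code: "invalid_complexity", message: `Invalid complexity: ${query.complexity}` } },
    };
  }
  // Argument array for execFile: no shell string is ever built from the query.
  return {
    ok: true,
    args: ["cost", "forecast", "--pipeline", pipeline, "--complexity", String(complexity), "--json"],
  };
}
```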
In `dashboard/src/types/api.ts`:
- Add a `CostForecast` interface: `{total_usd, low_usd, high_usd, confidence, stages, data_points}`
- Extend `QueueItem` with `forecast?: CostForecast`
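Queue items read their forecast from a JSON artifact on disk, so a runtime guard is a natural companion to the `CostForecast` interface. The sketch below pairs the two. The field names follow the plan's output JSON. The `Confidence` literal union, the per-stage field types, and the guard itself are assumptions. The plan does not list `QueueItem`'s existing fields, so only the new optional field is shown, as a hypothetical `QueueItemForecastFields` interface.

```typescript
type Confidence = "low" | "medium" | "high";

interface CostForecastStage { id: string; model: string; est_duration_s: number; est_cost: number }

interface CostForecast {
  total_usd: number;
  low_usd: number;
  high_usd: number;
  confidence: Confidence;
  stages: CostForecastStage[];
  data_points: number;
}

// Stand-in for the new field on QueueItem (its existing fields are not in the plan).
interface QueueItemForecastFields { forecast?: CostForecast }

// Validate parsed artifact JSON before attaching it to a queue item.
function isCostForecast(value: unknown): value is CostForecast {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const numericKeys = ["total_usd", "low_usd", "high_usd", "data_points"];
  if (!numericKeys.every((k) => typeof v[k] === "number")) return false;
  if (v.confidence !== "low" && v.confidence !== "medium" && v.confidence !== "high") return false;
  return (
    Array.isArray(v.stages) &&
    v.stages.every((s: unknown) => {
      if (typeof s !== "object" || s === null) return false;
      const st = s as Record<string, unknown>;
      return typeof st.id === "string" && typeof st.model === "string" &&
        typeof st.est_duration_s === "number" && typeof st.est_cost === "number";
    })
  );
}
```

A queue item would get `item.forecast = parsed` only when `isCostForecast(parsed)` holds. A missing or stale artifact then leaves `forecast` undefined instead of breaking rendering.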
In `dashboard/src/views/pipelines.ts`:
- When rendering queued items, show the forecast if available: "Est: $45-$60 (medium confidence)"
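The label in the example above is built from `low_usd`, `high_usd`, and `confidence`. The TypeScript sketch below renders it. It assumes a hypothetical rounding rule, whole dollars at $10 and above and cents below, so that small forecasts do not collapse to "$0-$0". `ForecastRange` is a hypothetical minimal slice of the forecast type.

```typescript
// Minimal slice of the forecast that the badge needs.
interface ForecastRange { low_usd: number; high_usd: number; confidence: string }

// Hypothetical rounding rule: whole dollars at $10+, cents below.
function usd(amount: number): string {
  return amount >= 10 ? `$${Math.round(amount)}` : `$${amount.toFixed(2)}`;
}

function forecastLabel(forecast?: ForecastRange): string {
  if (!forecast) return ""; // no forecast artifact yet: render nothing
  return `Est: ${usd(forecast.low_usd)}-${usd(forecast.high_usd)} (${forecast.confidence} confidence)`;
}
```

For instance, `forecastLabel({ low_usd: 44.6, high_usd: 60.2, confidence: "medium" })` produces the plan's example string, "Est: $45-$60 (medium confidence)".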
Unit tests (`src/cost-forecast.test.js`):
- `cost_forecast` with mock template and no history → returns defaults with low confidence
- `cost_forecast` with mock history → returns historical averages with appropriate confidence
- `cost_forecast` with complexity multiplier → scales durations correctly
- `cost_record_variance` → emits correct event
- Budget gate logic: blocks when over budget, warns at 50-100%, passes when under
Integration tests (add to `scripts/sw-pipeline-test.sh`):
- Pipeline start with forecast display
- Pipeline blocked by budget gate → verify exit code + message
- Pipeline with `--force-start` overrides gate
- Variance event emitted after completion
- Task 1: Add default stage durations and token rate constants to `sw-cost.sh`
- Task 2: Implement `cost_forecast()` function with historical data lookup and confidence levels
- Task 3: Implement `cost_forecast_display()` CLI table renderer
- Task 4: Implement `cost_record_variance()` function
- Task 5: Add `forecast` CLI subcommand to the `sw-cost.sh` router
- Task 6: Add `--force-start` flag and `FORCE_START` variable to `sw-pipeline.sh`
- Task 7: Hook forecast generation + budget gate into `pipeline_start()` before stage execution
- Task 8: Hook variance tracking into pipeline completion (success + failure paths)
- Task 9: Update `config/event-schema.json` with new event types
- Task 10: Add `CostForecast` type and `/api/costs/forecast` endpoint to dashboard
- Task 11: Update dashboard pipelines view to display forecast for queued items
- Task 12: Write unit tests for forecast engine and variance tracking
- Task 13: Add integration tests to `sw-pipeline-test.sh` for budget gate behavior
- Task 14: Run full test suite and fix any regressions
- Unit tests: Mock `events.jsonl` with known data, verify forecast calculations match expected values. Test confidence thresholds at boundaries (4, 5, 19, 20 data points). Test cold-start defaults.
- Integration tests: Use mock binaries from the existing test harness. Create a temp `budget.json` with a low limit and verify pipeline start is blocked. Verify the `--force-start` override. Verify the variance event in `events.jsonl` after completion.
- Dashboard tests: Verify `/api/costs/forecast` returns valid JSON. Verify queue rendering includes the forecast display.
- Manual verification: `shipwright cost forecast --pipeline cost-aware --json` produces valid output. `shipwright pipeline start --issue N --dry-run` shows the forecast.
- `shipwright cost forecast` CLI command works with all template types
- Forecast displayed before pipeline start (both interactive and headless)
- Pipeline blocked when forecast exceeds remaining budget (with clear error message)
- `--force-start` override bypasses budget gate with acknowledgment
- `--ignore-budget` also bypasses forecast gate (backward compatible)
- `cost.forecast` event emitted to `events.jsonl` at pipeline start
- `cost.forecast_variance` event emitted at pipeline completion
- Confidence levels: low (<5 runs), medium (5-19), high (≥20)
- Dashboard shows forecast for queued pipelines
- All existing tests pass (`npm test`)
- New tests cover forecast calculation, budget gate, variance tracking
- Bash 3.2 compatible (no associative arrays, no bash 4+ features)
Primary: As a pipeline operator, I want to see estimated cost before a pipeline starts, so that I can make an informed go/no-go decision and avoid surprise budget overruns.
Secondary: As a team lead monitoring costs, I want the system to automatically block pipelines that would exceed our daily budget, so that runaway spending is prevented without manual oversight.
- No historical data (cold start): Uses conservative defaults with a "low" confidence label, so the user knows the estimate is rough
- Budget not configured: Forecast is still displayed but the gate is skipped (matches existing `cost_remaining_budget` behavior)
- Forecast far exceeds budget but `--force-start` used: Pipeline proceeds with a warning logged and an event emitted for audit
- Template with all stages disabled: Forecast returns $0.00 with a note
- Model pricing changed mid-day: Forecast uses current pricing; variance tracking captures the drift
`GET /api/costs/forecast?pipeline=standard&complexity=5`
- Response 200: `{total_usd, low_usd, high_usd, confidence, stages: [{id, model, est_duration_s, est_cost}], data_points}`
- Response 400: `{error: {code: "invalid_template", message: "Unknown pipeline template: foo"}}`
- No auth required (local dashboard)
- No rate limiting (local-only)
- No versioning needed (internal API)
This plan adds ~250 lines to `sw-cost.sh`, ~50 lines to `sw-pipeline.sh`, and ~100 lines of tests. The blast radius is limited to the cost infrastructure and the pipeline start and completion paths; no existing stage execution logic is modified.