Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate

Brainstorming / Design Decisions

Alternatives Considered

Approach A: Simple multiplier (stage_count × flat_rate)

Pros: Trivial to implement, no historical data needed
Cons: Inaccurate — ignores model tiers, stage duration variance, complexity
Verdict: Too simplistic for useful go/no-go decisions

Approach B: Per-stage forecast using template model assignments + historical durations (CHOSEN)

Pros: Leverages existing per-stage model config from templates, uses real event history, provides confidence intervals, minimal new infrastructure
Cons: Requires parsing events.jsonl for history; cold-start needs defaults
Verdict: Best accuracy/complexity tradeoff. Builds on existing estimate_pipeline_cost() pattern but makes it per-stage aware

Approach C: ML regression model trained on historical runs

Pros: Most accurate long-term
Cons: Massive over-engineering for current data volume; requires training pipeline
Verdict: Future enhancement when data volume justifies it

Minimum Viable Change

cost_forecast() in sw-cost.sh → per-stage JSON output with confidence
Budget gate in pipeline_start() → block or warn before stages run
--force-start override flag
Variance event emission at pipeline completion
shipwright cost forecast CLI command
Dashboard: show forecast on queued items

Risk Assessment

Breaking existing pipelines: Low risk — forecast is advisory pre-start; budget gate respects --ignore-budget and new --force-start
Cold start (no history): Handled by default stage durations + "low" confidence
Bash 3.2 compat: Must avoid associative arrays; use jq for all JSON manipulation
Performance: Scanning events.jsonl could be slow with large files → limit the scan to recent events with tail -n 1000 plus a grep filter

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| `scripts/sw-cost.sh` | Modify | Add `cost_forecast()`, `cost_forecast_display()`, `cost_record_variance()`, `forecast` CLI subcommand |
| `scripts/sw-pipeline.sh` | Modify | Add `--force-start` flag, hook forecast + budget gate into `pipeline_start()`, emit variance at completion |
| `config/event-schema.json` | Modify | Add `cost.forecast` and `cost.forecast_variance` event types |
| `dashboard/src/types/api.ts` | Modify | Add `CostForecast` interface, extend `QueueItem` with forecast fields |
| `dashboard/server.ts` | Modify | Add `/api/costs/forecast` endpoint, include forecast in queue data |
| `dashboard/src/views/pipelines.ts` | Modify | Display forecast for queued pipelines |
| `src/cost-forecast.test.js` | Create | Unit tests for forecast logic |
| `scripts/sw-pipeline-test.sh` | Modify | Add forecast + budget gate integration tests |

Implementation Steps

Step 1: Add `cost_forecast()` engine to sw-cost.sh

Add after cost_remaining_budget() (~line 310):

Default stage durations JSON constant (seconds): intake=60, plan=300, design=300, build=1200, test=180, review=300, compound_quality=600, audit=120, pr=60, merge=60, deploy=120, validate=60, monitor=300
Token rate constants per stage type (tokens/second): build=50in/20out, review/compound_quality=40in/30out, test=10in/5out, default=20in/10out
cost_forecast(pipeline_config_path, complexity) function:
1. Read template JSON → extract enabled stages with their model assignments
2. Query historical durations from events.jsonl: grep "stage.completed" → group by stage → compute avg duration and count
3. For each enabled stage: use historical avg duration (or default), apply complexity multiplier (complexity / 5.0), estimate tokens from duration × token rates, calculate cost via cost_calculate()
4. Sum total cost, determine confidence level (≥20 data points=high, 5-19=medium, <5=low)
5. Output JSON: {total_usd, low_usd, high_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points, complexity_multiplier}
6. Confidence interval: low = total × 0.7, high = total × 1.5 (medium); narrower for high confidence
cost_forecast_display(forecast_json) function: Pretty-print forecast table with per-stage breakdown, total, confidence level, and budget comparison
cost_record_variance(forecast_usd, actual_usd, confidence, template, issue) function: Emit cost.forecast_variance event with forecast, actual, variance USD, variance %, template, issue
CLI subcommand forecast: shipwright cost forecast [--pipeline standard] [--complexity 5] [--json]
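
The per-stage arithmetic in items 3 and 6 can be sketched as follows. This is a minimal illustration, not the actual sw-cost.sh code: the helper names (default_stage_duration, stage_token_rates, estimate_stage_tokens, forecast_interval_medium) and the 300-second fallback for unlisted stages are assumptions. The sketch stops at token counts because pricing is delegated to cost_calculate(), and it uses case statements instead of associative arrays to stay Bash 3.2 compatible.

```shell
# Default stage duration in seconds (values from the list above).
default_stage_duration() {
  case "$1" in
    intake|pr|merge|validate)   echo 60 ;;
    audit|deploy)               echo 120 ;;
    test)                       echo 180 ;;
    plan|design|review|monitor) echo 300 ;;
    compound_quality)           echo 600 ;;
    build)                      echo 1200 ;;
    *)                          echo 300 ;;  # assumption: fallback for unlisted stages
  esac
}

# Token rates as "<input> <output>" tokens per second (values from the list above).
stage_token_rates() {
  case "$1" in
    build)                   echo "50 20" ;;
    review|compound_quality) echo "40 30" ;;
    test)                    echo "10 5" ;;
    *)                       echo "20 10" ;;
  esac
}

# Usage: estimate_stage_tokens <stage> <complexity> [historical_avg_seconds]
# Prints "<est_duration_s> <input_tokens> <output_tokens>".
estimate_stage_tokens() {
  local stage=$1 complexity=$2
  local duration=${3:-$(default_stage_duration "$stage")}
  # Split the "<input> <output>" pair into $1 and $2.
  set -- $(stage_token_rates "$stage")
  awk -v d="$duration" -v c="$complexity" -v rin="$1" -v rout="$2" 'BEGIN {
    scaled = d * (c / 5.0)   # complexity multiplier
    printf "%.0f %.0f %.0f\n", scaled, scaled * rin, scaled * rout
  }'
}

# Medium-confidence interval: low = total x 0.7, high = total x 1.5.
# Prints "<low_usd> <high_usd>".
forecast_interval_medium() {
  awk -v t="$1" 'BEGIN { printf "%.2f %.2f\n", t * 0.7, t * 1.5 }'
}
```

For example, `estimate_stage_tokens build 10` scales the 1200-second default by 10 / 5.0 and prints `2400 120000 48000`.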

Step 2: Add `--force-start` flag to sw-pipeline.sh

Add FORCE_START=false to defaults (~line 299)
Add --force-start) FORCE_START=true; shift ;; to argument parser (~line 460)
Add help text for --force-start
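
A minimal sketch of the flag handling, assuming the existing parser is a while/case loop over "$@". The function name parse_start_flags is hypothetical, and the real parser in sw-pipeline.sh handles many more options than shown here.

```shell
# Defaults, set before argument parsing.
FORCE_START=false
IGNORE_BUDGET=false

# Hypothetical stand-in for the argument parser in sw-pipeline.sh.
parse_start_flags() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --force-start)   FORCE_START=true; shift ;;
      --ignore-budget) IGNORE_BUDGET=true; shift ;;
      *)               shift ;;  # other options are out of scope for this sketch
    esac
  done
}
```

Calling `parse_start_flags --issue 42 --force-start` leaves FORCE_START set to true and IGNORE_BUDGET at its default of false.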

Step 3: Hook forecast + budget gate into `pipeline_start()`

After load_pipeline_config (~line 2438), before state file creation:

Call cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}"
Save forecast JSON to $ARTIFACTS_DIR/cost-forecast.json
Display forecast via cost_forecast_display
Get remaining budget via cost_remaining_budget
Budget gate logic:
- If budget is "unlimited" → skip gate
- If forecast high_usd > remaining budget AND FORCE_START != true AND IGNORE_BUDGET != true → block with error message showing forecast vs budget, suggest --force-start
- If forecast total_usd > 50% of remaining budget → warn (don't block)
Emit cost.forecast event with forecast_usd, confidence, template, issue
Store PIPELINE_FORECAST_USD for variance tracking at completion
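
The gate decision above can be sketched as a single function. The name budget_gate and its argument order are assumptions made for illustration. Treating an overridden block as a warning is also an assumption, based on the edge case in which a forced start proceeds with a logged warning.

```shell
# Usage: budget_gate <total_usd> <high_usd> <remaining_budget> <force_start> <ignore_budget>
# Prints "pass", "warn", or "block"; returns 1 only for "block".
budget_gate() {
  local total=$1 high=$2 remaining=$3 force=$4 ignore=$5
  local verdict

  # No budget configured: skip the gate entirely.
  if [ "$remaining" = "unlimited" ]; then
    echo pass
    return 0
  fi

  # awk does the floating-point comparisons that shell arithmetic cannot.
  verdict=$(awk -v t="$total" -v h="$high" -v r="$remaining" 'BEGIN {
    if (h + 0 > r + 0)            print "over"
    else if (t + 0 > 0.5 * r)     print "warn"
    else                          print "pass"
  }')

  if [ "$verdict" = "over" ]; then
    if [ "$force" = "true" ] || [ "$ignore" = "true" ]; then
      echo warn   # override in effect: proceed, but surface the overrun
      return 0
    fi
    echo block
    return 1
  fi

  echo "$verdict"
}
```

With a remaining budget of 5, `budget_gate 4 6 5 false false` prints `block`, while the same forecast with the force flag set to true prints `warn`.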

Step 4: Emit variance at pipeline completion

At pipeline completion (~line 2700 for success, ~line 2752 for failure):

If PIPELINE_FORECAST_USD is set, call cost_record_variance "$PIPELINE_FORECAST_USD" "$total_cost" "$FORECAST_CONFIDENCE" "$PIPELINE_NAME" "${ISSUE_NUMBER:-0}"
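
The two derived fields of the variance event could be computed as below. The plan does not define the percentage formula, so this sketch assumes variance_pct = (actual - forecast) / forecast × 100 and reports 0 when the forecast is zero. The function name variance_fields is hypothetical.

```shell
# Usage: variance_fields <forecast_usd> <actual_usd>
# Prints "<variance_usd> <variance_pct>".
variance_fields() {
  awk -v f="$1" -v a="$2" 'BEGIN {
    v = a - f
    # Assumption: avoid dividing by zero when the forecast was $0.00.
    pct = (f > 0) ? (v / f) * 100 : 0
    printf "%.2f %.1f\n", v, pct
  }'
}
```

A forecast of 10 against an actual cost of 12 gives `2.00 20.0`; a positive value means the run cost more than forecast.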

Step 5: Update event schema

Add to config/event-schema.json:

cost.forecast: fields = forecast_usd, low_usd, high_usd, confidence, template, issue, complexity, data_points
cost.forecast_variance: fields = forecast_usd, actual_usd, variance_usd, variance_pct, confidence, template, issue

Step 6: Dashboard — API endpoint

In dashboard/server.ts, add /api/costs/forecast endpoint:

Accept query param ?pipeline=standard&complexity=5
Shell out to shipwright cost forecast --pipeline X --complexity Y --json
Return JSON response

Extend the /api/state queue items to include forecast data when available (read from pipeline artifacts).

Step 7: Dashboard — types and UI

In dashboard/src/types/api.ts:

Add CostForecast interface: {total_usd, low_usd, high_usd, confidence, stages, data_points}
Extend QueueItem with forecast?: CostForecast

In dashboard/src/views/pipelines.ts:

When rendering queued items, show forecast if available: "Est: $45-$60 (medium confidence)"

Step 8: Tests

Unit tests (src/cost-forecast.test.js):

cost_forecast with mock template and no history → returns defaults with low confidence
cost_forecast with mock history → returns historical averages with appropriate confidence
cost_forecast with complexity multiplier → scales durations correctly
cost_record_variance → emits correct event
Budget gate logic: blocks when over budget, warns at 50-100%, passes when under

Integration tests (add to scripts/sw-pipeline-test.sh):

Pipeline start with forecast display
Pipeline blocked by budget gate → verify exit code + message
Pipeline with --force-start overrides gate
Variance event emitted after completion

Task Checklist

Task 1: Add default stage durations and token rate constants to sw-cost.sh
Task 2: Implement cost_forecast() function with historical data lookup and confidence levels
Task 3: Implement cost_forecast_display() CLI table renderer
Task 4: Implement cost_record_variance() function
Task 5: Add forecast CLI subcommand to sw-cost.sh router
Task 6: Add --force-start flag and FORCE_START variable to sw-pipeline.sh
Task 7: Hook forecast generation + budget gate into pipeline_start() before stage execution
Task 8: Hook variance tracking into pipeline completion (success + failure paths)
Task 9: Update config/event-schema.json with new event types
Task 10: Add CostForecast type and /api/costs/forecast endpoint to dashboard
Task 11: Update dashboard pipelines view to display forecast for queued items
Task 12: Write unit tests for forecast engine and variance tracking
Task 13: Add integration tests to sw-pipeline-test.sh for budget gate behavior
Task 14: Run full test suite and fix any regressions

Testing Approach

Unit tests: Mock events.jsonl with known data, verify forecast calculations match expected values. Test confidence thresholds at boundaries (4, 5, 19, 20 data points). Test cold-start defaults.
Integration tests: Use mock binaries from existing test harness. Create temp budget.json with low limit, verify pipeline start is blocked. Verify --force-start override. Verify variance event in events.jsonl after completion.
Dashboard tests: Verify /api/costs/forecast returns valid JSON. Verify queue rendering includes forecast display.
Manual verification: shipwright cost forecast --pipeline cost-aware --json produces valid output. shipwright pipeline start --issue N --dry-run shows forecast.
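
The boundary checks on confidence thresholds might look like this in a shell test. The confidence_level function is a hypothetical stand-in for the logic inside cost_forecast(), taking the number of historical data points as its only argument.

```shell
# Map a data-point count to a confidence label (thresholds from the plan).
confidence_level() {
  if [ "$1" -ge 20 ]; then
    echo high
  elif [ "$1" -ge 5 ]; then
    echo medium
  else
    echo low
  fi
}

# Check both sides of each threshold, plus the cold-start case (0).
for pair in 0:low 4:low 5:medium 19:medium 20:high; do
  n=${pair%%:*}
  want=${pair#*:}
  got=$(confidence_level "$n")
  if [ "$got" != "$want" ]; then
    echo "FAIL: $n data points -> $got (want $want)"
    exit 1
  fi
done
echo "confidence boundaries ok"
```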

Definition of Done

shipwright cost forecast CLI command works with all template types
Forecast displayed before pipeline start (both interactive and headless)
Pipeline blocked when forecast exceeds remaining budget (with clear error message)
--force-start override bypasses budget gate with acknowledgment
--ignore-budget also bypasses forecast gate (backward compatible)
cost.forecast event emitted to events.jsonl at pipeline start
cost.forecast_variance event emitted at pipeline completion
Confidence levels: low (<5 data points), medium (5-19), high (≥20)
Dashboard shows forecast for queued pipelines
All existing tests pass (npm test)
New tests cover forecast calculation, budget gate, variance tracking
Bash 3.2 compatible (no associative arrays, no bash 4+ features)

User Stories

Primary: As a pipeline operator, I want to see estimated cost before a pipeline starts, so that I can make an informed go/no-go decision and avoid surprise budget overruns.

Secondary: As a team lead monitoring costs, I want the system to automatically block pipelines that would exceed our daily budget, so that runaway spending is prevented without manual oversight.

Edge Cases

No historical data (cold start): Uses conservative defaults with "low" confidence label — user knows estimate is rough
Budget not configured: Forecast still displayed but gate is skipped (matches existing cost_remaining_budget behavior)
Forecast far exceeds budget but --force-start used: Pipeline proceeds with warning logged and event emitted for audit
Template with all stages disabled: Forecast returns $0.00 with note
Model pricing changed mid-day: Forecast uses current pricing; variance tracking captures the drift

Endpoint Specification

GET /api/costs/forecast?pipeline=standard&complexity=5

Response 200: {total_usd, low_usd, high_usd, confidence, stages: [{id, model, est_duration_s, est_cost}], data_points}
Response 400: {error: {code: "invalid_template", message: "Unknown pipeline template: foo"}}
No auth required (local dashboard)
No rate limiting (local-only)
No versioning needed (internal API)

This plan adds ~250 lines to sw-cost.sh, ~50 lines to sw-pipeline.sh, and ~100 lines of tests. The blast radius is contained to cost infrastructure and the pipeline start path — no existing stage execution logic is modified.
