Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions


Implementation Plan: Pipeline Cost Forecast and Budget Gate

Brainstorming / Design Decisions

Alternatives Considered

Approach A: Simple multiplier (stage_count × flat_rate)

  • Pros: Trivial to implement, no historical data needed
  • Cons: Inaccurate — ignores model tiers, stage duration variance, complexity
  • Verdict: Too simplistic for useful go/no-go decisions

Approach B: Per-stage forecast using template model assignments + historical durations (CHOSEN)

  • Pros: Leverages existing per-stage model config from templates, uses real event history, provides confidence intervals, minimal new infrastructure
  • Cons: Requires parsing events.jsonl for history; cold-start needs defaults
  • Verdict: Best accuracy/complexity tradeoff. Builds on existing estimate_pipeline_cost() pattern but makes it per-stage aware

Approach C: ML regression model trained on historical runs

  • Pros: Most accurate long-term
  • Cons: Massive over-engineering for current data volume; requires training pipeline
  • Verdict: Future enhancement when data volume justifies it

Minimum Viable Change

  • cost_forecast() in sw-cost.sh → per-stage JSON output with confidence
  • Budget gate in pipeline_start() → block or warn before stages run
  • --force-start override flag
  • Variance event emission at pipeline completion
  • shipwright cost forecast CLI command
  • Dashboard: show forecast on queued items

Risk Assessment

  • Breaking existing pipelines: Low risk — forecast is advisory pre-start; budget gate respects --ignore-budget and new --force-start
  • Cold start (no history): Handled by default stage durations + "low" confidence
  • Bash 3.2 compat: Must avoid associative arrays; use jq for all JSON manipulation
  • Performance: Scanning events.jsonl could be slow with large files → use tail -1000 + grep filter
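
The tail-and-filter idea in the Performance item can be sketched as follows. This is a Python illustration only; the plan itself would do this with tail, grep, and jq in Bash. The event field names ("type", "stage", "duration_s") are assumptions, since the actual events.jsonl layout is not shown here.

```python
import json
from collections import defaultdict, deque


def recent_stage_durations(lines, max_lines=1000):
    """Average stage.completed durations over the newest max_lines lines.

    Returns {stage: (avg_duration_s, sample_count)}.
    ASSUMPTION: events carry "type", "stage" and "duration_s" fields.
    """
    tail = deque(lines, maxlen=max_lines)  # keeps only the newest lines
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in tail:
        if "stage.completed" not in line:  # cheap substring filter, like grep
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(event, dict) or event.get("type") != "stage.completed":
            continue
        stage = event.get("stage")
        duration = event.get("duration_s")
        if stage is None or not isinstance(duration, (int, float)):
            continue
        totals[stage] += duration
        counts[stage] += 1
    return {stage: (totals[stage] / counts[stage], counts[stage]) for stage in totals}
```

Passing an open file handle as lines works as well: the deque still reads every line once but holds at most max_lines of them in memory.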

Files to Modify

File | Action | Purpose
scripts/sw-cost.sh | Modify | Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand
scripts/sw-pipeline.sh | Modify | Add --force-start flag, hook forecast + budget gate into pipeline_start(), emit variance at completion
config/event-schema.json | Modify | Add cost.forecast and cost.forecast_variance event types
dashboard/src/types/api.ts | Modify | Add CostForecast interface, extend QueueItem with forecast fields
dashboard/server.ts | Modify | Add /api/costs/forecast endpoint, include forecast in queue data
dashboard/src/views/pipelines.ts | Modify | Display forecast for queued pipelines
src/cost-forecast.test.js | Create | Unit tests for forecast logic
scripts/sw-pipeline-test.sh | Modify | Add forecast + budget gate integration tests

Implementation Steps

Step 1: Add cost_forecast() engine to sw-cost.sh

Add after cost_remaining_budget() (~line 310):

  • Default stage durations JSON constant (seconds): intake=60, plan=300, design=300, build=1200, test=180, review=300, compound_quality=600, audit=120, pr=60, merge=60, deploy=120, validate=60, monitor=300

  • Token rate constants per stage type (tokens/second): build=50in/20out, review/compound_quality=40in/30out, test=10in/5out, default=20in/10out

  • cost_forecast(pipeline_config_path, complexity) function:

    1. Read template JSON → extract enabled stages with their model assignments
    2. Query historical durations from events.jsonl: grep "stage.completed" → group by stage → compute avg duration and count
    3. For each enabled stage: use historical avg duration (or default), apply complexity multiplier (complexity / 5.0), estimate tokens from duration × token rates, calculate cost via cost_calculate()
    4. Sum total cost, determine confidence level (≥20 data points=high, 5-19=medium, <5=low)
    5. Output JSON: {total_usd, low_usd, high_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points, complexity_multiplier}
    6. Confidence interval: low = total × 0.7, high = total × 1.5 (medium); narrower for high confidence
  • cost_forecast_display(forecast_json) function: Pretty-print forecast table with per-stage breakdown, total, confidence level, and budget comparison

  • cost_record_variance(forecast_usd, actual_usd, confidence, template, issue) function: Emit cost.forecast_variance event with forecast, actual, variance USD, variance %, template, issue

  • CLI subcommand forecast: shipwright cost forecast [--pipeline standard] [--complexity 5] [--json]
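
The arithmetic behind cost_forecast() can be sketched as below. This is a minimal Python illustration of the calculation, not the Bash + jq implementation the plan calls for. Stage durations, token rates, confidence thresholds, and the medium interval factors come from this plan. The price table, the low/high interval factors, the fallback duration for unknown stages, and the choice to sum sample counts across enabled stages are all assumptions made for the example; real pricing would come from cost_calculate().

```python
# Default stage durations in seconds (from the plan).
DEFAULT_DURATION_S = {
    "intake": 60, "plan": 300, "design": 300, "build": 1200, "test": 180,
    "review": 300, "compound_quality": 600, "audit": 120, "pr": 60,
    "merge": 60, "deploy": 120, "validate": 60, "monitor": 300,
}
FALLBACK_DURATION_S = 300  # ASSUMPTION: used for stage ids not listed above

# (input, output) tokens per second per stage type (from the plan).
TOKEN_RATES = {
    "build": (50, 20), "review": (40, 30),
    "compound_quality": (40, 30), "test": (10, 5),
}
DEFAULT_RATE = (20, 10)

# ASSUMPTION: placeholder USD per million (input, output) tokens.
PLACEHOLDER_PRICES = {"small": (1.0, 5.0), "large": (10.0, 50.0)}

# "medium" is from the plan; "low" and "high" factors are assumptions.
INTERVAL_FACTORS = {"low": (0.5, 2.0), "medium": (0.7, 1.5), "high": (0.85, 1.25)}


def confidence_level(data_points):
    """Map a sample count to the plan's confidence labels."""
    if data_points >= 20:
        return "high"
    if data_points >= 5:
        return "medium"
    return "low"


def cost_forecast(stages, history, complexity=5):
    """stages: [{"id", "model", "enabled"}]; history: {stage: (avg_s, count)}."""
    multiplier = complexity / 5.0
    rows, total, data_points = [], 0.0, 0
    for stage in stages:
        if not stage.get("enabled", True):
            continue
        sid, model = stage["id"], stage["model"]
        default = (DEFAULT_DURATION_S.get(sid, FALLBACK_DURATION_S), 0)
        avg_s, count = history.get(sid, default)
        duration = avg_s * multiplier
        in_rate, out_rate = TOKEN_RATES.get(sid, DEFAULT_RATE)
        in_price, out_price = PLACEHOLDER_PRICES[model]
        cost = duration * (in_rate * in_price + out_rate * out_price) / 1_000_000
        rows.append({"id": sid, "model": model,
                     "est_duration_s": round(duration), "est_cost": round(cost, 4)})
        total += cost
        data_points += count  # ASSUMPTION: samples are summed across stages
    confidence = confidence_level(data_points)
    low_f, high_f = INTERVAL_FACTORS[confidence]
    return {
        "total_usd": round(total, 4),
        "low_usd": round(total * low_f, 4),
        "high_usd": round(total * high_f, 4),
        "stages": rows,
        "confidence": confidence,
        "data_points": data_points,
        "complexity_multiplier": multiplier,
    }
```

With no history and the default complexity of 5, a single enabled build stage is forecast from its 1200-second default and labeled "low" confidence; a template with every stage disabled yields a total of 0.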

Step 2: Add --force-start flag to sw-pipeline.sh

  • Add FORCE_START=false to defaults (~line 299)
  • Add --force-start) FORCE_START=true; shift ;; to argument parser (~line 460)
  • Add help text for --force-start

Step 3: Hook forecast + budget gate into pipeline_start()

After load_pipeline_config (~line 2438), before state file creation:

  1. Call cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}"
  2. Save forecast JSON to $ARTIFACTS_DIR/cost-forecast.json
  3. Display forecast via cost_forecast_display
  4. Get remaining budget via cost_remaining_budget
  5. Budget gate logic:
    • If budget is "unlimited" → skip gate
    • If forecast high_usd > remaining budget AND FORCE_START != true AND IGNORE_BUDGET != true → block with error message showing forecast vs budget, suggest --force-start
    • If forecast total_usd > 50% of remaining budget → warn (don't block)
  6. Emit cost.forecast event with forecast_usd, confidence, template, issue
  7. Store PIPELINE_FORECAST_USD for variance tracking at completion
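
The budget gate rules in item 5 reduce to a small decision function. The sketch below is Python for illustration; the real gate would live in pipeline_start() in Bash. The function name, the return values, and the assumption that cost_remaining_budget yields either the string "unlimited" or a numeric string are hypothetical.

```python
def budget_gate(forecast, remaining_budget, force_start=False, ignore_budget=False):
    """Return (decision, message), decision in {"skip", "block", "warn", "pass"}.

    ASSUMPTION: remaining_budget is "unlimited" or something float() accepts.
    """
    if remaining_budget == "unlimited":
        return "skip", "no budget configured; gate skipped"
    remaining = float(remaining_budget)
    overridden = force_start or ignore_budget
    if forecast["high_usd"] > remaining and not overridden:
        return "block", (
            "forecast high ${:.2f} exceeds remaining budget ${:.2f}; "
            "rerun with --force-start to override"
        ).format(forecast["high_usd"], remaining)
    if forecast["total_usd"] > 0.5 * remaining:
        # Advisory only: the pipeline still starts.
        return "warn", "forecast ${:.2f} is over 50% of remaining budget ${:.2f}".format(
            forecast["total_usd"], remaining)
    return "pass", "forecast within budget"
```

Note that an overridden run falls through to the 50% check, so a forced start over budget still produces a warning rather than passing silently.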

Step 4: Emit variance at pipeline completion

At pipeline completion (~line 2700 for success, ~line 2752 for failure):

  1. If PIPELINE_FORECAST_USD is set, call cost_record_variance "$PIPELINE_FORECAST_USD" "$total_cost" "$FORECAST_CONFIDENCE" "$PIPELINE_NAME" "${ISSUE_NUMBER:-0}"
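
The variance figures that cost_record_variance would emit can be computed as in this Python sketch. The sign convention (actual minus forecast, so positive means over forecast), the percentage being relative to the forecast, and returning None for a zero forecast are assumptions; the plan does not specify them.

```python
def cost_variance(forecast_usd, actual_usd):
    """Compute variance fields for a cost.forecast_variance event.

    ASSUMPTION: variance = actual - forecast; percentage is relative to forecast.
    """
    variance_usd = actual_usd - forecast_usd
    if forecast_usd:
        variance_pct = round(variance_usd / forecast_usd * 100.0, 1)
    else:
        variance_pct = None  # undefined when nothing was forecast
    return {
        "forecast_usd": forecast_usd,
        "actual_usd": actual_usd,
        "variance_usd": round(variance_usd, 4),
        "variance_pct": variance_pct,
    }
```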

Step 5: Update event schema

Add to config/event-schema.json:

  • cost.forecast: fields = forecast_usd, low_usd, high_usd, confidence, template, issue, complexity, data_points
  • cost.forecast_variance: fields = forecast_usd, actual_usd, variance_usd, variance_pct, confidence, template, issue
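
A quick way to check emitted events against the two field lists above is a presence check like the Python sketch below. It treats every listed field as required, which is an assumption; the actual format of config/event-schema.json is not shown in this plan.

```python
# Field lists copied from the plan; treating all as required is an ASSUMPTION.
EVENT_FIELDS = {
    "cost.forecast": {
        "forecast_usd", "low_usd", "high_usd", "confidence",
        "template", "issue", "complexity", "data_points",
    },
    "cost.forecast_variance": {
        "forecast_usd", "actual_usd", "variance_usd", "variance_pct",
        "confidence", "template", "issue",
    },
}


def missing_fields(event_type, payload):
    """Return the sorted names of expected fields absent from payload."""
    return sorted(EVENT_FIELDS[event_type] - set(payload))
```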

Step 6: Dashboard — API endpoint

In dashboard/server.ts, add /api/costs/forecast endpoint:

  • Accept query param ?pipeline=standard&complexity=5
  • Shell out to shipwright cost forecast --pipeline X --complexity Y --json
  • Return JSON response

Extend the /api/state queue items to include forecast data when available (read from pipeline artifacts).

Step 7: Dashboard — types and UI

In dashboard/src/types/api.ts:

  • Add CostForecast interface: {total_usd, low_usd, high_usd, confidence, stages, data_points}
  • Extend QueueItem with forecast?: CostForecast

In dashboard/src/views/pipelines.ts:

  • When rendering queued items, show forecast if available: "Est: $45-$60 (medium confidence)"
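
The queued-item label could be built as in the sketch below. It is written in Python purely for illustration, since the dashboard code is TypeScript. The function name is hypothetical, and rounding to whole dollars is an assumption based on the example string.

```python
def forecast_label(forecast):
    """Format a forecast for a queued item, or "" when none is available."""
    if not forecast:
        return ""
    return "Est: ${:.0f}-${:.0f} ({} confidence)".format(
        forecast["low_usd"], forecast["high_usd"], forecast["confidence"])
```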

Step 8: Tests

Unit tests (src/cost-forecast.test.js):

  • cost_forecast with mock template and no history → returns defaults with low confidence
  • cost_forecast with mock history → returns historical averages with appropriate confidence
  • cost_forecast with complexity multiplier → scales durations correctly
  • cost_record_variance → emits correct event
  • Budget gate logic: blocks when over budget, warns at 50-100%, passes when under

Integration tests (add to scripts/sw-pipeline-test.sh):

  • Pipeline start with forecast display
  • Pipeline blocked by budget gate → verify exit code + message
  • Pipeline with --force-start overrides gate
  • Variance event emitted after completion

Task Checklist

  • Task 1: Add default stage durations and token rate constants to sw-cost.sh
  • Task 2: Implement cost_forecast() function with historical data lookup and confidence levels
  • Task 3: Implement cost_forecast_display() CLI table renderer
  • Task 4: Implement cost_record_variance() function
  • Task 5: Add forecast CLI subcommand to sw-cost.sh router
  • Task 6: Add --force-start flag and FORCE_START variable to sw-pipeline.sh
  • Task 7: Hook forecast generation + budget gate into pipeline_start() before stage execution
  • Task 8: Hook variance tracking into pipeline completion (success + failure paths)
  • Task 9: Update config/event-schema.json with new event types
  • Task 10: Add CostForecast type and /api/costs/forecast endpoint to dashboard
  • Task 11: Update dashboard pipelines view to display forecast for queued items
  • Task 12: Write unit tests for forecast engine and variance tracking
  • Task 13: Add integration tests to sw-pipeline-test.sh for budget gate behavior
  • Task 14: Run full test suite and fix any regressions

Testing Approach

  1. Unit tests: Mock events.jsonl with known data, verify forecast calculations match expected values. Test confidence thresholds at boundaries (4, 5, 19, 20 data points). Test cold-start defaults.
  2. Integration tests: Use mock binaries from existing test harness. Create temp budget.json with low limit, verify pipeline start is blocked. Verify --force-start override. Verify variance event in events.jsonl after completion.
  3. Dashboard tests: Verify /api/costs/forecast returns valid JSON. Verify queue rendering includes forecast display.
  4. Manual verification: shipwright cost forecast --pipeline cost-aware --json produces valid output. shipwright pipeline start --issue N --dry-run shows forecast.

Definition of Done

  • shipwright cost forecast CLI command works with all template types
  • Forecast displayed before pipeline start (both interactive and headless)
  • Pipeline blocked when forecast exceeds remaining budget (with clear error message)
  • --force-start override bypasses budget gate with acknowledgment
  • --ignore-budget also bypasses forecast gate (backward compatible)
  • cost.forecast event emitted to events.jsonl at pipeline start
  • cost.forecast_variance event emitted at pipeline completion
  • Confidence levels: low (<5 data points), medium (5-19), high (≥20)
  • Dashboard shows forecast for queued pipelines
  • All existing tests pass (npm test)
  • New tests cover forecast calculation, budget gate, variance tracking
  • Bash 3.2 compatible (no associative arrays, no bash 4+ features)

User Stories

Primary: As a pipeline operator, I want to see estimated cost before a pipeline starts, so that I can make an informed go/no-go decision and avoid surprise budget overruns.

Secondary: As a team lead monitoring costs, I want the system to automatically block pipelines that would exceed our daily budget, so that runaway spending is prevented without manual oversight.

Edge Cases

  1. No historical data (cold start): Uses conservative defaults with "low" confidence label — user knows estimate is rough
  2. Budget not configured: Forecast still displayed but gate is skipped (matches existing cost_remaining_budget behavior)
  3. Forecast far exceeds budget but --force-start used: Pipeline proceeds with warning logged and event emitted for audit
  4. Template with all stages disabled: Forecast returns $0.00 with note
  5. Model pricing changed mid-day: Forecast uses current pricing; variance tracking captures the drift

Endpoint Specification

GET /api/costs/forecast?pipeline=standard&complexity=5

  • Response 200: {total_usd, low_usd, high_usd, confidence, stages: [{id, model, est_duration_s, est_cost}], data_points}
  • Response 400: {error: {code: "invalid_template", message: "Unknown pipeline template: foo"}}
  • No auth required (local dashboard)
  • No rate limiting (local-only)
  • No versioning needed (internal API)
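
The request handling for this endpoint can be sketched as follows, in Python for illustration (the actual handler belongs in dashboard/server.ts). Validating both parameters before use matters because the values are passed on to a CLI invocation. The template allow-list, the defaults of "standard" and 5, the invalid_complexity error code, and the run_forecast callback are assumptions; only the invalid_template response is specified above.

```python
from urllib.parse import parse_qs

# ASSUMPTION: template names that appear in this plan; the real list may differ.
KNOWN_TEMPLATES = {"standard", "cost-aware"}


def handle_forecast_query(query_string, run_forecast):
    """Return (status, body) for GET /api/costs/forecast.

    run_forecast(pipeline, complexity) is a hypothetical callback that
    produces the forecast JSON, for example by invoking the CLI.
    """
    params = parse_qs(query_string)
    pipeline = params.get("pipeline", ["standard"])[0]
    raw_complexity = params.get("complexity", ["5"])[0]
    if pipeline not in KNOWN_TEMPLATES:
        return 400, {"error": {
            "code": "invalid_template",
            "message": "Unknown pipeline template: " + pipeline,
        }}
    try:
        complexity = int(raw_complexity)
    except ValueError:
        # ASSUMPTION: this error code is not part of the specification.
        return 400, {"error": {
            "code": "invalid_complexity",
            "message": "Complexity must be an integer",
        }}
    return 200, run_forecast(pipeline, complexity)
```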

This plan adds ~250 lines to sw-cost.sh, ~50 lines to sw-pipeline.sh, and ~100 lines of tests. The blast radius is contained to cost infrastructure and the pipeline start path — no existing stage execution logic is modified.
