Pipeline Plan 154
- Monitor accumulated `LOOP_INPUT_TOKENS` + `LOOP_OUTPUT_TOKENS` per iteration against a configurable threshold (70% of model context window)
- When threshold is crossed, generate a compressed state summary and trigger a session restart with that summary injected
- Pros: Uses existing token tracking infrastructure, leverages existing session restart mechanism, minimal new code
- Blast radius: Small — adds a new check in the main loop, a new summarization function, and a new module file
- Trade-offs: Token counts from Claude CLI are per-iteration (not cumulative conversation context), so we estimate cumulative context usage
- Only track prompt character growth across iterations
- Pros: Simpler, no new dependency on token math
- Cons: Doesn't account for Claude's internal conversation context accumulation; prompt size alone doesn't reflect total context usage
- Rejected: Insufficient — the real exhaustion happens in Claude's conversation context, not just our injected prompt
- Parse Claude CLI stderr for context limit warnings
- Pros: Most accurate
- Cons: Fragile (depends on CLI output format), reactive not proactive
- Rejected: We want proactive prevention, not reactive recovery
Approach A — Track cumulative token usage across iterations and proactively trigger summarization + session restart at 70% of the context window. This builds on:
- Existing `accumulate_loop_tokens()` in `sw-loop.sh` (already tracks per-iteration tokens)
- Existing `run_loop_with_restarts()` session restart mechanism
- Existing `emit_event` telemetry system
The key insight: each Claude CLI invocation gets the full conversation context. As iterations accumulate, the total tokens grow. We track cumulative input and output tokens as a proxy for context window usage and trigger preemptive summarization before hitting limits.
| File | Action | Purpose |
|---|---|---|
| `scripts/lib/loop-context-monitor.sh` | CREATE | New module: context threshold monitoring + state summarization |
| `scripts/sw-loop.sh` | MODIFY | Source new module, add context check in main loop, wire summarization into restart |
| `scripts/lib/loop-iteration.sh` | MODIFY | Add cumulative token tracking variable, emit enhanced context metrics |
| `scripts/sw-loop-test.sh` | MODIFY | Add test cases for context monitoring and summarization |
New module with these functions:
```bash
# Module guard
_LOOP_CONTEXT_MONITOR_LOADED=1

# Constants
CONTEXT_WINDOW_TOKENS=${CONTEXT_WINDOW_TOKENS:-200000}             # Default: 200k (Opus/Sonnet)
CONTEXT_EXHAUSTION_THRESHOLD=${CONTEXT_EXHAUSTION_THRESHOLD:-70}   # Trigger at 70%
CONTEXT_SUMMARIZATION_TRIGGERED=false

# check_context_exhaustion()
#   - Computes cumulative_tokens = LOOP_INPUT_TOKENS + LOOP_OUTPUT_TOKENS
#   - Computes usage_pct = cumulative_tokens * 100 / CONTEXT_WINDOW_TOKENS
#     (multiply before dividing: shell arithmetic is integer-only)
#   - If usage_pct >= CONTEXT_EXHAUSTION_THRESHOLD:
#       - Emits loop.context_exhaustion_warning event
#       - Returns 0 (threshold crossed)
#   - Else returns 1 (safe)

# summarize_loop_state()
#   - Writes compressed state to $LOG_DIR/context-summary.md:
#       - Goal (original, not accumulated)
#       - Iteration count and test status
#       - Files modified (from git diff --name-only LOOP_START_COMMIT..HEAD)
#       - Last error summary (from error-summary.json)
#       - Key fixes attempted (from log entries, last 5)
#       - Test results status
#   - Returns path to summary file

# get_context_usage_pct()
#   - Returns current context usage as integer percentage
#   - Used by telemetry and logging
```
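The threshold check described above can be sketched as follows. This is a minimal illustration, not the final module: it assumes `LOOP_INPUT_TOKENS` and `LOOP_OUTPUT_TOKENS` hold plain integers, it omits the `emit_event` call, and the choice to report 0% when the window is unset or zero is an assumption made here to avoid a division by zero.

```shell
#!/usr/bin/env bash
# Sketch only. Variable names follow the plan; the zero-window fallback is an assumption.
CONTEXT_WINDOW_TOKENS=${CONTEXT_WINDOW_TOKENS:-200000}
CONTEXT_EXHAUSTION_THRESHOLD=${CONTEXT_EXHAUSTION_THRESHOLD:-70}

# Print current usage as an integer percentage (rounded down).
get_context_usage_pct() {
    local used=$(( ${LOOP_INPUT_TOKENS:-0} + ${LOOP_OUTPUT_TOKENS:-0} ))
    local window="${CONTEXT_WINDOW_TOKENS:-0}"
    # Guard against division by zero: an unknown window reports 0% usage.
    if [ "$window" -gt 0 ]; then
        # Multiply first; integer division would otherwise truncate to 0.
        echo $(( used * 100 / window ))
    else
        echo 0
    fi
}

# Return 0 when usage has reached the threshold, 1 otherwise.
check_context_exhaustion() {
    local pct
    pct=$(get_context_usage_pct)
    [ "$pct" -ge "$CONTEXT_EXHAUSTION_THRESHOLD" ]
}
```

With the defaults, 140,000 cumulative tokens is exactly 70% of a 200,000-token window, so the check fires at that point and stays silent at 139,999.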
- Source the new module near the top (after other lib sources)
- After the `accumulate_loop_tokens()` call in the main loop (~line 2166), add:
  ```bash
  # Context exhaustion prevention
  if check_context_exhaustion; then
      warn "Context usage at $(get_context_usage_pct)%; triggering proactive summarization"
      summarize_loop_state
      STATUS="context_exhaustion"
      write_state
      write_progress
      break  # Exit to restart wrapper
  fi
  ```
- In `run_loop_with_restarts()`, handle `STATUS="context_exhaustion"` as a restart-worthy condition (alongside `"stuck_restart"`)
- When restarting after context exhaustion, inject the summary from `context-summary.md` into the restart context
Add to `run_claude_iteration()` after the `accumulate_loop_tokens` call:
- Emit `loop.context_usage` event with: `cumulative_input`, `cumulative_output`, `usage_pct`, `threshold`
In `summarize_loop_state()`:
- Read `ORIGINAL_GOAL` (already preserved in `sw-loop.sh`)
- Read git diff stat from `LOOP_START_COMMIT`
- Extract error patterns from `error-summary.json`
- Extract last 5 log entries from `LOG_ENTRIES`
- Write to `$LOG_DIR/context-summary.md` in structured format
- The restart mechanism already copies `error-summary.json` and reads `progress.md`; we add `context-summary.md` to the restart context injection
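A rough sketch of the summary writer outlined above. Several details are assumptions, since the plan does not pin them down: `LOG_ENTRIES` is treated as a newline-separated string, `ITERATION` and `TEST_PASSED` are hypothetical variable names for loop state, and the error summary is copied verbatim (first 500 bytes) rather than parsed, because the layout of `error-summary.json` is not specified here.

```shell
#!/usr/bin/env bash
# Sketch only. ITERATION, TEST_PASSED, and the LOG_ENTRIES format are assumptions.
summarize_loop_state() {
    local dir="${LOG_DIR:-.}"
    local out="$dir/context-summary.md"
    local files="(unavailable)"
    local errors="(none recorded)"

    # Files changed since the loop started; tolerate a missing commit or repo.
    if [ -n "${LOOP_START_COMMIT:-}" ]; then
        files=$(git diff --name-only "${LOOP_START_COMMIT}..HEAD" 2>/dev/null) \
            || files="(unavailable)"
    fi

    # Copy the head of the error summary as-is instead of guessing its schema.
    if [ -f "$dir/error-summary.json" ]; then
        errors=$(head -c 500 "$dir/error-summary.json")
    fi

    {
        echo "## Goal"
        echo "${ORIGINAL_GOAL:-}"
        echo "## Status"
        echo "Iteration: ${ITERATION:-0}, tests passing: ${TEST_PASSED:-unknown}"
        echo "## Files modified"
        echo "$files"
        echo "## Last error summary"
        echo "$errors"
        echo "## Recent log entries (last 5)"
        printf '%s\n' "${LOG_ENTRIES:-}" | tail -n 5
    } > "$out"

    # Print the path so callers can capture it.
    echo "$out"
}
```

Writing the original goal rather than the current `GOAL` keeps repeated restarts from nesting one summary inside the next.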
In `run_loop_with_restarts()` (~line 2389):
- Add `context_exhaustion` to the list of restartable statuses
- When restarting after `context_exhaustion`:
  - Inject `context-summary.md` content into `GOAL` as "## Previous Session Context (Summarized)"
  - Reset token counters
  - Emit `loop.context_exhaustion_restart` event
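The restart handling above might look roughly like this. Both function names (`is_restartable_status`, `inject_context_summary`) are hypothetical helpers introduced for illustration, the event emission is left out, and rebuilding `GOAL` from `ORIGINAL_GOAL` is an assumption about how the injection would avoid stacking summaries.

```shell
#!/usr/bin/env bash
# Sketch only. Helper names are hypothetical; status strings come from the plan.

# Return 0 for statuses that should trigger a session restart.
is_restartable_status() {
    case "$1" in
        stuck_restart|context_exhaustion) return 0 ;;
        *) return 1 ;;
    esac
}

# Append the summary to the goal and reset the token counters.
inject_context_summary() {
    local summary_file="${LOG_DIR:-.}/context-summary.md"
    # Nothing to inject if no summary was written.
    [ -f "$summary_file" ] || return 0

    # Rebuild from the original goal so summaries do not pile up across restarts.
    GOAL="${ORIGINAL_GOAL:-${GOAL:-}}

## Previous Session Context (Summarized)
$(cat "$summary_file")"

    # The new session starts with an empty conversation context.
    LOOP_INPUT_TOKENS=0
    LOOP_OUTPUT_TOKENS=0
}
```

Resetting the counters matters: without it, the first iteration of the new session would cross the threshold again immediately.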
Add to `sw-loop-test.sh`:
- Unit test (threshold calculation): Set `LOOP_INPUT_TOKENS`/`LOOP_OUTPUT_TOKENS` to known values, verify `check_context_exhaustion` returns correctly at <70%, =70%, >70%
- Unit test (summarization output): Create mock state (git, `error-summary.json`, log entries), run `summarize_loop_state`, verify output contains essential fields
- Unit test (context window sizing): Verify `CONTEXT_WINDOW_TOKENS` defaults and override via env
- Integration test (restart trigger): Simulate tokens exceeding threshold, verify loop breaks with `context_exhaustion` status and emits correct event
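As an illustration of the threshold unit test, the sketch below checks the three boundary cases plus the zero-token case. The assertion helper `expect_status` is hypothetical (the existing helpers in `sw-loop-test.sh` are not shown in this plan), and a stand-in `check_context_exhaustion` is defined inline only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Sketch only. Stand-in for the real function so the example is self-contained.
check_context_exhaustion() {
    local used=$(( LOOP_INPUT_TOKENS + LOOP_OUTPUT_TOKENS ))
    [ "$CONTEXT_WINDOW_TOKENS" -gt 0 ] || return 1
    [ $(( used * 100 / CONTEXT_WINDOW_TOKENS )) -ge "$CONTEXT_EXHAUSTION_THRESHOLD" ]
}

TESTS_FAILED=0

# expect_status <expected-exit-code> <label>: hypothetical assertion helper.
expect_status() {
    local expected="$1" label="$2" actual=0
    check_context_exhaustion || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "PASS: $label"
    else
        echo "FAIL: $label (expected $expected, got $actual)"
        TESTS_FAILED=$(( TESTS_FAILED + 1 ))
    fi
}

# A small window keeps the arithmetic easy to read.
CONTEXT_WINDOW_TOKENS=1000
CONTEXT_EXHAUSTION_THRESHOLD=70

LOOP_INPUT_TOKENS=600; LOOP_OUTPUT_TOKENS=99;  expect_status 1 "below threshold (69%)"
LOOP_INPUT_TOKENS=600; LOOP_OUTPUT_TOKENS=100; expect_status 0 "at threshold (70%)"
LOOP_INPUT_TOKENS=600; LOOP_OUTPUT_TOKENS=200; expect_status 0 "above threshold (80%)"
LOOP_INPUT_TOKENS=0;   LOOP_OUTPUT_TOKENS=0;   expect_status 1 "zero tokens reported"
```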
- Task 1: Create `scripts/lib/loop-context-monitor.sh` with module guard, constants, `check_context_exhaustion()`, `summarize_loop_state()`, `get_context_usage_pct()`
- Task 2: Source the new module in `sw-loop.sh` (near line 28 with other lib sources)
- Task 3: Add context exhaustion check in main loop after `accumulate_loop_tokens` call (~line 2166 in `run_single_agent_loop`)
- Task 4: Handle `context_exhaustion` status in `run_loop_with_restarts()`: allow restart with summary injection
- Task 5: Add `loop.context_exhaustion_warning` and `loop.context_exhaustion_restart` event emissions
- Task 6: Emit `loop.context_usage` event per iteration with cumulative token usage percentage
- Task 7: Add threshold calculation unit tests to `sw-loop-test.sh`
- Task 8: Add summarization output unit tests to `sw-loop-test.sh`
- Task 9: Add restart trigger integration test to `sw-loop-test.sh`
- Task 10: Verify existing tests still pass after changes
- Unit tests (7): Threshold math at boundary values (<70%, =70%, >70%), summarization output validation (4 field checks), context window default/override
- Integration tests (2): Full loop restart on context exhaustion, event emission verification
- E2E tests (1): Existing `sw-loop-test.sh` regression (no breakage)
- 100% branch coverage on `check_context_exhaustion()` (3 branches: under/at/over threshold)
- 100% coverage on `summarize_loop_state()` output fields
- Existing test suite remains green
- Happy path: Loop runs under threshold, no summarization triggered
- Error case 1: Tokens exceed 70% threshold mid-loop — summarization fires, loop breaks gracefully
- Error case 2: Tokens exceed threshold on first iteration (huge prompt) — handled without crash
- Edge case 1: Zero tokens reported (jq unavailable) — no false positive trigger
- Edge case 2: CONTEXT_WINDOW_TOKENS set to 0 — division by zero protection
| Risk | Impact | Mitigation |
|---|---|---|
| Token counts are per-iteration, not cumulative conversation context | Underestimates true usage | Accumulate across iterations; use conservative 70% threshold |
| False positive triggers (threshold too aggressive) | Unnecessary restarts | Make threshold configurable via env/config; default 70% is conservative |
| Summary too lossy — critical context dropped | Regression after restart | Include: goal, files modified, error patterns, test status, last 5 log entries |
| Division by zero if CONTEXT_WINDOW_TOKENS=0 | Script crash | Guard with [[ "$window" -gt 0 ]] check |
- `check_context_exhaustion()` correctly identifies when cumulative tokens reach or exceed 70% of context window
- `summarize_loop_state()` produces compressed state with: goal, iteration count, modified files, error patterns, test status
- Loop continues seamlessly after summarization-triggered restart without losing critical context
- `loop.context_exhaustion_warning` event emitted when threshold crossed (observable in `events.jsonl`)
- `loop.context_exhaustion_restart` event emitted when restart occurs
- Per-iteration `loop.context_usage` event includes cumulative token percentage
- All new code has test coverage (threshold boundaries, summarization output, restart trigger)
- Existing test suite passes without regression
- Bash 3.2 compatible (no associative arrays, no `${var,,}`)
- Uses `set -euo pipefail` and module guard pattern
- Spoofing/Tampering/Repudiation/Elevation: Not applicable — this is internal shell logic, no auth or external input
- Information Disclosure: `context-summary.md` contains goal/error text, the same sensitivity as existing `progress.md`. No secrets involved.
- Denial of Service: False positive summarization could cause unnecessary restarts. Mitigated by configurable threshold and conservative default.
Not applicable — no authentication involved in this feature.
- `CONTEXT_WINDOW_TOKENS` from env: validated as integer >0
- `CONTEXT_EXHAUSTION_THRESHOLD` from env: validated as integer 1-99
- Token values from `accumulate_loop_tokens`: already validated in existing code
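The two env checks above could be implemented along these lines. The helper names are hypothetical, and falling back to the documented defaults on invalid input (rather than aborting) is an assumption; the plan only states that the values are validated.

```shell
#!/usr/bin/env bash
# Sketch only. Helper names and the fall-back-to-default policy are assumptions.

# is_positive_int <value>: true for a string of digits whose value is > 0.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}

# Replace invalid settings with the defaults named in the plan.
validate_context_config() {
    if ! is_positive_int "${CONTEXT_WINDOW_TOKENS:-}"; then
        CONTEXT_WINDOW_TOKENS=200000
    fi
    # Threshold must be an integer in 1-99.
    if ! is_positive_int "${CONTEXT_EXHAUSTION_THRESHOLD:-}" \
        || [ "$CONTEXT_EXHAUSTION_THRESHOLD" -gt 99 ]; then
        CONTEXT_EXHAUSTION_THRESHOLD=70
    fi
}
```

Running this once at module load also covers the `CONTEXT_WINDOW_TOKENS=0` edge case before any percentage is computed.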
- No secrets in code
- No external input from users (internal orchestration only)
- No network calls added
- No file path injection risk (all paths derived from existing LOG_DIR/PROJECT_ROOT)
- `loop.context_exhaustion_warning` event fires when expected (threshold crossed)
- Loop does not crash or hang when summarization triggers
- `loop.context_usage` events show monotonically increasing token percentages within a session (counters reset on restart)
- Restart after context exhaustion produces working sessions (not stuck loops)
- `context_exhaustion_warning` firing on iteration 1 indicates the prompt is too large or the threshold is misconfigured
- Multiple consecutive `context_exhaustion_restart` events indicate a possible infinite restart loop (guarded by `MAX_RESTARTS`)
- Search for "context_exhaustion" in events.jsonl
- Verify context-summary.md written before each exhaustion restart
Not applicable — this is build infrastructure, not a deployed service.
- Token accumulation undercount: Claude CLI may not report all tokens. Likelihood: medium. Evidence: compare `loop-tokens.json` totals vs Claude dashboard.
- Threshold too aggressive: 70% may trigger too early for small tasks. Likelihood: low. Evidence: check if `context_exhaustion` events fire on simple 2-iteration loops.
- Summary injection bloats restart context: If summary is too large, it defeats the purpose. Likelihood: low. Mitigation: cap summary at 2000 chars.
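The 2000-character cap mentioned in the last mitigation could be applied as a post-processing step on the summary file. This is a sketch under assumptions: `cap_summary_file` and `SUMMARY_MAX_CHARS` are hypothetical names, and `head -c` counts bytes, so multi-byte text may be cut slightly short of 2000 characters.

```shell
#!/usr/bin/env bash
# Sketch only. Function and variable names are hypothetical.
SUMMARY_MAX_CHARS=${SUMMARY_MAX_CHARS:-2000}

# Truncate the given file in place to at most SUMMARY_MAX_CHARS bytes.
cap_summary_file() {
    local file="$1" tmp
    [ -f "$file" ] || return 0
    tmp="${file}.tmp"
    # Write to a temp file first so a failed read never clobbers the summary.
    head -c "$SUMMARY_MAX_CHARS" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Because the summary lists the goal first and log entries last, truncating from the end drops the least critical material.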
- Token accumulation values across iterations (from existing `loop.context_efficiency` events)
- Actual Claude context window sizes for different models
This is a new feature, not a retry of a failed approach. Building on proven patterns: module guard, emit_event, session restart.
- Run `sw-loop-test.sh`: all tests pass including new ones
- Run `npm test`: full suite green
- Manual verification: set `CONTEXT_WINDOW_TOKENS=1000`, run loop, confirm early summarization trigger