Install
```toml
[dependencies]
llm-budget-window = "0.1"
```
crates.io: llm-budget-window
GitHub: MukundaKatta/llm-budget-window
Siblings
| Crate / Package | What it does |
|---|---|
| token-budget-pool | Thread-safe total session cap, not time-windowed; pair it with llm-budget-window as a belt-and-suspenders safeguard |
| token-budget-py | Python port of token-budget-pool |
| llm-cost-cap | Pre-flight cost gate: estimates the cost before the call and rejects it if it would exceed the budget |
| llm-circuit-breaker | Open/closed/half-open circuit breaker; stops calling a provider that keeps failing |
| claude-cost | Computes the per-call USD cost of Anthropic API calls, including cache-read rates |
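For readers new to the pre-flight idea behind llm-cost-cap, here is a minimal sketch of such a gate. It is not the llm-cost-cap API: CostGate, admit, and the per-million-token prices are hypothetical placeholders, and the sketch assumes a worst-case estimate (the model uses all of max_output_tokens) that is reserved against the budget when the call is admitted.

```rust
/// Hypothetical pre-flight cost gate (a sketch, not the llm-cost-cap API).
#[derive(Debug, PartialEq)]
pub enum GateError {
    /// The worst-case estimate exceeds what is left of the budget.
    OverBudget { estimated_usd: f64, remaining_usd: f64 },
}

pub struct CostGate {
    budget_usd: f64,
    reserved_usd: f64,
    // Placeholder prices in USD per million tokens, supplied by the caller.
    input_usd_per_mtok: f64,
    output_usd_per_mtok: f64,
}

impl CostGate {
    pub fn new(budget_usd: f64, input_usd_per_mtok: f64, output_usd_per_mtok: f64) -> Self {
        Self { budget_usd, reserved_usd: 0.0, input_usd_per_mtok, output_usd_per_mtok }
    }

    /// Worst-case cost: assumes the model emits all `max_output_tokens`.
    pub fn estimate(&self, input_tokens: u64, max_output_tokens: u64) -> f64 {
        (input_tokens as f64 * self.input_usd_per_mtok
            + max_output_tokens as f64 * self.output_usd_per_mtok)
            / 1_000_000.0
    }

    /// Runs before the API call: rejects if the estimate does not fit in the
    /// remaining budget, otherwise reserves the estimate and returns it.
    pub fn admit(&mut self, input_tokens: u64, max_output_tokens: u64) -> Result<f64, GateError> {
        let estimated_usd = self.estimate(input_tokens, max_output_tokens);
        let remaining_usd = self.budget_usd - self.reserved_usd;
        if estimated_usd > remaining_usd {
            return Err(GateError::OverBudget { estimated_usd, remaining_usd });
        }
        self.reserved_usd += estimated_usd;
        Ok(estimated_usd)
    }
}
```

With placeholder prices of 3.0 and 15.0 USD per million input and output tokens and a 0.05 USD budget, a request of 2,000 input and 1,000 output tokens is estimated at 0.021 USD, so two such requests are admitted and a third is rejected before any network call is made.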
What is next
The most requested feature is a backpressure mode that, instead of rejecting the call, blocks (or returns a future that resolves at the reset time) until the window clears. Today, `record` returns `Err` immediately when a window is exceeded, and the caller decides whether to sleep, abort, or queue. A built-in backpressure mode with an async `record_or_wait` function would cut the boilerplate needed for the common "wait and retry" pattern.
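The reject-immediately contract, and the wait-and-retry loop a caller writes around it today, can be sketched as below. Window, its fields, and the Result<(), Duration> return type (Err carrying the time until reset) are hypothetical stand-ins, not the llm-budget-window API, and this record_or_wait blocks the thread with std::thread::sleep rather than being async like the planned feature.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical fixed-window token budget (not the llm-budget-window API).
pub struct Window {
    limit: u64,
    length: Duration,
    started: Instant,
    used: u64,
}

impl Window {
    pub fn new(limit: u64, length: Duration) -> Self {
        Self { limit, length, started: Instant::now(), used: 0 }
    }

    /// Starts a fresh window if the current one has fully elapsed.
    fn roll(&mut self, now: Instant) {
        if now.duration_since(self.started) >= self.length {
            self.started = now;
            self.used = 0;
        }
    }

    /// Records `tokens`, or returns Err(time until the window resets)
    /// immediately, mirroring the reject-now behaviour described above.
    pub fn record(&mut self, tokens: u64) -> Result<(), Duration> {
        let now = Instant::now();
        self.roll(now);
        if self.used.saturating_add(tokens) > self.limit {
            let reset_at = self.started + self.length;
            return Err(reset_at.saturating_duration_since(now));
        }
        self.used += tokens;
        Ok(())
    }

    /// Blocking "wait and retry": sleeps until the reset time, then retries.
    /// Returns Err(tokens) if one request exceeds the whole window limit,
    /// because waiting for a reset could never make it fit.
    pub fn record_or_wait(&mut self, tokens: u64) -> Result<(), u64> {
        if tokens > self.limit {
            return Err(tokens);
        }
        loop {
            match self.record(tokens) {
                Ok(()) => return Ok(()),
                Err(wait) => thread::sleep(wait),
            }
        }
    }
}
```

A fixed window is simply the easiest policy to show; the crate's own windowing may differ. The oversized-request check matters: without it, a request larger than the whole window limit would make the loop retry forever.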
Separate per-model windows are also on the list. Today a single budget window covers all models combined. If you run several models at once and want a separate per-minute cap for each (to match the provider's per-model rate limits), you currently need one `BudgetWindow` instance per model. A `BudgetWindowMap` that routes by model ID would make that cleaner.
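One possible shape for the proposed BudgetWindowMap is a HashMap from model ID to an independent window. Everything here is hypothetical: the with_limit builder, the MapError variants, and the model IDs are illustrations, and the inner Window is a bare fixed-window counter standing in for the crate's BudgetWindow.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Bare fixed-window token counter (hypothetical stand-in for BudgetWindow).
struct Window {
    limit: u64,
    length: Duration,
    started: Instant,
    used: u64,
}

impl Window {
    fn new(limit: u64, length: Duration) -> Self {
        Self { limit, length, started: Instant::now(), used: 0 }
    }

    /// Ok if `tokens` fits in the current window, else Err(time until reset).
    fn record(&mut self, tokens: u64) -> Result<(), Duration> {
        let now = Instant::now();
        if now.duration_since(self.started) >= self.length {
            self.started = now; // window elapsed: start a fresh one
            self.used = 0;
        }
        if self.used.saturating_add(tokens) > self.limit {
            return Err((self.started + self.length).saturating_duration_since(now));
        }
        self.used += tokens;
        Ok(())
    }
}

#[derive(Debug, PartialEq)]
pub enum MapError {
    UnknownModel,
    Exceeded(Duration),
}

/// Hypothetical BudgetWindowMap: routes each call to its model's own window.
pub struct BudgetWindowMap {
    length: Duration,
    windows: HashMap<String, Window>,
}

impl BudgetWindowMap {
    pub fn new(length: Duration) -> Self {
        Self { length, windows: HashMap::new() }
    }

    /// Registers a per-window token cap for one model ID.
    pub fn with_limit(mut self, model: &str, limit: u64) -> Self {
        let window = Window::new(limit, self.length);
        self.windows.insert(model.to_string(), window);
        self
    }

    /// Charges `tokens` to the window for `model` only.
    pub fn record(&mut self, model: &str, tokens: u64) -> Result<(), MapError> {
        let window = self.windows.get_mut(model).ok_or(MapError::UnknownModel)?;
        window.record(tokens).map_err(MapError::Exceeded)
    }
}
```

Exhausting one model's window leaves the others untouched. Rejecting unknown model IDs is a design choice; a map could instead fall back to a shared default window.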
Part of the Hermes Agent Challenge sprint. All crates are published on crates.io.