Take Control of Your AI Agents
The runtime platform that makes AI agents debuggable, affordable, and reliable.
Existing tools show you what happened. Reins lets you control what happens next.
```bash
pip install reins
```

```python
from reins import trace

@trace(budget="$0.50", on_exceed="degrade")
async def my_agent(task: str):
    response = await client.messages.create(model="claude-sonnet-4-20250514", ...)
    return response

# When budget runs low → auto-switches to claude-haiku (not crash)
```
One decorator. Budget control + cost tracking + auto-degradation + trace recording.
Enterprise LLM API spending has surged from $1.8B (2023 H2) to $8.4B (2025 H1), a 4.7x increase in 18 months. Menlo Ventures projects it will hit $15B by 2026 if current velocity holds. Total enterprise GenAI investment reached $37B in 2025, tripling from $11.5B in 2024.
[Figure: Enterprise LLM API Spending Growth]
Yet the tools to govern this spending are shockingly primitive:
| Capability | Available Today? |
|---|---|
| Cost tracking (after the fact) | Widely available (Langfuse, LangSmith, Helicone) |
| Hard budget limits (block when exceeded) | Partial (LiteLLM, Portkey) |
| Auto-degradation (switch to cheaper model) | No existing tool |
| Circuit breakers (halt runaway loops) | No existing tool |
| Per-agent budget scoping | No existing tool |
- $47K LangChain Loop (Nov 2025): Four agents in a research pipeline entered an infinite conversation loop for 11 days. The team assumed growing costs were "organic growth" until the $47,000 bill arrived.
- $47K Retry Storm (Feb 2026): A data enrichment agent misinterpreted an API error code and made 2.3 million API calls over a weekend. Only the external API's rate limiter slowed it down, not the team's own controls.
- Gartner (2025): Over 40% of agentic AI projects will be canceled by 2027 due to escalating costs, unclear value, or inadequate risk controls.
- Top SWE-bench Verified score: 79%, but benchmark scores overestimate real-world performance by up to 54%
- Agent success rates decline exponentially with task duration. Claude Sonnet's "half-life" is ~59 minutes
- A survey of 306 practitioners found reliability is the #1 barrier to enterprise agent adoption
| Industry | AI Spend (2025) | Growth | Key Concern |
|---|---|---|---|
| Healthcare | ~$1.5B in vertical AI | 3.3x YoY | Compliance + cost predictability |
| Financial Services | 23.7% of enterprise AI market | Steady | Risk controls + audit trails |
| Legal | $650M market | Fast-growing | Per-case cost attribution |
| Customer Service | Largest agent deployment sector | Rapid | Per-conversation cost caps |
86% of enterprises plan to increase AI budgets in 2026 (Deloitte). The question isn't whether to spend — it's whether to spend blindly.
```bash
pip install reins
```

```python
import anthropic
from reins import trace

@trace(budget="$0.50", on_exceed="degrade")
async def my_agent(task: str):
    client = anthropic.AsyncAnthropic()
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text
```
When the budget runs low, Reins automatically switches claude-sonnet to claude-haiku — your agent keeps running, just cheaper.
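As a rough illustration of the idea (not Reins' internal implementation), budget-based degradation can be sketched as a tracker that falls back to the next cheaper model once the remaining budget drops below a threshold. `BudgetTracker`, `pick_model`, `parse_budget`, the 20% threshold, and the shortened model names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical degradation chain, ordered from most to least expensive.
DEGRADATION_CHAIN = ["claude-sonnet", "claude-haiku"]


def parse_budget(text: str) -> float:
    """Turn a budget string such as "$0.50" into a dollar amount."""
    return float(text.strip().lstrip("$"))


@dataclass
class BudgetTracker:
    limit: float          # total budget for this run, in dollars
    spent: float = 0.0    # cost recorded so far

    def record(self, cost: float) -> None:
        self.spent += cost

    @property
    def remaining_fraction(self) -> float:
        return max(self.limit - self.spent, 0.0) / self.limit

    def pick_model(self, preferred: str, low_water: float = 0.2) -> str:
        """Return `preferred` while the budget is healthy, else the next cheaper model."""
        if self.remaining_fraction > low_water or preferred not in DEGRADATION_CHAIN:
            return preferred
        idx = DEGRADATION_CHAIN.index(preferred)
        return DEGRADATION_CHAIN[min(idx + 1, len(DEGRADATION_CHAIN) - 1)]


tracker = BudgetTracker(limit=parse_budget("$0.50"))
first = tracker.pick_model("claude-sonnet")    # budget untouched
tracker.record(0.45)                           # 90% of the budget is now spent
second = tracker.pick_model("claude-sonnet")   # falls back to the cheaper model
```

Here `first` stays on the preferred model and `second` moves one step down the chain. A real policy would likely also weigh the estimated cost of the next call.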
```bash
pip install reins[proxy]

# Start local proxy with $5/day budget
reins proxy --port 8082 --budget '$5/day' --on-exceed degrade

# Point Claude Code at it
export ANTHROPIC_BASE_URL=http://localhost:8082

# Use Claude Code normally; Reins controls costs transparently
claude "refactor this module"
```
```bash
# Cost report
reins report

# Trace visualization
reins trace list
reins trace show <run_id>

# Context health analysis
reins health <run_id>

# Step-by-step replay
reins replay <run_id>
```
```bash
pip install reins           # Core: auto-tracing + DuckDB storage
pip install reins[budget]   # + Cost governance
pip install reins[lens]     # + Debugging (replay, context health)
pip install reins[pulse]    # + Reliability (guardrails, evaluation)
pip install reins[all]      # Everything
```
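The core package's auto-tracing is described in this README as monkey-patching the LLM SDKs. A minimal sketch of that general technique follows, using a stand-in `FakeClient` rather than the real Anthropic or OpenAI SDK. `instrument`, `SPANS`, and the span fields are hypothetical and not Reins' actual internals.

```python
import functools
import time

SPANS = []  # hypothetical in-memory trace store (Reins itself uses DuckDB)


class FakeClient:
    """Stand-in for an LLM SDK client; not a real SDK class."""

    def create(self, model: str, prompt: str) -> dict:
        return {"model": model, "text": prompt.upper(), "tokens": len(prompt.split())}


def instrument(cls, method_name: str) -> None:
    """Replace `cls.method_name` with a wrapper that records one span per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        SPANS.append({
            "method": f"{cls.__name__}.{method_name}",
            "model": result["model"],
            "tokens": result["tokens"],
            "seconds": time.perf_counter() - start,
        })
        return result

    setattr(cls, method_name, wrapper)


instrument(FakeClient, "create")

# Calling code is unchanged, yet every call now leaves a span behind.
reply = FakeClient().create(model="demo-model", prompt="hello agent world")
```

Because the patch is applied to the class, existing and future instances are both covered, which is what makes a "zero code changes" claim possible with this approach.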
- Auto-instrumentation: Monkey-patches Anthropic & OpenAI SDKs — zero code changes to capture all LLM calls
- DuckDB storage: Embedded, zero-config. No Redis, no Postgres, no Docker
- SQL queries: `reins query "SELECT * FROM spans WHERE cost > 0.1"`
- Streaming support: Transparent interception of streaming responses
- Per-run budgets: `@trace(budget="$0.50")`, a hard cap per agent execution
- 4 exceed strategies: `degrade` (auto-switch model) / `pause` / `alert` / `reject`
- Model degradation chains: `opus → sonnet → haiku`, `o3 → gpt-4o → gpt-4o-mini`
- Circuit breaker: Auto-halts runaway agent loops (>30 calls/minute)
- Budget persistence: Survives process restarts, daily/monthly auto-reset
- Cost anomaly detection: Alerts when a run costs 3x the historical average
- YAML team budgets: Organization → team → agent hierarchy
```yaml
# reins.yaml
budgets:
  daily: $10.00
  agents:
    research_agent: { per_run: $2.00, on_exceed: degrade }
    code_agent: { per_run: $0.50, on_exceed: reject }
```
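The circuit breaker listed above (halt when a loop exceeds 30 calls per minute) can be pictured as a sliding-window counter. The sketch below is an assumption about how such a rule could work, not Reins' implementation. `CircuitBreaker`, `allow`, and the injectable `clock` are hypothetical names.

```python
import time
from collections import deque


class CircuitBreaker:
    """Trips when more than `max_calls` happen within `window_s` seconds."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls = deque()   # timestamps of recent calls
        self.open = False      # True once the breaker has tripped

    def allow(self) -> bool:
        """Record one call attempt; return False if the loop should be halted."""
        if self.open:
            return False
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        self.calls.append(now)
        if len(self.calls) > self.max_calls:
            self.open = True
            return False
        return True


# Simulate a runaway loop with a fake clock: one call every second.
fake_now = [0.0]
breaker = CircuitBreaker(clock=lambda: fake_now[0])
results = []
for _ in range(35):
    results.append(breaker.allow())
    fake_now[0] += 1.0
```

In this simulation the first 30 calls pass and the 31st trips the breaker, after which every call is refused until something resets it. Passing the clock in as a parameter keeps the logic testable without sleeping.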
- `reins trace list`: Recent runs table with cost, status, degradation count
- `reins trace show`: Colored terminal call tree (LLM + tool calls)
- `reins trace export --format otel`: OTLP JSON export for Grafana/Datadog
- Cross-agent correlation: Automatic trace_id propagation
- `reins replay`: Interactive step-by-step agent replay (Enter/auto-play/quit)
- `reins health`: Context health curve with ASCII chart + per-span breakdown
- Context Rot detection: 3-metric composite score (utilization, efficiency, duplication)
- Root cause analysis: Automatic causal chain for failures
- Runtime evaluators (coherence, instruction following)
- Guardrail engine (PII detection, SQL injection prevention)
- Automatic regression test generation from failed runs
- Transparent reverse proxy for Claude Code, aider, Cursor, or any tool that respects `ANTHROPIC_BASE_URL`
- Full budget enforcement at the HTTP layer
- Zero changes to the upstream tool
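To make the guardrail bullet above more concrete, here is a deliberately simple sketch of regex-based PII detection and redaction. It illustrates the general technique only. `PII_PATTERNS`, `find_pii`, and `redact` are hypothetical names, and the two patterns are far from exhaustive.

```python
import re

# Hypothetical, intentionally small pattern set; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for every pattern hit in `text`."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits


def redact(text: str) -> str:
    """Replace each detected value with a placeholder such as [EMAIL]."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


prompt = "Contact jane.doe@example.com, SSN 123-45-6789, about the refund."
hits = find_pii(prompt)
safe_prompt = redact(prompt)
```

A guardrail built this way can either block the request when `hits` is non-empty or forward `safe_prompt` instead of the original text.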
Reins integrates with 10+ agent frameworks through 4 adapters:
| Adapter | Frameworks | Integration |
|---|---|---|
| `ReinsCallbackHandler` | LangChain, LangGraph, LangFlow | `ChatAnthropic(callbacks=[handler])` |
| `ReinsTracingProcessor` | OpenAI Agents SDK | `add_trace_processor(processor)` |
| `instrument_crew()` | CrewAI | `instrument_crew(crew)` |
| `ReinsSpanExporter` | Semantic Kernel, Pydantic AI, Haystack, Mastra | OTel SpanProcessor |
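```python
from langchain_anthropic import ChatAnthropic
from reins.adapters.langchain import ReinsCallbackHandler

handler = ReinsCallbackHandler(agent_name="my_langchain_agent")
llm = ChatAnthropic(model="claude-sonnet-4-20250514", callbacks=[handler])

# `prompt` and `parser` are assumed to be defined elsewhere in your application
chain = prompt | llm | parser
result = chain.invoke({"input": "..."})
```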
```python
from reins.adapters.openai_agents import ReinsTracingProcessor
from agents import Agent, Runner, add_trace_processor

add_trace_processor(ReinsTracingProcessor())

agent = Agent(name="assistant", model="gpt-4o")
result = Runner.run_sync(agent, "Hello!")
```
```python
from reins.adapters.crewai import instrument_crew
from crewai import Crew, Agent, Task

crew = Crew(agents=[...], tasks=[...])
instrument_crew(crew)  # One line: instruments all agents and tasks
result = crew.kickoff()
```
```python
from reins.adapters.otel import ReinsSpanProcessor
from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider()
provider.add_span_processor(ReinsSpanProcessor())

# Now any OTel-instrumented framework is automatically traced by Reins
```
| | Reins | LiteLLM | Langfuse | Helicone | Portkey |
|---|---|---|---|---|---|
| Architecture | SDK + Proxy | Proxy only | SDK | Proxy | Proxy/SDK |
| Infrastructure | Zero (embedded DuckDB) | Redis + Postgres | PostgreSQL | Cloud | Cloud |
| Budget enforcement | Smart degradation | Hard reject (400) | None | Rate limit | Key limit |
| Auto model switch | Yes | No | No | No | No |
| Circuit breaker | Yes | No | No | No | No |
| Per-agent budgets | Yes | Per-key | No | No | Partial |
| Agent replay | Yes | No | No | No | No |
| Context health | Yes | No | No | No | No |
| Framework adapters | 10+ frameworks | N/A | 12+ | N/A | 5+ |
| Setup | `pip install reins` | `docker-compose up` | `docker-compose up` | Cloud signup | Cloud signup |
| Open source | BSL 1.1 (→ Apache 2030) | Enterprise paywall | MIT | MIT | Proprietary |
One-line difference: LiteLLM is an API Gateway (needs infra, hard-rejects on exceed). Reins is an Agent Runtime (zero-infra, degrades gracefully).
Modules communicate via an event bus — install only what you need, they auto-cooperate when co-installed. For example: Lens detects context rot → notifies Budget → Budget reduces remaining allocation.
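The Lens → Budget flow described above can be sketched with a tiny synchronous publish/subscribe bus. This is an illustration of the pattern under stated assumptions, not Reins' actual event bus. `EventBus`, the `context_rot` topic, the event fields, and the proportional-cut policy are all hypothetical.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Topics with no subscribers are simply ignored, so a module
        # can publish without knowing what else is installed.
        for handler in self._handlers.get(topic, []):
            handler(event)


class Budget:
    """Stand-in for a budget module that reacts to context-rot events."""

    def __init__(self, bus: EventBus, remaining: float) -> None:
        self.remaining = remaining
        bus.subscribe("context_rot", self.on_context_rot)

    def on_context_rot(self, event: dict) -> None:
        # Hypothetical policy: cut the remaining allocation in proportion to the rot score.
        self.remaining *= 1.0 - event["score"]


bus = EventBus()
budget = Budget(bus, remaining=2.00)
bus.publish("context_rot", {"run_id": "demo", "score": 0.25})  # as a Lens-like module might
bus.publish("unhandled_topic", {})                              # no subscribers: a no-op
```

Because publishers never reference subscribers directly, each module works when installed alone and cooperates automatically when the other is present.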
- Product Requirements (PRD) — What we build and why
- Technical Design — Architecture, data models, API design
| Metric | Value | Source |
|---|---|---|
| Enterprise LLM API spend (2025 H1) | $8.4B | Menlo Ventures |
| YoY GenAI enterprise investment growth | 3.2x ($11.5B → $37B) | Menlo Ventures |
| Projected LLM API spend (2026) | $15B+ | Menlo Ventures |
| Agentic AI projects to be canceled by 2027 | >40% | Gartner |
| Agent reliability as #1 enterprise barrier | 72% of practitioners | Pan et al. (2025) |
| Enterprises increasing AI budget in 2026 | 86% | Deloitte State of AI 2026 |
| Langfuse: SDK installs/month | 26M+ | ClickHouse acquisition |
| LangChain valuation (Oct 2025) | $1.25B | Series B |
| Healthcare vertical AI spend (2025) | $1.5B (3.3x YoY) | Menlo Ventures |
```bash
# Clone
git clone https://github.com/catyans/reins.git
cd reins

# Install with dev deps
pip install -e ".[dev,all]"

# Run tests
pytest tests/ -v

# Generate charts
python scripts/generate_charts.py
```
99 tests across unit and integration suites.
Business Source License 1.1 (BSL 1.1)
- Free for: personal use, internal enterprise use, academic research, contributing back
- Not allowed: building a competing commercial AI agent cost governance product/service
- Auto-converts to Apache 2.0 on April 4, 2030
For commercial licensing inquiries: 237344440@qq.com
Yanshu Wang (@catyans) — https://catyans.github.io
Reins: Take control of your AI agents.
pip install reins