Unified LLM usage management — API proxy, session diagnostics, multi-CLI orchestration.
- Proxy: Transparent API proxy with cache/token monitoring and 12-strategy pruning
- Detect: 7 detectors (orphan, stuck, bloat, synthetic, cache, resume, microcompact)
- Recover: Session recovery and doctor (7 health checks)
- Guard: 4-tier threshold daemon with dual-zone classification
- Cost: Per-1% cost calculation and rate-limit header analysis
- Orch: Multi-CLI orchestration (Claude Code, Codex CLI, Gemini CLI)
- Display: Multi-CLI session monitor with provider badges and liveness detection
- I18n: Multi-language support (English, Korean) with browser auto-detection and the LLM_RELAY_LANG environment variable
- MCP: 8 tools via stdio transport (cli_delegate, cli_status, cli_probe, orch_delegate, orch_history, relay_stats, session_turns, session_history)
For Windows users who just want it running, after Python 3.9+ is installed:
irm https://raw.githubusercontent.com/ArkNill/llm-relay/main/scripts/install.ps1 | iex
That script pip installs llm-relay[all], starts the proxy as a Windows
background daemon, health-gates it, and only then routes Claude Code through
it. Routing is never activated unless the proxy actually responds, so this is
safe to run on a machine where Claude Code is already configured -- the
worst case is the install aborts with a clear message and leaves your
existing setup untouched. See Prerequisites below for the
Python requirement and venv guidance.
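
The health-gating idea described above (routing is switched on only once the proxy has actually answered) can be sketched in a few lines of Python. This is an illustrative sketch, not the installer's real logic: `proxy_responds`, `health_gated_activate`, and the retry parameters are hypothetical names chosen for the example, and the URL you would probe is an assumption.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def proxy_responds(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` gets a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx status: treat as "not up".
        return False


def health_gated_activate(
    probe: Callable[[], bool],
    activate: Callable[[], None],
    attempts: int = 5,
    delay: float = 1.0,
) -> bool:
    """Call `activate` only after `probe` succeeds; otherwise change nothing."""
    for attempt in range(attempts):
        if probe():
            activate()
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    # The probe never succeeded, so the existing setup is left untouched.
    return False
```

Under these assumptions, a caller would pass something like `lambda: proxy_responds("http://127.0.0.1:8080/")` as the probe (the port matches the `--port 8080` example further down; the path is a guess) and its own routing step as `activate`.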
If you would rather do it by hand (Linux, macOS, or just to see each step), keep reading.
- Python 3.9 or newer (3.12 recommended). We do not bundle a Python
runtime; install it once and llm-relay reuses it.
  - Windows: winget install Python.Python.3.12 or python.org/downloads
  - macOS: brew install python@3.12
  - Linux: your distribution's package manager (apt install python3.12, dnf install python3.12, etc.)
- (Recommended) A virtual environment. Clean uninstall, no PATH surprises, isolated dependency tree.
Windows (pip)
python -m venv .venv
.venv\Scripts\activate
Windows (conda)
conda create -n llm-relay python=3.12
conda activate llm-relay
Linux / macOS (pip)
python3 -m venv .venv
source .venv/bin/activate

# Default (SQLite, zero-config)
pip install llm-relay
# With proxy + web dashboard
pip install llm-relay[proxy]
# With PostgreSQL support (long-term analytics + vector search)
pip install llm-relay[pg]
# With MCP server (Python 3.10+)
pip install llm-relay[mcp]
# Everything
pip install llm-relay[all]
| | SQLite (default) | PostgreSQL |
|---|---|---|
| Setup | Zero-config | Requires PG server |
| Best for | Getting started, light usage | Long-term data analytics, vector search |
| Install | pip install llm-relay | pip install llm-relay[pg] |
| Config | (none needed) | LLM_RELAY_DB=postgresql://user:pass@host/db |
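
To make the Config row concrete, here is a small Python sketch of how a script might decide which backend is in effect from the LLM_RELAY_DB variable. It only illustrates the table above and is not llm-relay's own parsing code: the `describe_backend` helper is hypothetical, and treating an unset variable as "SQLite" is an assumption based on the table's "(none needed)" entry.

```python
import os
from urllib.parse import urlsplit


def describe_backend(env=None):
    """Summarise the storage backend implied by LLM_RELAY_DB (hypothetical helper)."""
    env = os.environ if env is None else env
    dsn = env.get("LLM_RELAY_DB", "")
    if not dsn:
        # Assumption: no variable set means the zero-config SQLite default.
        return {"backend": "sqlite"}
    parts = urlsplit(dsn)
    if parts.scheme != "postgresql":
        raise ValueError(f"unsupported LLM_RELAY_DB scheme: {parts.scheme!r}")
    # The password is deliberately left out so the summary is safe to log.
    return {
        "backend": "postgresql",
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
        "user": parts.username,
    }
```

Passing a plain dict instead of `os.environ` keeps the helper easy to check without touching the real environment.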
llm-relay init
llm-relay init # Auto-detect CLIs, configure proxy, start server
llm-relay scan # Session health check (7 detectors)
llm-relay doctor # Configuration health check (7 checks)
llm-relay recover # Extract session context for resumption
llm-relay serve # Start proxy server + web dashboard
llm-relay top # Live terminal monitor (btop-style)
llm-relay service install # Windows: background service + auto-start (no console window)
llm-relay service stop # Windows: stop background service
llm-relay service uninstall # Windows: remove service + cleanup
# Native (Linux/macOS/Windows)
llm-relay serve --port 8080

Then open:
- /dashboard/: CLI status, cost, delegation history, Turn Monitor (alive sessions only; ?include_dead=1 to bypass)
- /display/: Turn counter with CC/Codex/Gemini session cards (alive filter: CC via cc_pid+TTY fallback, Codex/Gemini via fd-open; Windows uses mtime+process detection)
- /history/: Session conversation history browser
llm-relay-mcp # stdio transport, 8 tools

# Set in Claude Code
llm-relay connect # Auto-configures Claude Code proxy
If you would rather have your existing coding agent (Claude Code, Codex,
Gemini) run the install for you, point it at
docs/AGENT_SETUP.md. It is a structured playbook
the agent follows step by step, using llm-relay env-fingerprint and
llm-relay verify to probe and check each step without scraping output.
llm-relay env-fingerprint --format json # state snapshot
llm-relay verify install --format json # is the package usable?
llm-relay verify config --format json # is local state set up?
llm-relay verify integration --cli claude-code # is the CLI wired?
llm-relay verify all # everything at once
Exit code is 0 on pass/warn, 1 on fail.
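
Because the result is carried by the exit code, an agent or script can check a step without parsing any output. The Python sketch below shows one way to do that; `verify_passed` is a hypothetical wrapper, it uses only the `llm-relay verify <target>` invocations listed above, and it makes no assumption about the shape of the JSON output.

```python
import subprocess
from typing import Callable, Sequence


def verify_passed(
    target: str = "all",
    extra: Sequence[str] = (),
    run: Callable = subprocess.run,
) -> bool:
    """Run `llm-relay verify <target>` and report success from the exit code."""
    cmd = ["llm-relay", "verify", target, *extra]
    completed = run(cmd, capture_output=True, text=True)
    # Exit code 0 covers both "pass" and "warn"; 1 means "fail".
    return completed.returncode == 0
```

For example, `verify_passed("integration", extra=["--cli", "claude-code"])` mirrors the integration check above. Since 0 is returned for both pass and warn, telling the two apart would require reading the command's output, which this sketch does not attempt.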
| CLI | Status |
|---|---|
| Claude Code | Fully supported |
| OpenAI Codex | Fully supported |
| Gemini CLI | Display supported; oauth-personal has a known server-side 403 bug (#25425) |
| Platform | Mode | Notes |
|---|---|---|
| Linux | Native | Full feature set, systemd recommended |
| macOS | Native | Full feature set |
| Windows | Native | llm-relay service install for background daemon (no console window) |
- Python >= 3.9
- MCP tools require Python >= 3.10
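
The two version floors above can be checked up front before choosing which extras to install. The following Python sketch is only an illustration of that check; `supported_features` and its dictionary keys are hypothetical names, not part of llm-relay.

```python
import sys


def supported_features(version=None):
    """Map a (major, minor) Python version to the requirements listed above."""
    major_minor = tuple((version or sys.version_info)[:2])
    return {
        "core": major_minor >= (3, 9),   # Python >= 3.9 for llm-relay itself
        "mcp": major_minor >= (3, 10),   # MCP tools need Python >= 3.10
    }
```

Called with no argument it inspects the running interpreter; passing a tuple such as `(3, 9)` lets you reason about another environment.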
MIT
Part of the QuartzUnit open-source ecosystem.