-
Notifications
You must be signed in to change notification settings - Fork 2
Releases: ArkNill/llm-relay
v0.9.6
bc437fd Single-command Windows install + atomic
llm-relay init. Surfaced by
2026年05月21日 hmj PC dogfood: the previousinitwould write
ANTHROPIC_BASE_URLinto~/.claude/settings.jsonbefore the proxy
server was actually serving traffic, so any running Claude Code session
would route to a dead port and the symptom looked like a generic API
connection error. This release makes that ordering atomic and ships a
PowerShell one-line installer.
Added
- One-command Windows installer (
scripts/install.ps1): for users
with Python 3.9+ already installed, the entire install flow collapses
to one PowerShell command:The script verifies Python, detects an active venv (uses it ifirm https://raw.githubusercontent.com/ArkNill/llm-relay/main/scripts/install.ps1 | iex
present, else falls back to--userwith a PATH hint),pip installs
llm-relay[all], then callsllm-relay initwhich handles
daemon + health-gate + routing in the right order. Documented in
README.md"One-command install (Windows native)". - README "Prerequisites" section spelling out the Python 3.9+
requirement, the recommended venv setup, and platform-specific install
hints (winget / Homebrew / apt).
Changed
llm-relay initis now atomic (setup_init.run_init). New ordering:- Detect CLIs, find port, init DB, write config, init knowledge dir.
- Start the proxy server.
- Health-gate (
/_healthpolling). - Configure Claude Code (
ANTHROPIC_BASE_URL, MCP) only after
the server is verified healthy.
Routing is never activated unless the proxy actually responds, so an
abort at step 2 or step 3 leaves~/.claude/settings.jsonuntouched
-- no more "ConnectionRefused on port 8083" surprises in your next
Claude Code session.
--skip-servernow also skips routing. Previously it was a UX
trap: server skipped, butsettings.jsonstill gained an
ANTHROPIC_BASE_URLpointing at the not-running port. Now
--skip-serverskips the server, the health gate, AND the
Claude Code routing change as a single decision. The summary message
surfaces this explicitly so a follow-upllm-relay serve+ re-run is
the documented path to enabling routing.- Windows background daemon goes through
win_service.start_daemon
(_start_serverinsetup_init.py). The previous Popen +
CREATE_NEW_PROCESS_GROUPleft the proxy tied to its job object on
Windows so it died when the parent SSH session disconnected (observed
during the 2026年05月21日 dogfood). The win_service path uses pythonw plus
CREATE_BREAKAWAY_FROM_JOB, which detaches cleanly.
Tests
- New
test_skip_server_does_not_write_settings_jsonregression in
tests/test_api/test_init.pypins the atomic contract.
Fixed
- Docker image build (
Dockerfile, 0.9.5 carry-over): removed the
unconditionalCOPY vendor/tokpress /tmp/tokpress+pip install
step. The vendor source is kept outside the repository, so the COPY
always failed in GitHub Actions and the Docker workflow had been
broken on every tag since v0.9.2. The proxy already importstokpress
inside atry/except ImportErrorguard, so the image runs unchanged
when the package is absent (_tokpress_availablesimply stays
False). The v0.9.5 image was rebuilt via
gh workflow run docker.yml -f tag=0.9.5after the change landed on
main; the v0.9.6 image rebuilds automatically on the new tag.
Assets 2
v0.9.5
ab4eef3 Hotfix release surfaced by dogfooding
verify installand--helpon a
Windows host immediately after 0.9.4 went live on PyPI.
Fixed
__version__was stale at0.9.2(src/llm_relay/__init__.py,
src/llm_relay/detect/__init__.py). The 0.9.4 release bumped only
pyproject.toml, leaving the importable runtime version out of sync;
verify install'sversion_consistencycheck correctly flagged this
as a warn. Both sources now read0.9.5.UnicodeEncodeErroron--helpunder Windows + Korean locale (cp949)
(src/llm_relay/detect/cli.py). Three occurrences of the em-dash
character (—) inside Click docstrings /click.echocalls were
unprintable on cp949 stdout and causedllm-relay --helpto crash
before showing any subcommands. Replaced with ASCII--. No behavior
change beyond making the command actually usable on Windows-Korean
hosts.
Assets 2
v0.9.4
985a2c0 First PyPI release since 0.9.1. Collapses the locally-tagged 0.9.2 and 0.9.3
work — neither of which was published to PyPI — into a single shipped
version. Adds the agent-driven onboarding track (env-fingerprint + verify
- AGENT_SETUP playbook) on top of the prior unreleased changes.
Added
- Agent-driven setup playbook (
docs/AGENT_SETUP.md): structured onboarding document targeted at AI coding agents (Claude Code / Codex / Gemini) automating an llm-relay install on a user's behalf. Sequencesenv-fingerprint+verifyinto a five-phase flow (probe → install → init → per-CLI integration → final acceptance) with explicit[PERMISSION: ...]markers wherever the agent must pause for user consent, and explicit "stop and ask" rules for ambiguous states. README section "Agent-driven setup" links to it. llm-relay verifycommand group (verify/): four idempotent verification subcommands designed for agent-driven onboarding.verify installconfirms the package itself (Python version, importability, entry points, optional extras, version consistency).verify configconfirms local state (db dir, schema, writability, config file, knowledge dir, port availability, no deprecated env vars).verify integration --cli {claude-code,openai-codex,gemini-cli,all}confirms each CLI is wired through the relay (binary on PATH, settings file present,ANTHROPIC_BASE_URLrouting to localhost, MCP server registered).verify allaggregates install + config + integration. Shared output schema (schema_version: "1") emits per-check status (pass/fail/warn/skipped) with optional remediation strings; overall priority isfail > warn > pass. Skipped checks (e.g. CLI not installed) never escalate to fail. Optional--liveprobes/_healthon the proxy. Common options:--format {text,json},--quiet,--no-remediation. Exit code is 0 on pass/warn, 1 on fail.llm-relay env-fingerprintcommand (env_fingerprint.py): single-shot, idempotent snapshot of the local LLM CLI environment for agent-driven onboarding. Emits a structured JSON (or YAML) document with sections forllm_relaypackage state, per-CLI install/auth/version, port availability, filesystem layout, redacted environment variables (API keys reported asset/empty/nullonly), and an optionaldoctorsummary. Designed so an agent (Claude Code / Codex / Gemini) automating an llm-relay install can parse the environment instead of scrapinginitoutput. Schema versioned (schema_version: "1"); sub-probe failures surface as_errormarkers without crashing the snapshot. Options:--format {json,yaml},--no-doctor,--ports.
Performance
- Incremental composition cache (
composition.py, #16):analyze_session_compositionandanalyze_session_composition_per_turnnow fold delta turns into cached state instead of re-walking the full session on every new turn. Previously the cache invalidated whenever any new turn arrived, forcing an O(n2) replay; on long-running sessions this caused/api/v1/turnsto balloon to tens of seconds and daemon RSS to climb above 10 GB within ~30 min of normal multi-agent traffic. Cache now keys onsession_idalone with(max_turn_processed, accumulated, totals, ...)state; new turns trigger aWHERE turn_number > max_turn_processedfetch. Compaction events (storage_mode="full") reset accumulated state and re-absorb the snapshot. Exception during fold falls back to a full rebuild so an incremental error can't poison the cache.
Changed
- Claude Code zone defaults restored to 1M model window: Yellow 500K / Orange 700K / Red 900K / Hard 1M (
_zones.py,tui.py,display.py,.env.public). Reverts the 665K rescale that was applied on the basis of a single 5/13 observation of auto-compact firing around 650K. Subsequent multi-hundred-K sessions did not reproduce that trigger point — Opus 4.7 [1m] reaches ~95% (~950K) before client-side auto-compact engages, so the 665K ceiling was unnecessarily conservative. Stock deployments without 1M context entitlement can override viaLLM_TOKEN_CEILING=200000. - Red-zone message split for CC vs Codex (
i18n.py): new keyszone.abs.red.ccandzone.ratio.red.cccarry an explicit "Auto-compact imminent — hand off to a new session" guideline used by CC paths. Codex paths keep the existing "Session rotation required" wording (Codex has no client-side compaction; the 400K hard limit semantics differ). duplicate_readsno longer accumulates across compaction (composition.py): when astorage_mode="full"turn arrives, read counts reset together with the accumulated context. Previously reads from pre-compaction turns were counted as duplicates of post-compaction reads, inflatingduplicate_read_countand triggering spuriousduplicate_read_warningflips. Visible in/api/v1/turns,/api/v1/display, dashboard Context Health, and TUI duplicate-file listings.
Tests
tests/test_api/test_turns.py::_zone_envautouse fixture now patches both_zonesandroutesmodules so legacy 1M-scale assertions stay stable against new production defaults.
Assets 2
v0.9.1 — 3-Provider Status, Input Validation, 2-Column Dashboard
New Features
3-Provider Upstream Status Tiles
Dashboard now shows live API status for Anthropic, OpenAI, and Gemini simultaneously.
- Anthropic/OpenAI: parsed from Statuspage.io API (shared
_fetch_statuspage_syncfunction) - Gemini: parsed from Google Cloud Status API with Vertex AI/Gemini incident filtering
- 60-second server-side cache per provider, fail-open (shows "unavailable" on error)
- Color-coded banners (green/yellow/orange/red/blue/grey) with pulse animation for critical incidents
- Click any banner to open the provider's status page
2-Column Responsive Dashboard
- Left column: Turn Monitor + Context Health (real-time session monitoring)
- Right column: CLI Status + API Health + Delegation Stats + Recent Delegations
- Vertical divider between columns
- Auto-stacks to single column below 1024px (mobile/tablet)
- Status banners arranged horizontally across full width
- Max width expanded from 960px to 1600px (95vw)
Dashboard Error Banner
- Red "connection lost" banner appears when API requests fail
- Auto-hides when connection recovers
Improvements
API Input Validation
All query parameters (window, limit, turn_start, turn_end) now return 400 with descriptive error message on invalid input, instead of crashing with 500.
_parse_int/_parse_floathelpers with_ParamErrorexception- Applied to 11 endpoints
Performance
- Token threshold env vars (
LLM_TOKEN_A_YELLOW/ORANGE/RED/HARD/CEILING) cached at module load - No repeated
os.getenv()in hot path
Packaging
- Development Status: Alpha → Beta
- Added classifiers:
Environment :: Web Environment,Operating System :: OS Independent - Removed incomplete
--fixTODO from doctor.py
Community Contributions (cnighswonger)
- PR #9 : Codex GH App token injection via
cli_delegate— opt-in bot identity for Codexghoperations. Default script path:~/.llm-relay/github-apps/generate-token.sh - PR #10 : Surface
term_nameon/turns+ dashboard render withsidLabel()helper + generic name preservation rule (prevents auto-registration from clobbering user-set labels) - PR #11 : Anthropic status tile on dashboard (server-side proxy + 60s cache) — extended to 3-provider in this release
Tests
- New:
_compat.pytests — cross-platform process inspection (7 tests) - New:
doctor.pytests — health check validation (13 tests) - 503 tests pass (public)
No Breaking Changes
Drop-in upgrade from v0.9.0.