Releases: ArkNill/llm-relay

v0.9.6

21 May 04:21

@ArkNill ArkNill

v0.9.6

bc437fd

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Learn about vigilant mode.

v0.9.6 Latest

Latest

Single-command Windows install + atomic llm-relay init. Surfaced by
2026年05月21日 hmj PC dogfood: the previous init would write
ANTHROPIC_BASE_URL into ~/.claude/settings.json before the proxy
server was actually serving traffic, so any running Claude Code session
would route to a dead port and the symptom looked like a generic API
connection error. This release makes that ordering atomic and ships a
PowerShell one-line installer.

Added

One-command Windows installer (scripts/install.ps1): for users
with Python 3.9+ already installed, the entire install flow collapses
to one PowerShell command:
```
irm https://raw.githubusercontent.com/ArkNill/llm-relay/main/scripts/install.ps1 | iex
```
The script verifies Python, detects an active venv (uses it if
present, else falls back to --user with a PATH hint), pip installs
llm-relay[all], then calls llm-relay init which handles
daemon + health-gate + routing in the right order. Documented in
README.md "One-command install (Windows native)".
README "Prerequisites" section spelling out the Python 3.9+
requirement, the recommended venv setup, and platform-specific install
hints (winget / Homebrew / apt).

Changed

llm-relay init is now atomic (setup_init.run_init). New ordering:
1. Detect CLIs, find port, init DB, write config, init knowledge dir.
2. Start the proxy server.
3. Health-gate (/_health polling).
4. Configure Claude Code (ANTHROPIC_BASE_URL, MCP) only after
  the server is verified healthy.
  Routing is never activated unless the proxy actually responds, so an
  abort at step 2 or step 3 leaves ~/.claude/settings.json untouched
  -- no more "ConnectionRefused on port 8083" surprises in your next
  Claude Code session.
--skip-server now also skips routing. Previously it was a UX
trap: server skipped, but settings.json still gained an
ANTHROPIC_BASE_URL pointing at the not-running port. Now
--skip-server skips the server, the health gate, AND the
Claude Code routing change as a single decision. The summary message
surfaces this explicitly so a follow-up llm-relay serve + re-run is
the documented path to enabling routing.
Windows background daemon goes through win_service.start_daemon
(_start_server in setup_init.py). The previous Popen +
CREATE_NEW_PROCESS_GROUP left the proxy tied to its job object on
Windows so it died when the parent SSH session disconnected (observed
during the 2026年05月21日 dogfood). The win_service path uses pythonw plus
CREATE_BREAKAWAY_FROM_JOB, which detaches cleanly.

Tests

New test_skip_server_does_not_write_settings_json regression in
tests/test_api/test_init.py pins the atomic contract.

Fixed

Docker image build (Dockerfile, 0.9.5 carry-over): removed the
unconditional COPY vendor/tokpress /tmp/tokpress + pip install
step. The vendor source is kept outside the repository, so the COPY
always failed in GitHub Actions and the Docker workflow had been
broken on every tag since v0.9.2. The proxy already imports tokpress
inside a try/except ImportError guard, so the image runs unchanged
when the package is absent (_tokpress_available simply stays
False). The v0.9.5 image was rebuilt via
gh workflow run docker.yml -f tag=0.9.5 after the change landed on
main; the v0.9.6 image rebuilds automatically on the new tag.

Assets 2

v0.9.5

21 May 00:20

@ArkNill ArkNill

v0.9.5

ab4eef3

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Learn about vigilant mode.

v0.9.5

Hotfix release surfaced by dogfooding verify install and --help on a
Windows host immediately after 0.9.4 went live on PyPI.

Fixed

__version__ was stale at 0.9.2 (src/llm_relay/__init__.py,
src/llm_relay/detect/__init__.py). The 0.9.4 release bumped only
pyproject.toml, leaving the importable runtime version out of sync;
verify install's version_consistency check correctly flagged this
as a warn. Both sources now read 0.9.5.
UnicodeEncodeError on --help under Windows + Korean locale (cp949)
(src/llm_relay/detect/cli.py). Three occurrences of the em-dash
character (—) inside Click docstrings / click.echo calls were
unprintable on cp949 stdout and caused llm-relay --help to crash
before showing any subcommands. Replaced with ASCII --. No behavior
change beyond making the command actually usable on Windows-Korean
hosts.

Assets 2

v0.9.4

20 May 22:25

@ArkNill ArkNill

v0.9.4

985a2c0

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Learn about vigilant mode.

v0.9.4

First PyPI release since 0.9.1. Collapses the locally-tagged 0.9.2 and 0.9.3
work — neither of which was published to PyPI — into a single shipped
version. Adds the agent-driven onboarding track (env-fingerprint + verify

AGENT_SETUP playbook) on top of the prior unreleased changes.

Added

Agent-driven setup playbook (docs/AGENT_SETUP.md): structured onboarding document targeted at AI coding agents (Claude Code / Codex / Gemini) automating an llm-relay install on a user's behalf. Sequences env-fingerprint + verify into a five-phase flow (probe → install → init → per-CLI integration → final acceptance) with explicit [PERMISSION: ...] markers wherever the agent must pause for user consent, and explicit "stop and ask" rules for ambiguous states. README section "Agent-driven setup" links to it.
llm-relay verify command group (verify/): four idempotent verification subcommands designed for agent-driven onboarding. verify install confirms the package itself (Python version, importability, entry points, optional extras, version consistency). verify config confirms local state (db dir, schema, writability, config file, knowledge dir, port availability, no deprecated env vars). verify integration --cli {claude-code,openai-codex,gemini-cli,all} confirms each CLI is wired through the relay (binary on PATH, settings file present, ANTHROPIC_BASE_URL routing to localhost, MCP server registered). verify all aggregates install + config + integration. Shared output schema (schema_version: "1") emits per-check status (pass/fail/warn/skipped) with optional remediation strings; overall priority is fail > warn > pass. Skipped checks (e.g. CLI not installed) never escalate to fail. Optional --live probes /_health on the proxy. Common options: --format {text,json}, --quiet, --no-remediation. Exit code is 0 on pass/warn, 1 on fail.
llm-relay env-fingerprint command (env_fingerprint.py): single-shot, idempotent snapshot of the local LLM CLI environment for agent-driven onboarding. Emits a structured JSON (or YAML) document with sections for llm_relay package state, per-CLI install/auth/version, port availability, filesystem layout, redacted environment variables (API keys reported as set/empty/null only), and an optional doctor summary. Designed so an agent (Claude Code / Codex / Gemini) automating an llm-relay install can parse the environment instead of scraping init output. Schema versioned (schema_version: "1"); sub-probe failures surface as _error markers without crashing the snapshot. Options: --format {json,yaml}, --no-doctor, --ports.

Performance

Incremental composition cache (composition.py, #16): analyze_session_composition and analyze_session_composition_per_turn now fold delta turns into cached state instead of re-walking the full session on every new turn. Previously the cache invalidated whenever any new turn arrived, forcing an O(n2) replay; on long-running sessions this caused /api/v1/turns to balloon to tens of seconds and daemon RSS to climb above 10 GB within ~30 min of normal multi-agent traffic. Cache now keys on session_id alone with (max_turn_processed, accumulated, totals, ...) state; new turns trigger a WHERE turn_number > max_turn_processed fetch. Compaction events (storage_mode="full") reset accumulated state and re-absorb the snapshot. Exception during fold falls back to a full rebuild so an incremental error can't poison the cache.

Changed

Claude Code zone defaults restored to 1M model window: Yellow 500K / Orange 700K / Red 900K / Hard 1M (_zones.py, tui.py, display.py, .env.public). Reverts the 665K rescale that was applied on the basis of a single 5/13 observation of auto-compact firing around 650K. Subsequent multi-hundred-K sessions did not reproduce that trigger point — Opus 4.7 [1m] reaches ~95% (~950K) before client-side auto-compact engages, so the 665K ceiling was unnecessarily conservative. Stock deployments without 1M context entitlement can override via LLM_TOKEN_CEILING=200000.
Red-zone message split for CC vs Codex (i18n.py): new keys zone.abs.red.cc and zone.ratio.red.cc carry an explicit "Auto-compact imminent — hand off to a new session" guideline used by CC paths. Codex paths keep the existing "Session rotation required" wording (Codex has no client-side compaction; the 400K hard limit semantics differ).
duplicate_reads no longer accumulates across compaction (composition.py): when a storage_mode="full" turn arrives, read counts reset together with the accumulated context. Previously reads from pre-compaction turns were counted as duplicates of post-compaction reads, inflating duplicate_read_count and triggering spurious duplicate_read_warning flips. Visible in /api/v1/turns, /api/v1/display, dashboard Context Health, and TUI duplicate-file listings.

Tests

tests/test_api/test_turns.py::_zone_env autouse fixture now patches both _zones and routes modules so legacy 1M-scale assertions stay stable against new production defaults.

Assets 2

v0.9.1 — 3-Provider Status, Input Validation, 2-Column Dashboard

29 Apr 05:58

@ArkNill ArkNill

v0.9.1

2787264

v0.9.1 — 3-Provider Status, Input Validation, 2-Column Dashboard

New Features

3-Provider Upstream Status Tiles

Dashboard now shows live API status for Anthropic, OpenAI, and Gemini simultaneously.

Anthropic/OpenAI: parsed from Statuspage.io API (shared _fetch_statuspage_sync function)
Gemini: parsed from Google Cloud Status API with Vertex AI/Gemini incident filtering
60-second server-side cache per provider, fail-open (shows "unavailable" on error)
Color-coded banners (green/yellow/orange/red/blue/grey) with pulse animation for critical incidents
Click any banner to open the provider's status page

2-Column Responsive Dashboard

Left column: Turn Monitor + Context Health (real-time session monitoring)
Right column: CLI Status + API Health + Delegation Stats + Recent Delegations
Vertical divider between columns
Auto-stacks to single column below 1024px (mobile/tablet)
Status banners arranged horizontally across full width
Max width expanded from 960px to 1600px (95vw)

Dashboard Error Banner

Red "connection lost" banner appears when API requests fail
Auto-hides when connection recovers

Improvements

API Input Validation

All query parameters (window, limit, turn_start, turn_end) now return 400 with descriptive error message on invalid input, instead of crashing with 500.

_parse_int / _parse_float helpers with _ParamError exception
Applied to 11 endpoints

Performance

Token threshold env vars (LLM_TOKEN_A_YELLOW/ORANGE/RED/HARD/CEILING) cached at module load
No repeated os.getenv() in hot path

Packaging

Development Status: Alpha → Beta
Added classifiers: Environment :: Web Environment, Operating System :: OS Independent
Removed incomplete --fix TODO from doctor.py

Community Contributions (cnighswonger)

PR #9 : Codex GH App token injection via cli_delegate — opt-in bot identity for Codex gh operations. Default script path: ~/.llm-relay/github-apps/generate-token.sh
PR #10 : Surface term_name on /turns + dashboard render with sidLabel() helper + generic name preservation rule (prevents auto-registration from clobbering user-set labels)
PR #11 : Anthropic status tile on dashboard (server-side proxy + 60s cache) — extended to 3-provider in this release

Tests

New: _compat.py tests — cross-platform process inspection (7 tests)
New: doctor.py tests — health check validation (13 tests)
503 tests pass (public)

No Breaking Changes

Drop-in upgrade from v0.9.0.

Assets 2

Releases: ArkNill/llm-relay

v0.9.6

Added

Changed

Tests

Fixed

Uh oh!

v0.9.5

Fixed

Uh oh!

v0.9.4

Added

Performance

Changed

Tests

Uh oh!

v0.9.1 — 3-Provider Status, Input Validation, 2-Column Dashboard

New Features

3-Provider Upstream Status Tiles

2-Column Responsive Dashboard

Dashboard Error Banner

Improvements

API Input Validation

Performance

Packaging

Community Contributions (cnighswonger)

Tests

No Breaking Changes

Uh oh!