Releases: ArkNill/llm-relay

v0.9.6

21 May 04:21
@ArkNill ArkNill
bc437fd
This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194

Single-command Windows install + atomic llm-relay init. Surfaced by
2026-05-21 hmj PC dogfood: the previous init would write
ANTHROPIC_BASE_URL into ~/.claude/settings.json before the proxy
server was actually serving traffic, so any running Claude Code session
would route to a dead port and the symptom looked like a generic API
connection error. This release makes that ordering atomic and ships a
PowerShell one-line installer.

Added

  • One-command Windows installer (scripts/install.ps1): for users
    with Python 3.9+ already installed, the entire install flow collapses
    to one PowerShell command:
    irm https://raw.githubusercontent.com/ArkNill/llm-relay/main/scripts/install.ps1 | iex
    The script verifies Python, detects an active venv (uses it if
    present, else falls back to --user with a PATH hint), pip installs
    llm-relay[all], then calls llm-relay init which handles
    daemon + health-gate + routing in the right order. Documented in
    README.md "One-command install (Windows native)".
  • README "Prerequisites" section spelling out the Python 3.9+
    requirement, the recommended venv setup, and platform-specific install
    hints (winget / Homebrew / apt).

Changed

  • llm-relay init is now atomic (setup_init.run_init). New ordering:
    1. Detect CLIs, find port, init DB, write config, init knowledge dir.
    2. Start the proxy server.
    3. Health-gate (/_health polling).
    4. Configure Claude Code (ANTHROPIC_BASE_URL, MCP) only after
      the server is verified healthy.
      Routing is never activated unless the proxy actually responds, so an
      abort at step 2 or step 3 leaves ~/.claude/settings.json untouched
      -- no more "ConnectionRefused on port 8083" surprises in your next
      Claude Code session.
  • --skip-server now also skips routing. Previously it was a UX
    trap: server skipped, but settings.json still gained an
    ANTHROPIC_BASE_URL pointing at the not-running port. Now
    --skip-server skips the server, the health gate, AND the
    Claude Code routing change as a single decision. The summary message
    surfaces this explicitly so a follow-up llm-relay serve + re-run is
    the documented path to enabling routing.
  • Windows background daemon goes through win_service.start_daemon
    (_start_server in setup_init.py). The previous Popen +
    CREATE_NEW_PROCESS_GROUP left the proxy tied to its job object on
    Windows so it died when the parent SSH session disconnected (observed
    during the 2026-05-21 dogfood). The win_service path uses pythonw plus
    CREATE_BREAKAWAY_FROM_JOB, which detaches cleanly.

Tests

  • New test_skip_server_does_not_write_settings_json regression in
    tests/test_api/test_init.py pins the atomic contract.

Fixed

  • Docker image build (Dockerfile, 0.9.5 carry-over): removed the
    unconditional COPY vendor/tokpress /tmp/tokpress + pip install
    step. The vendor source is kept outside the repository, so the COPY
    always failed in GitHub Actions and the Docker workflow had been
    broken on every tag since v0.9.2. The proxy already imports tokpress
    inside a try/except ImportError guard, so the image runs unchanged
    when the package is absent (_tokpress_available simply stays
    False). The v0.9.5 image was rebuilt via
    gh workflow run docker.yml -f tag=0.9.5 after the change landed on
    main; the v0.9.6 image rebuilds automatically on the new tag.

v0.9.5

21 May 00:20
@ArkNill ArkNill
ab4eef3

Hotfix release surfaced by dogfooding verify install and --help on a
Windows host immediately after 0.9.4 went live on PyPI.

Fixed

  • __version__ was stale at 0.9.2 (src/llm_relay/__init__.py,
    src/llm_relay/detect/__init__.py). The 0.9.4 release bumped only
    pyproject.toml, leaving the importable runtime version out of sync;
    verify install's version_consistency check correctly flagged this
    as a warn. Both sources now read 0.9.5.
  • UnicodeEncodeError on --help under Windows + Korean locale (cp949)
    (src/llm_relay/detect/cli.py). Three occurrences of the em-dash
    character (U+2014) inside Click docstrings / click.echo calls were
    unprintable on cp949 stdout and caused llm-relay --help to crash
    before showing any subcommands. Replaced with ASCII --. No behavior
    change beyond making the command actually usable on Windows-Korean
    hosts.
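
A quick way to check whether a help string survives a given console encoding is sketched below. The helper names and the sample help string are hypothetical; the check simply attempts the same encode step that stdout performs.

```python
def printable_in(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded with the codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def asciify_dashes(text: str) -> str:
    """Replace the em-dash (U+2014) with an ASCII double hyphen."""
    return text.replace("\u2014", "--")


# Hypothetical help string containing an em-dash.
help_text = "detect \u2014 scan installed CLIs"
print(printable_in(help_text, "cp949"))
print(printable_in(asciify_dashes(help_text), "cp949"))
```

Running a check like this over all Click docstrings in a test would catch a regression before it reaches a cp949 host.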

v0.9.4

20 May 22:25
@ArkNill ArkNill
985a2c0

First PyPI release since 0.9.1. Collapses the locally-tagged 0.9.2 and 0.9.3
work (neither of which was published to PyPI) into a single shipped
version. Adds the agent-driven onboarding track (env-fingerprint + verify
+ AGENT_SETUP playbook) on top of the prior unreleased changes.

Added

  • Agent-driven setup playbook (docs/AGENT_SETUP.md): structured onboarding document targeted at AI coding agents (Claude Code / Codex / Gemini) automating an llm-relay install on a user's behalf. Sequences env-fingerprint + verify into a five-phase flow (probe → install → init → per-CLI integration → final acceptance) with explicit [PERMISSION: ...] markers wherever the agent must pause for user consent, and explicit "stop and ask" rules for ambiguous states. README section "Agent-driven setup" links to it.
  • llm-relay verify command group (verify/): four idempotent verification subcommands designed for agent-driven onboarding. verify install confirms the package itself (Python version, importability, entry points, optional extras, version consistency). verify config confirms local state (db dir, schema, writability, config file, knowledge dir, port availability, no deprecated env vars). verify integration --cli {claude-code,openai-codex,gemini-cli,all} confirms each CLI is wired through the relay (binary on PATH, settings file present, ANTHROPIC_BASE_URL routing to localhost, MCP server registered). verify all aggregates install + config + integration. Shared output schema (schema_version: "1") emits per-check status (pass/fail/warn/skipped) with optional remediation strings; overall priority is fail > warn > pass. Skipped checks (e.g. CLI not installed) never escalate to fail. Optional --live probes /_health on the proxy. Common options: --format {text,json}, --quiet, --no-remediation. Exit code is 0 on pass/warn, 1 on fail.
  • llm-relay env-fingerprint command (env_fingerprint.py): single-shot, idempotent snapshot of the local LLM CLI environment for agent-driven onboarding. Emits a structured JSON (or YAML) document with sections for llm_relay package state, per-CLI install/auth/version, port availability, filesystem layout, redacted environment variables (API keys reported as set/empty/null only), and an optional doctor summary. Designed so an agent (Claude Code / Codex / Gemini) automating an llm-relay install can parse the environment instead of scraping init output. Schema versioned (schema_version: "1"); sub-probe failures surface as _error markers without crashing the snapshot. Options: --format {json,yaml}, --no-doctor, --ports.
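
The status aggregation described for verify (fail > warn > pass, skipped never escalates, exit code 0 on pass/warn and 1 on fail) can be sketched as below. The function names are hypothetical, and treating an all-skipped run as "pass" is an assumption the notes do not confirm.

```python
from typing import Iterable

# Higher number wins when checks are combined.
_PRIORITY = {"pass": 0, "warn": 1, "fail": 2}


def overall_status(statuses: Iterable[str]) -> str:
    """Combine per-check statuses; skipped checks are ignored."""
    worst = "pass"  # assumption: nothing but skips still counts as pass
    for status in statuses:
        if status == "skipped":
            continue
        if status not in _PRIORITY:
            raise ValueError(f"unknown status: {status!r}")
        if _PRIORITY[status] > _PRIORITY[worst]:
            worst = status
    return worst


def exit_code(overall: str) -> int:
    """0 on pass or warn, 1 on fail."""
    return 1 if overall == "fail" else 0
```

An agent parsing the JSON output would only need the overall status and the exit code to decide whether to continue to the next onboarding phase.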

Performance

  • Incremental composition cache (composition.py, #16): analyze_session_composition and analyze_session_composition_per_turn now fold delta turns into cached state instead of re-walking the full session on every new turn. Previously the cache invalidated whenever any new turn arrived, forcing an O(n^2) replay; on long-running sessions this caused /api/v1/turns to balloon to tens of seconds and daemon RSS to climb above 10 GB within ~30 min of normal multi-agent traffic. Cache now keys on session_id alone with (max_turn_processed, accumulated, totals, ...) state; new turns trigger a WHERE turn_number > max_turn_processed fetch. Compaction events (storage_mode="full") reset accumulated state and re-absorb the snapshot. Exception during fold falls back to a full rebuild so an incremental error can't poison the cache.
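
The incremental fold can be illustrated with a small in-memory sketch. This is not the composition.py implementation: the class names, the turn dictionary shape (turn_number, tokens, storage_mode), and the fetch callback standing in for the WHERE turn_number > max_turn_processed query are all assumptions for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class SessionState:
    max_turn_processed: int = 0
    accumulated: int = 0


def fold(state: SessionState, turns: List[dict]) -> None:
    """Absorb turns in order; a "full" turn resets the accumulated state."""
    for turn in turns:
        if turn["storage_mode"] == "full":
            state.accumulated = turn["tokens"]   # re-absorb the snapshot
        else:
            state.accumulated += turn["tokens"]  # fold the delta
        state.max_turn_processed = turn["turn_number"]


class CompositionCache:
    def __init__(self, fetch_turns: Callable[[str, int], List[dict]]):
        # fetch_turns(session_id, after) returns turns with turn_number > after.
        self._fetch = fetch_turns
        self._states: Dict[str, SessionState] = {}

    def analyze(self, session_id: str) -> SessionState:
        cached = self._states.get(session_id)
        if cached is not None:
            try:
                state = replace(cached)  # work on a copy of the cached state
                fold(state, self._fetch(session_id, cached.max_turn_processed))
                self._states[session_id] = state
                return state
            except Exception:
                # Drop the entry so a failed fold cannot poison the cache.
                self._states.pop(session_id, None)
        state = SessionState()
        fold(state, self._fetch(session_id, 0))  # full rebuild
        self._states[session_id] = state
        return state
```

Each call after the first only touches turns newer than max_turn_processed, so the total work over a session grows linearly with the number of turns rather than quadratically.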

Changed

  • Claude Code zone defaults restored to 1M model window: Yellow 500K / Orange 700K / Red 900K / Hard 1M (_zones.py, tui.py, display.py, .env.public). Reverts the 665K rescale that was applied on the basis of a single 5/13 observation of auto-compact firing around 650K. Subsequent multi-hundred-K sessions did not reproduce that trigger point — Opus 4.7 [1m] reaches ~95% (~950K) before client-side auto-compact engages, so the 665K ceiling was unnecessarily conservative. Stock deployments without 1M context entitlement can override via LLM_TOKEN_CEILING=200000.
  • Red-zone message split for CC vs Codex (i18n.py): new keys zone.abs.red.cc and zone.ratio.red.cc carry an explicit "Auto-compact imminent — hand off to a new session" guideline used by CC paths. Codex paths keep the existing "Session rotation required" wording (Codex has no client-side compaction; the 400K hard limit semantics differ).
  • duplicate_reads no longer accumulates across compaction (composition.py): when a storage_mode="full" turn arrives, read counts reset together with the accumulated context. Previously reads from pre-compaction turns were counted as duplicates of post-compaction reads, inflating duplicate_read_count and triggering spurious duplicate_read_warning flips. Visible in /api/v1/turns, /api/v1/display, dashboard Context Health, and TUI duplicate-file listings.
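
The restored zone defaults map a token count to a zone roughly as sketched below. The function name, the "green" label for counts under the Yellow threshold, and the inclusive lower bounds are assumptions; only the four threshold values come from the notes.

```python
# Thresholds from the restored 1M defaults, highest first.
ZONES = [
    ("hard", 1_000_000),
    ("red", 900_000),
    ("orange", 700_000),
    ("yellow", 500_000),
]


def classify_zone(tokens: int) -> str:
    """Return the highest zone whose threshold the token count has reached."""
    for name, threshold in ZONES:
        if tokens >= threshold:
            return name
    return "green"  # assumed label for anything below Yellow
```

Under these thresholds a session at roughly 950K tokens, where the notes say auto-compact engages, falls in the Red zone.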

Tests

  • tests/test_api/test_turns.py::_zone_env autouse fixture now patches both _zones and routes modules so legacy 1M-scale assertions stay stable against new production defaults.

v0.9.1 — 3-Provider Status, Input Validation, 2-Column Dashboard

29 Apr 05:58
@ArkNill ArkNill

New Features

3-Provider Upstream Status Tiles

Dashboard now shows live API status for Anthropic, OpenAI, and Gemini simultaneously.

  • Anthropic/OpenAI: parsed from Statuspage.io API (shared _fetch_statuspage_sync function)
  • Gemini: parsed from Google Cloud Status API with Vertex AI/Gemini incident filtering
  • 60-second server-side cache per provider, fail-open (shows "unavailable" on error)
  • Color-coded banners (green/yellow/orange/red/blue/grey) with pulse animation for critical incidents
  • Click any banner to open the provider's status page
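
The per-provider cache with fail-open behavior can be sketched like this. The class name and the injected fetch and clock callables are hypothetical, and caching the "unavailable" result for the full TTL is an assumption; the notes only state the 60-second cache and the fail-open fallback.

```python
import time
from typing import Callable, Dict, Tuple


class StatusCache:
    """Cache one status string per provider for ttl seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, provider: str) -> str:
        now = self._clock()
        hit = self._entries.get(provider)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        try:
            value = self._fetch(provider)
        except Exception:
            value = "unavailable"  # fail open instead of raising
        self._entries[provider] = (now, value)
        return value
```

Injecting the clock makes expiry testable without sleeping, and the fail-open branch means a provider outage never turns into a dashboard error.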

2-Column Responsive Dashboard

  • Left column: Turn Monitor + Context Health (real-time session monitoring)
  • Right column: CLI Status + API Health + Delegation Stats + Recent Delegations
  • Vertical divider between columns
  • Auto-stacks to single column below 1024px (mobile/tablet)
  • Status banners arranged horizontally across full width
  • Max width expanded from 960px to 1600px (95vw)

Dashboard Error Banner

  • Red "connection lost" banner appears when API requests fail
  • Auto-hides when connection recovers

Improvements

API Input Validation

Invalid values for any query parameter (window, limit, turn_start, turn_end) now return 400 with a descriptive error message instead of crashing with 500.

  • _parse_int / _parse_float helpers with _ParamError exception
  • Applied to 11 endpoints
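
The validation pattern can be sketched as follows. The names _parse_int and _ParamError appear in the notes, but the signatures, the bounds arguments, and the handle function shown here are assumptions for illustration.

```python
from typing import Mapping, Optional, Tuple


class _ParamError(ValueError):
    """Raised when a query parameter cannot be parsed or is out of range."""

    def __init__(self, name: str, message: str):
        super().__init__(f"{name}: {message}")
        self.name = name


def _parse_int(params: Mapping[str, str], name: str, default: int,
               minimum: Optional[int] = None) -> int:
    raw = params.get(name)
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise _ParamError(name, f"expected an integer, got {raw!r}") from None
    if minimum is not None and value < minimum:
        raise _ParamError(name, f"must be >= {minimum}")
    return value


def handle(params: Mapping[str, str]) -> Tuple[int, dict]:
    """Hypothetical endpoint: map a bad parameter to a 400 response."""
    try:
        limit = _parse_int(params, "limit", default=50, minimum=1)
    except _ParamError as exc:
        return 400, {"error": str(exc)}
    return 200, {"limit": limit}
```

Catching one dedicated exception type at the endpoint boundary keeps the parsing helpers reusable across all handlers.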

Performance

  • Token threshold env vars (LLM_TOKEN_A_YELLOW/ORANGE/RED/HARD/CEILING) cached at module load
  • No repeated os.getenv() in hot path

Packaging

  • Development Status: Alpha → Beta
  • Added classifiers: Environment :: Web Environment, Operating System :: OS Independent
  • Removed incomplete --fix TODO from doctor.py

Community Contributions (cnighswonger)

  • PR #9 : Codex GH App token injection via cli_delegate — opt-in bot identity for Codex gh operations. Default script path: ~/.llm-relay/github-apps/generate-token.sh
  • PR #10 : Surface term_name on /turns + dashboard render with sidLabel() helper + generic name preservation rule (prevents auto-registration from clobbering user-set labels)
  • PR #11 : Anthropic status tile on dashboard (server-side proxy + 60s cache) — extended to 3-provider in this release

Tests

  • New: _compat.py tests — cross-platform process inspection (7 tests)
  • New: doctor.py tests — health check validation (13 tests)
  • 503 tests pass (public)

No Breaking Changes

Drop-in upgrade from v0.9.0.
