Releases: DFKHelper/token-goat
v1.8.0
What's New in 1.8.0
Added
- curl -v verbose compression: post_bash detects curl -v/--verbose commands and strips TLS handshake noise, redundant request/response headers, and progress meters; keeps the request line, status code, content-type, and body
- jest/vitest verbose PASS-suite compression: collapses PASS src/... blocks with all-green test lines into a single summary line
- JUnit XML structured summary: parses <testsuites> XML from post_bash output and emits a compact pass/fail/skip count instead of raw XML
- Task-output temp file redirect: pre_read detects Claude task-output files in %TEMP%/claude/... and transparently redirects to token-goat bash-output
- Minified JS/CSS grep elision: post_bash truncates grep/rg hits on minified files (.min.js, .min.css, bundled output) to avoid multi-MB lines flooding context
- Compaction hint suppression: suppresses the redundant re-read hint that fired after every conversation compaction even when files hadn't changed
- go test -v compression: collapses passing === RUN/--- PASS blocks in verbose Go test output
- make/cmake/ninja build compression: strips redundant compile command echoes while keeping warnings and errors
- Python traceback deduplication: collapses repeated identical tracebacks in test output to a single copy
- tsc output compression: suppresses TypeScript compiler progress lines, keeping only errors and the final summary
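As an illustration of the JUnit XML summary item above, here is a minimal sketch of reducing a report to a pass/fail/skip count. The function name summarize_junit and the exact summary wording are assumptions made for this example, not token-goat's actual implementation.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> str:
    """Return a one-line pass/fail/skip summary for a JUnit XML report.

    Hypothetical helper: the name and output format are illustrative only.
    """
    root = ET.fromstring(xml_text)
    total = failed = skipped = 0
    # iter() also yields the root itself, so a bare <testsuite> root works too.
    for suite in root.iter("testsuite"):
        # Only direct <testcase> children, so nested suites are not double counted.
        for case in suite.findall("testcase"):
            total += 1
            if case.find("failure") is not None or case.find("error") is not None:
                failed += 1
            elif case.find("skipped") is not None:
                skipped += 1
    passed = total - failed - skipped
    return f"junit: {passed} passed, {failed} failed, {skipped} skipped ({total} total)"
```

The sketch counts test cases directly rather than trusting the tests/failures attributes on each suite, since those attributes are optional in many JUnit writers.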
v1.7.1
Fixed
- _index_spawn_active guards against PID recycling. Within the 10-minute INDEX_SPAWN_TTL the OS can reuse a finished indexer's PID for an unrelated process, blocking fresh indexing spawns for up to 10 minutes. The check now reads the running process's cmdline and returns False when it lacks token_goat, falling back to trusting the PID when the cmdline is unreadable.
- kill_duplicate_daemon now unlinks the stale PID file after a successful kill. Previously the file was removed on the "already dead" path but not on the success path, leaving --check and is_worker_alive() reporting stale state until the next cleanup pass.
- get_context_pressure avoids a redundant safe_load when a cache is already in scope. Accepts an optional cache= kwarg; callers that already hold a loaded SessionCache pass it in and skip the extra disk I/O.
- normalize_path docstring corrected. Steps 2 and 3 were listed in the wrong order relative to actual execution.
- Shell-neutral bash-compress disable hint. All 34 TOKEN_GOAT_BASH_COMPRESS=0 hint strings used POSIX-only prefix assignment, which is broken in PowerShell and cmd.exe. Hints now read "disable via TOKEN_GOAT_BASH_COMPRESS".
- Bash pre-hook fast-path via bash_detect. Adds a <1 ms dict lookup before the ~75 ms bash_compress import; unrecognized commands skip the import entirely.
- enqueue_dirty is now append-only with a byte-based cap. Eliminates the O(queue-size) rewrite and POSIX rename race on every Edit/Write hook.
- Corrupt .draining file is quarantined instead of raising. Prevents a worker crash when another process races on the drain rename.
- post_bash uses a single session load/save round-trip. It was calling load()/save() up to four times per invocation.
- Output size cap applied before payload work in post_bash. Prevents the full pipeline from running on 4 MB of stdout that will be truncated anyway.
- Cache eviction throttled to at most once per 60 seconds. Eliminates the per-write O(n) directory scan.
- pre_read Bash branch uses safe_load(). Corrupt session files no longer crash the Bash pre-hook.
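The PID-recycling guard from the first item can be sketched as follows. This is a simplified illustration under stated assumptions: index_spawn_active and the injected read_cmdline helper are hypothetical names, and the real check also involves the INDEX_SPAWN_TTL window, which is left out here.

```python
from typing import Callable, Optional


def index_spawn_active(pid: int, read_cmdline: Callable[[int], Optional[str]]) -> bool:
    """Decide whether a recorded indexer PID still belongs to token-goat.

    read_cmdline is a hypothetical injected helper that returns the process
    command line, or None when it cannot be read.
    """
    cmdline = read_cmdline(pid)
    if cmdline is None:
        # Unreadable cmdline: fall back to trusting the PID, as the note describes.
        return True
    # A recycled PID now runs something unrelated, so report "not active".
    return "token_goat" in cmdline
```

Injecting the cmdline reader keeps the decision logic testable without depending on /proc or a process library.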
v1.7.0
See CHANGELOG for full details.
Fixed
- Skill dedup permanently disarmed after first compaction: the post_skill early-return path skipped mark_skill_loaded(), freezing skill_ts and causing pre_skill to pass all subsequent loads through undeduped after any compaction event.
Added
- token-goat compact-doc: build a deterministic extractive sidecar for any large reference .md file; pre_read serves it in place of the full file (80–95% smaller). Auto-staled on edit.
- post_compact_full_loads config knob ([skill_preservation] post_compact_full_loads, default false): keep skill dedup armed across compaction epochs; set true to restore the pre-1.7 one-full-reload-per-epoch behaviour.
- MCP screenshot deny-redirect: pre_screenshot denies chrome-devtools and playwright screenshot calls without a filePath/filename argument, forcing the save-to-disk path so image-shrink applies (~39K tokens raw → ~8K compressed).
- Baseline v2: token-goat baseline now costs the skill listing, shows one row per configured MCP server, and adds a --usage flag that annotates each row with historical call counts.
- Session window denial for in-context file reads: pre_read actively denies re-reads of files confirmed in the current context window (config: [hints] deny_reread, default on).
v1.5.2
Three fixes: Codex hook wire-format compatibility, and two Windows coarse-mtime correctness issues in the cache and session layers.
Codex hook responses now pass schema validation
Codex 0.137.0 validates every hook response against embedded JSON schemas with additionalProperties: false, so any unrecognised key causes "hook returned invalid ... JSON output" for the entire response — including SessionStart, PreToolUse, and PostToolUse. The root cause was _tg_elapsed_ms (and sibling _tg_handler/_tg_error fields) added by the internal dispatch() function and then emitted verbatim. The denormalize_response Codex branch now strips all _tg_* keys before output. The same path also injects the required hookEventName const field into hookSpecificOutput — Codex requires it on every hookSpecificOutput shape and token-goat was not emitting it because Claude Code does not require it. A _codex_hook_event_name() helper resolves the correct value (e.g. "pre-read" → "PreToolUse") from the hook registry. The old camelCase→snake_case key conversion (_translate_hso_to_codex) is no longer applied — Codex 0.137.0+ uses camelCase throughout hookSpecificOutput.
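The two Codex-side adjustments described above (stripping _tg_* keys and injecting hookEventName) might look roughly like this. The function name to_codex_response and the sample payload are assumptions for illustration; the real logic lives in the denormalize_response Codex branch.

```python
def to_codex_response(response: dict, hook_event_name: str) -> dict:
    """Return a copy of a hook response that a strict schema would accept.

    Hypothetical sketch: drops internal _tg_* keys and stamps hookEventName
    onto hookSpecificOutput when that object is present.
    """
    # Internal bookkeeping keys would trip additionalProperties: false.
    cleaned = {k: v for k, v in response.items() if not k.startswith("_tg_")}
    hso = cleaned.get("hookSpecificOutput")
    if isinstance(hso, dict):
        # Build a new dict so the caller's response is left untouched.
        cleaned["hookSpecificOutput"] = {**hso, "hookEventName": hook_event_name}
    return cleaned
```

For example, a response carrying _tg_elapsed_ms and a hookSpecificOutput object comes back without the timing field and with hookEventName set to the value passed in, such as "PreToolUse".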
Freshest cache entry survives its own store call's eviction
evict_cache_dir sorts eviction candidates oldest-first by float(st_mtime) with a stable sort. When the just-written (MRU) entry shares a coarse st_mtime with older siblings, the stable sort falls back to arbitrary iterdir order, which on NTFS can place the newest file first and evict it — so a store_output call could delete the very entry it had just written. evict_cache_dir now accepts a protect_ids set that is excluded from the victim list regardless of timestamp, and skill_cache.store_output passes the id it just wrote.
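A small sketch of why a protected-id set fixes the tie case. The helper pick_victims and its (entry_id, mtime) input shape are assumptions for this example; evict_cache_dir itself works on directory entries.

```python
from typing import Iterable


def pick_victims(
    entries: list[tuple[str, float]],
    keep: int,
    protect_ids: Iterable[str] = (),
) -> list[str]:
    """Choose cache ids to evict, oldest first, never touching protected ids.

    entries holds (entry_id, mtime) pairs in directory-listing order;
    keep is how many entries may remain after eviction.
    """
    protected = set(protect_ids)
    candidates = [e for e in entries if e[0] not in protected]
    # Stable sort: entries with equal mtime keep their listing order.
    candidates.sort(key=lambda e: e[1])
    excess = max(0, len(entries) - keep)
    return [entry_id for entry_id, _ in candidates[:excess]]
```

With three entries that share one coarse mtime and the newest listed first, the unprotected call evicts the newest entry, while passing its id in protect_ids shifts eviction to an older sibling.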
save() refreshes the process-local load cache
session.load() caches (object, mtime) per session and serves the cached object whenever cached_mtime == current_mtime. When a later save()'s post-write timestamp aliased the mtime a previous load() had cached, the proc-cache kept serving the stale pre-save object on the next in-process load() even though the on-disk JSON was correct. save() now overwrites an existing proc-cache entry with the object it just persisted on every successful write.
v1.5.1
Correctness fixes for cache size accounting (compressed .gz bodies), surgical reads (oversized-docstring cap, signature-boundary fix), path normalization (uppercase WSL drives), mixed-case skill-compact invalidation, and the Gemini hook bridge (preserve systemMessage, route additionalContext natively), plus two documentation corrections.
See the CHANGELOG for full details.
v1.5.0
Context-pressure awareness: one source of truth for how full the window is, and hints that get terser as it fills. Ships alongside three install fixes that restore hook forwarding under editable installs and silence a recurring doctor warning.
Centralized context-pressure model
get_context_pressure(session_id) in compact.py is now the single place that answers how close a session is to autocompaction. It returns a frozen ContextPressure — a fill_fraction paired with a tier of cool, warm, hot, or critical. The estimate sums the known context contributors (loaded skill bodies, the ~10,800-token skills catalog, and per-event costs for bash history, web history, and read files) and divides by the fixed 660,000-token autocompact budget rather than the model's raw window, so the fraction carries the same meaning no matter which model is driving the session. The old _estimate_context_fill helper and the inline calculation in the session hook both defer to it, retiring the copies of the 660 K constant that had spread across half a dozen call sites in favor of one shared CONTEXT_AUTOCOMPACT_TOKENS.
Named tier boundaries
The fraction-to-tier mapping lives in tier_for_fraction(), backed by three named constants: CONTEXT_TIER_WARM (0.50), CONTEXT_TIER_HOT (0.70), and CONTEXT_TIER_CRITICAL (0.85). The bands are cool below 0.50, warm up to 0.70, hot up to 0.85, and critical at or above it. With the magic numbers pulled out of the band checks, the boundaries are defined once and the tests pin them directly.
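The band mapping can be sketched directly from the constants named above. The comparison direction at each exact boundary (treating 0.50, 0.70, and 0.85 as belonging to the higher tier) is an assumption; the notes only state it explicitly for critical. The fill_fraction helper is likewise a hypothetical stand-in for the real estimate.

```python
CONTEXT_AUTOCOMPACT_TOKENS = 660_000
CONTEXT_TIER_WARM = 0.50
CONTEXT_TIER_HOT = 0.70
CONTEXT_TIER_CRITICAL = 0.85


def fill_fraction(estimated_tokens: int) -> float:
    """Fraction of the fixed autocompact budget in use (hypothetical helper)."""
    return estimated_tokens / CONTEXT_AUTOCOMPACT_TOKENS


def tier_for_fraction(fraction: float) -> str:
    """Map a fill fraction to a tier name, checking the highest band first."""
    if fraction >= CONTEXT_TIER_CRITICAL:
        return "critical"
    if fraction >= CONTEXT_TIER_HOT:  # assumption: boundary belongs to "hot"
        return "hot"
    if fraction >= CONTEXT_TIER_WARM:  # assumption: boundary belongs to "warm"
        return "warm"
    return "cool"
```

Under these assumptions, a session estimated at 330,000 tokens has a fill fraction of 0.5 and lands in the warm tier.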
Pressure-aware surgical-read hints
The pre-read hook tightens its large-file threshold as the window fills. A file earns a surgical-read suggestion past 500 lines while the session is cool, 350 when warm, 200 when hot, and 50 when critical. It also folds a single per-tier note into the read's additional context: "Context warming" at warm, "Context pressure" at hot, "CONTEXT CRITICAL" at critical. The note is fingerprinted by tier, so it fires once per band rather than on every read. Cool sessions get no note.
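A rough sketch of the threshold table and the once-per-tier note. The names read_advice and seen_tiers are hypothetical, and a plain set stands in for whatever fingerprinting the hook actually uses.

```python
from typing import Optional

SURGICAL_READ_THRESHOLDS = {"cool": 500, "warm": 350, "hot": 200, "critical": 50}
TIER_NOTES = {
    "warm": "Context warming",
    "hot": "Context pressure",
    "critical": "CONTEXT CRITICAL",
}


def read_advice(line_count: int, tier: str, seen_tiers: set) -> tuple[bool, Optional[str]]:
    """Return (suggest_surgical_read, note) for one file read.

    seen_tiers records which tier notes already fired, so each note
    is emitted once per band rather than on every read.
    """
    suggest = line_count > SURGICAL_READ_THRESHOLDS[tier]
    note = None
    if tier in TIER_NOTES and tier not in seen_tiers:
        seen_tiers.add(tier)
        note = TIER_NOTES[tier]
    return suggest, note
```

A 400-line file gets no suggestion in a cool session but does once the session is warm, and the "Context warming" note accompanies only the first warm read.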
Smaller manifests under pressure
compute_adaptive_budget now weighs context pressure when it sizes the compaction manifest. Once the window runs hot the budget is capped at 500 tokens, and at critical it drops to 300, so the manifest stops adding to the very problem it exists to summarize.
Install robustness
Hooks no longer silently disable themselves under an editable install. The tg-hook wrapper carries an if not exist "<sentinel>" gate that short-circuits to a bare {"continue":true} during the uv tool install --reinstall race, when the venv's token_goat module is briefly absent. The sentinel used to be a hardcoded site-packages/token_goat/__init__.py path, which never exists under an editable install (uv sync, the project .venv), so the gate stayed permanently true and every hook no-op'd — the whole tool went dark with no error. The wrapper now resolves the sentinel through importlib.util.find_spec("token_goat").origin, which points at src/token_goat/__init__.py for editable installs and site-packages/... for regular ones, and falls back to an ungated wrapper when no sentinel resolves. A live handler emits {"continue": true, "_tg_elapsed_ms": N}; the _tg_elapsed_ms field is the tell that forwarding actually ran.
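The sentinel lookup can be illustrated with the standard library alone. The wrapper function resolve_sentinel is a hypothetical name; only importlib.util.find_spec and the spec's origin attribute come from the description above.

```python
import importlib.util
from typing import Optional


def resolve_sentinel(module_name: str = "token_goat") -> Optional[str]:
    """Return the file path a package resolves to, or None if it cannot be found.

    For a regular package this is its __init__.py, wherever the install
    (editable or site-packages) actually put it.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Namespace packages have no origin; built-in/frozen modules have no file.
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    return spec.origin
```

A None result corresponds to the "no sentinel resolves" case, where the notes say the installer falls back to an ungated wrapper.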
Re-install purges orphaned tokenwise entries. After the tokenwise → token-goat rename, a re-install left the old hook and permission lines stranded in settings.json and the Codex config.toml, so both harnesses kept invoking a binary that no longer existed. patch_settings_json and patch_codex_config now strip any pre-rename tokenwise command and permission entry before writing the current ones.
Hook wrapper is written as bytes to stop CRLF doubling. hook_wrapper_content() hand-bakes platform-correct line endings — \r\n on Windows — then was written through atomic_write_text, whose text-mode handle translated every \n to \r\n a second time, doubling each line ending to \r\r\n on disk. cmd.exe tolerated the stray carriage return so forwarding still worked, but doctor does a byte-exact compare of the on-disk wrapper against the regenerated content and warned differs from expected — run token-goat install to refresh on every run, a nag that reinstalling could never clear because it rewrote the same doubled bytes. The wrapper now goes through atomic_write_bytes, preserving the authored endings verbatim.
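The doubling is easy to reproduce on any platform. In this sketch, passing newline="\r\n" to open() stands in for Windows text-mode translation; the function name write_both and the sample wrapper content are illustrative assumptions, not the real hook wrapper.

```python
import os
import tempfile


def write_both(content: str) -> tuple[bytes, bytes]:
    """Write content once through a translating text handle and once as bytes.

    Returns (text_mode_bytes, binary_mode_bytes) as read back from disk.
    """
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "text.cmd")
        bytes_path = os.path.join(tmp, "bytes.cmd")

        # newline="\r\n" translates every "\n" written into "\r\n",
        # mimicking the default text-mode behaviour on Windows.
        with open(text_path, "w", encoding="utf-8", newline="\r\n") as fh:
            fh.write(content)
        # Binary mode writes the authored endings verbatim.
        with open(bytes_path, "wb") as fh:
            fh.write(content.encode("utf-8"))

        with open(text_path, "rb") as fh:
            text_bytes = fh.read()
        with open(bytes_path, "rb") as fh:
            raw_bytes = fh.read()
    return text_bytes, raw_bytes


# Content with CRLF already baked in, as the wrapper generator does on Windows.
doubled, verbatim = write_both("@echo off\r\necho hi\r\n")
```

Here doubled contains \r\r\n at each line end while verbatim matches the authored string byte for byte, which is why a byte-exact comparison only passes for the binary write.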
Session-cache integrity
Concurrent session saves no longer drop an edit. The save() fast path skipped its compare-and-swap re-read and merge whenever the on-disk (st_mtime, st_size) fingerprint still matched the one captured at load. That fingerprint aliases: two caches whose keys are the same length serialize to byte-identical JSON sizes, and a float st_mtime rounds two sub-microsecond writes to the same value. When two writers collided on both fields the second skipped the merge and overwrote the first, losing exactly one edit — the 200-edit concurrency stress test intermittently saw 199. The fast path now consults an in-process version registry so a same-process writer that already advanced the version forces the stale save back through the merge, and the fingerprint is taken from integer st_mtime_ns instead of the rounded float, so a cross-process skip now requires a true nanosecond-and-size collision rather than a rounding coincidence.
v1.3.0
[1.3.0] - 2026-06-05
Context growth audit — four changes that cut session context size and make overhead visible.
Context footprint in doctor
token-goat doctor --context now prints a Context footprint section measuring every token source that pads the context window each turn: the skills catalog (~10,800 tokens/turn for a typical install), loaded skill bodies accumulated in system-reminder injections, CLAUDE.md + MEMORY.md meta-files, and the rolling conversation estimate. The section shows fill % against the 660,000-token autocompact threshold, an ETA in turns at the current growth rate, and an Actions block naming the exact commands to run when any loaded skill above 2,000 tokens is missing a compact.
Auto-shown when estimated fill exceeds 40 % or any loaded skill > 2 K tokens lacks a compact; always shown with --context.
Compact pre-generation at install time
token-goat install now runs skill-compact --all as a final step, so compacts are ready before the first session — no post-install warm-up turn required. A sentinel file (skill_pregen_sentinel.json) records the catalog count; the doctor section uses it to detect skills added after the last pre-gen pass.
Per-skill compact advisory in post_skill
When a skill body lands in context, the post_skill hook now reports the compact's token savings inline (pre-generated compacts, sync-generated compacts for bodies < 40 KB, background-generated for larger bodies, info-only when no worker is running). Advisory fires only for bodies above 8 KB to stay silent for tiny skills.
Threshold-crossing context advisory in user_prompt_submit
A lightweight ETA advisory fires the first time estimated context fill crosses 50 % and again at 70 %. The message is appended to the existing status line (bracket-joined, not a separate injection) and references /compact now at 70 %. Resets after each compact. Configurable via hints.context_threshold_advisory = false.
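One way to picture the fire-once-per-threshold behaviour is the sketch below. The class name ThresholdAdvisor and the message wording are invented for illustration; only the 50 % and 70 % thresholds, the /compact reference at 70 %, and the reset after a compact come from the notes.

```python
from typing import Optional

ADVISORY_THRESHOLDS = (0.50, 0.70)


class ThresholdAdvisor:
    """Emit an advisory the first time each fill threshold is crossed."""

    def __init__(self) -> None:
        self._fired: set[float] = set()

    def check(self, fill_fraction: float) -> Optional[str]:
        """Return an advisory string on a new crossing, else None."""
        crossed = [
            t for t in ADVISORY_THRESHOLDS
            if fill_fraction >= t and t not in self._fired
        ]
        if not crossed:
            return None
        self._fired.update(crossed)
        message = f"context ~{fill_fraction:.0%} full"  # hypothetical wording
        if max(crossed) >= 0.70:
            message += "; consider /compact now"
        return message

    def reset(self) -> None:
        """Re-arm both thresholds, as happens after a compact."""
        self._fired.clear()
```

Calling check on every prompt stays silent between crossings, so the status line only grows when the session enters a new band.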
v1.2.0
[1.2.0] - 2026-06-05
14 commits since v1.1.0. Output overflow guard, cross-platform path normalization fixes, and a reliability pass.
Output Overflow Guard
Surgical-read commands (symbol, read, section, bash-output, web-output, and the rest) now cap oversized output before it reaches the model. When estimated tokens exceed the cap, the output is head-truncated on a line boundary. A marker line is appended naming the cap, the truncation ratio, and the narrowing action — symbol users get directed toward file::Class.method lookups, section users toward sub-headings, cached-output users toward --grep/--tail.
Default cap: 25,000 tokens. Configure via [overflow_guard] max_tokens in config.toml, override with TOKEN_GOAT_OVERFLOW_MAX_TOKENS=<n>, or disable with TOKEN_GOAT_OVERFLOW_GUARD=0 / [overflow_guard] enabled = false.
The estimator is deliberately conservative — 3 chars/token, same rate as the compaction manifest — so the cap is never under-applied. ANSI escapes are stripped before estimation since color codes inflate length without adding model-visible tokens. A single-line blob (no internal newlines) is sliced at the char budget so it cannot pass through whole.
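The guard's main steps (strip ANSI, estimate at 3 chars/token, head-truncate on a line boundary, hard-slice a single-line blob) can be sketched as follows. The function names, the ANSI pattern, and the marker wording are assumptions for this example rather than the shipped code.

```python
import re

# Matches common CSI escape sequences such as colour codes.
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
CHARS_PER_TOKEN = 3


def estimate_tokens(text: str) -> int:
    """Conservative token estimate: ceil(visible chars / 3)."""
    visible = ANSI_RE.sub("", text)
    return -(-len(visible) // CHARS_PER_TOKEN)  # ceiling division


def cap_output(text: str, max_tokens: int = 25_000) -> str:
    """Head-truncate text that exceeds the cap and append a marker line."""
    if estimate_tokens(text) <= max_tokens:
        return text
    budget = max_tokens * CHARS_PER_TOKEN
    head = text[:budget]
    cut = head.rfind("\n")
    if cut != -1:
        # Back off to the last complete line inside the budget.
        head = head[:cut]
    # With no newline at all, the blob stays sliced at the char budget.
    marker = (
        f"[truncated: cap {max_tokens} tokens, "
        f"kept {len(head)}/{len(text)} chars; narrow the query]"
    )
    return head + "\n" + marker
```

A real marker would also name the narrowing action per command (file::Class.method, sub-headings, --grep/--tail); this sketch keeps a single generic hint.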
Cross-Platform Path Normalization
Two fixes that make path-keyed caches work correctly across Windows, WSL, and Linux:
- normalize_path / paths.normalize_key: Drive-letter lowercasing (C: → c:) is now unconditional. The previous guard sys.platform == "win32" meant a WSL process that emits a Windows-format path (C:/Users/...) produced a different cache key than a native Windows process reading the same file. Both now produce c:/users/....
- hooks_skill.post_skill: Windows-style backslash paths like C:\Users\user\.claude\skills\ralph were not stripped on Linux because the inline guard used _os.sep (/ on Linux) instead of the string literal "\\". The inline block is now a call to _normalize_skill_name, which hardcodes "\\" and handles both separator styles on every platform.
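Both fixes come down to not consulting the host platform. The sketch below uses hypothetical helpers lower_drive and normalize_skill_name; it covers only the drive letter and separators, and leaves out any further case folding the real normalize_key applies to the rest of the path.

```python
import re

_DRIVE_RE = re.compile(r"^([A-Za-z]):")


def lower_drive(path: str) -> str:
    """Unify separators and lowercase a leading drive letter on every platform."""
    unified = path.replace("\\", "/")
    return _DRIVE_RE.sub(lambda m: m.group(1).lower() + ":", unified)


def normalize_skill_name(raw: str) -> str:
    """Reduce a skill path to its final component.

    Splits on both "\\" and "/" explicitly instead of os.sep, so a
    Windows-style path is handled correctly on Linux as well.
    """
    return raw.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
```

Because neither helper reads sys.platform or os.sep, a WSL process and a native Windows process produce the same result for the same input.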
Reliability
- Worker dirty-queue torn writes. Concurrent _append_dirty calls could produce truncated or concatenated JSON lines under write contention. An OS-level file lock (fcntl on POSIX, msvcrt on Windows) now serializes appends, same as the session cache.
- SQLite WAL checkpoint mode. Changed from RESTART to PASSIVE on connection open. RESTART waited for all readers to drain, blocking hook subprocesses for hundreds of milliseconds during active indexing. PASSIVE checkpoints cooperatively and does not wait.
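For the checkpoint change, a minimal sketch using the standard sqlite3 module. The function name open_index_db is an assumption, and whether token-goat issues the pragma exactly this way is not stated in the notes.

```python
import sqlite3


def open_index_db(path: str) -> sqlite3.Connection:
    """Open a database in WAL mode and run a non-blocking checkpoint."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # PASSIVE checkpoints as many frames as it can without waiting for
    # readers or writers to finish, so opening never stalls on them.
    conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
    return conn
```

The trade-off is that a PASSIVE checkpoint may leave part of the WAL unflushed when readers are active, which a later checkpoint can pick up.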
v1.1.0
57 commits since v1.0.1. Six new language indexers, twenty-plus CLI commands and flags, a pre-skill hook that cuts repeat skill loads from 40–65k tokens to ~400, pnpm/yarn/bun compress filters, rg/grep dedup hints, double-daemon prevention, and a reliability pass with 400+ new tests.
Highlights
- Skill re-load prevention. A new PreToolUse(Skill) hook fires before every Skill invocation. When a skill was already loaded in the current session, the reload is blocked and the cached compact (~400 tokens) is served instead. A repeat /ralph or /superman invocation in the same session now costs ~400 tokens, not 40–65k.
- New language indexers. CSS/SCSS, SQL, GraphQL, Protobuf, .env, and Makefile. All participate in token-goat symbol, read, outline, scope, and dedup hints.
- New CLI flags. symbol --context N, symbol --json, outline --min-lines, outline --max-depth, web-output --list, map --filter, stats --since, token-goat recent, bash history exit codes.
- Package manager filters. pnpm, yarn, and bun compress filters. pnpm run/yarn run route through their own filter.
- rg/grep dedup. Bash rg/grep invocations now fire dedup hints the same way the native Grep tool does.
- Top-5 file guarantee. The five most-accessed files always appear in the compaction manifest.
- Double-daemon prevention. JSON PID files, cross-interpreter startup guard, worker --kill-duplicate, worker --status, install --check.
Full changelog: https://github.com/DFKHelper/token-goat/blob/main/CHANGELOG.md
v1.0.1
Bundles two 50-commit improvement runs: a skill-cache / context-savings accuracy loop and a general quality loop.
Highlights:
- Skill cache: source_sha stale-compact detection, separate compact/body eviction buckets, sidecar schema v2, lazy skill injection, gzip compression
- Stats accounting fixed for bash_output_cached, skill_cached, web_output_cached, and surgical-read lookup savings (were always 0)
- Serve-diff-on-reread, session-hint cooldown, unified token formula, stats category grouping
- RuffFilter and MypyFilter bash-compress support
- Type safety, error handling, performance (hoisted regex), security (0o600 lock files), DRY helpers, debug log coverage
- 55 new tests
See CHANGELOG for full details.