Releases: SafeRL-Lab/cheetahclaws
What Changed
99a3b4a - June 5, 2026 (v3.05.82) (latest): User-controllable token / cost budgets — set a spend cap; on hit the session auto-saves and you can resume or raise it. The quota engine (`quota.py`: per-session + per-day token/cost counters, enforced before each model call) already existed but had no friendly surface — you had to know four config keys (`session_token_budget` / `session_cost_budget` / `daily_token_budget` / `daily_cost_budget`), there was no way to see how close you were, no warning before the wall, and the hard stop printed a bare `[Quota exceeded]`. This adds the UX layer on top of the unchanged engine: a `/budget` command — no args shows usage vs every budget as colored bars + percentages; `/budget $5` sets a session cost cap (the `$` means USD), `/budget 200k` a session token cap (parses `200k` / `1.5m` / `200000`), `/budget daily $20` / `/budget daily 2m` the daily caps, and `/budget clear` removes all. A `--budget $5` / `--budget 200k` startup flag sets the session cap at launch. Proximity warnings fire at the end of any turn that crosses ≥80% (yellow) / ≥95% (red) of a cap, so the wall never arrives by surprise. On hit the agent now yields a `QuotaPause` event (instead of a plain text line): the REPL auto-saves the session (`session_latest.json` + daily backup, the same path `/resume` reads) and prints a friendly next-steps block — raise the same cap or remove it (`/budget clear`) then resend, or restart later and `/resume`. So a long task that runs out of budget is never lost: you analyze, adjust, and continue. **Tight enforcement (no surprise overshoot):** the check projects the next request's input (`compaction.estimate_tokens`) and stops before the call if it would cross the cap, and clamps that call's `max_tokens` to the remaining headroom (`quota.output_room`) — so a single tool-heavy turn can't blow 40k→49k past the budget the way a pure "already-spent ≥ limit" check let it. **One budget per scope:** setting a cap replaces the other unit for that scope (`/budget $5` after `/budget 200k` switches the session cap to cost rather than stacking), so a leftover token cap can't silently keep blocking after you switch to a `$` cap.
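The `/budget` argument grammar, the 80% / 95% proximity thresholds, and the pre-call headroom clamp can be sketched as three small pure functions. This is a hypothetical re-implementation for illustration, not the code in `quota.py`: the names `parse_budget_arg`, `warning_level` and `clamp_max_tokens`, and their signatures, are assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-ins for quota.parse_budget / warnings / output_room;
# the real signatures in quota.py are not shown in the release notes.
_SUFFIX = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_budget_arg(text: str) -> Tuple[str, float]:
    """'$5' -> ('usd', 5.0); '200k' / '1.5m' / '200000' -> ('tokens', n)."""
    s = text.strip().lower()
    if s.startswith("$"):
        return "usd", float(s[1:])
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", s)
    if not m:
        raise ValueError(f"unrecognised budget: {text!r}")
    return "tokens", float(m.group(1)) * _SUFFIX[m.group(2)]


def warning_level(used: float, limit: float) -> Optional[str]:
    """Colour of the end-of-turn warning: >=95% red, >=80% yellow, else None."""
    if limit <= 0:
        return None
    ratio = used / limit
    if ratio >= 0.95:
        return "red"
    if ratio >= 0.80:
        return "yellow"
    return None


def clamp_max_tokens(requested: int, used: int, projected_input: int, limit: int) -> int:
    """max_tokens to send with the next call; 0 means 'stop before calling'."""
    room = limit - used - projected_input
    return max(0, min(requested, room))
```

Projecting the input before the call and clamping the output is what keeps a single large turn from overshooting: with a 40k cap, 30k already spent and a 12k projected prompt, the sketch returns 0 and the call is never made.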
**Unit-matched hint:** `QuotaExceeded` / `QuotaPause` carry which cap broke (key / scope / unit / limit), so the "raise it" suggestion is in the right unit — a token cap shows `/budget 40k`, a daily cost cap shows `/budget daily $40` — instead of a generic `$` amount that wouldn't lift a token cap. New helpers `quota.parse_budget` / `fmt_amount` / `usage_vs_limits` / `warnings` / `output_room`; command in `commands/core.py:cmd_budget`; `QuotaPause` in `agent.py`; REPL handling + `--budget` in `cheetahclaws.py`; 42-case `tests/test_budget.py` (isolated quota dir, incl. a regression that the hint matches the breached unit and that switching units clears the stale cap). The daemon's conservative `serve`-mode defaults (200k tok / $2 per session, 2M / $20 per day) are unchanged — interactive stays unlimited by default, the server stays guard-railed. See docs/guides/features.md · docs/guides/reference.md.

June 5, 2026 (v3.05.82): Adaptive Markdown streaming — live output that stays correct on every device. In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it did run it could leave duplicate or stale frames — on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with wide CJK / emoji text whose display width a naive line count gets wrong. The renderer now selects a streaming tier per device in `ui.render.auto_stream_mode(config)`: `live` — full in-place redraw, only on terminals known to handle cursor-up (local TTYs, and modern emulators even over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via `TERM_PROGRAM` / `TERM` / `WT_SESSION` / `KITTY_WINDOW_ID` / `ALACRITTY_WINDOW_ID` / `WEZTERM_PANE`); `commit` — append-only progressive Markdown, the safe default for unknown SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed permanently and the cursor is never moved, making a duplicate frame structurally impossible regardless of terminal, latency, or character width; `plain` — raw tokens, only when `rich` is unavailable. The append-only floor is provably duplication-free; `live` is progressive enhancement on top. Override with `/config stream_mode=live|commit|plain` (legacy boolean `/config rich_live=true|false` still works → `live` / `commit`). Implemented in `ui/render.py` (`set_stream_mode` / `auto_stream_mode` / `_safe_commit_point` / `_commit_stream` / `_commit_flush`), wired in at REPL start in `cheetahclaws.py`, with a 26-case test suite in `tests/test_stream_modes.py` (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits zero cursor sequences even on a TTY with CJK text).
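A minimal sketch of the two decisions described above: picking a tier per device, and finding how much of the buffered Markdown can be committed permanently in `commit` mode. The function names `pick_stream_mode` and `safe_commit_point` are hypothetical; so are the exact `TERM_PROGRAM` values and the `over_ssh` flag (how the real `auto_stream_mode` detects SSH is not stated).

```python
from typing import Mapping

# Assumed emulator fingerprints; the real routing table lives in ui/render.py.
_CAPABLE_PROGRAMS = {"iTerm.app", "WezTerm", "vscode", "ghostty"}
_CAPABLE_MARKERS = ("WT_SESSION", "KITTY_WINDOW_ID", "ALACRITTY_WINDOW_ID", "WEZTERM_PANE")
FENCE = "`" * 3


def pick_stream_mode(env: Mapping[str, str], is_tty: bool, over_ssh: bool, has_rich: bool) -> str:
    """Choose 'live', 'commit' or 'plain' for one device."""
    if not has_rich:
        return "plain"                      # nothing to render Markdown with
    if not is_tty or env.get("TERM_PROGRAM") == "Apple_Terminal":
        return "commit"                     # pipes and Apple Terminal: never move the cursor
    term = env.get("TERM", "")
    known = (
        env.get("TERM_PROGRAM") in _CAPABLE_PROGRAMS
        or any(key in env for key in _CAPABLE_MARKERS)
        or "kitty" in term
        or "alacritty" in term
    )
    if over_ssh and not known:
        return "commit"                     # unknown remote emulator: stay append-only
    return "live"


def safe_commit_point(buf: str) -> int:
    """Length of the prefix of buf that may be printed permanently: it ends at
    the last blank line outside a code fence, or right after a closed fence."""
    in_fence = False
    commit = 0
    pos = 0
    for line in buf.splitlines(keepends=True):
        if not line.endswith("\n"):
            break                           # trailing partial line is still streaming
        pos += len(line)
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            if not in_fence:
                commit = pos                # a whole fenced block is one atomic unit
        elif not stripped and not in_fence:
            commit = pos
    return commit
```

Because `commit` mode only ever prints `buf[:safe_commit_point(buf)]` and never rewinds, no cursor-movement sequence is emitted, which is why that tier cannot produce duplicate frames.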
Two related UX items shipped alongside: `/context` is now a visual grid — a Claude-Code-style 10×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to `#` / `.` on non-UTF-8 terminals (`commands/core.py:cmd_context`); and `deepseek-v4-flash` is registered at its 1M context window in `providers._MODEL_CONTEXT_LIMITS` (overriding the 128K deepseek provider default, which still applies to `deepseek-chat` / `deepseek-v4-pro`), so the prompt `%`, `/context`, and the compaction trigger all reflect the true 1M window. See docs/guides/features.md · docs/guides/reference.md.
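One way to turn per-category token counts into a fixed 100-cell grid is largest-remainder rounding, so the cells always add up exactly. This is a hypothetical sketch, not `cmd_context`: the helper names, the `■` / `□` glyphs and the single "used" glyph are assumptions (the real grid colors each category), while the `#` / `.` fallback follows the text above.

```python
from typing import Dict, List


def allocate_cells(usage: Dict[str, int], window: int, cells: int = 100) -> Dict[str, int]:
    """Give each category a share of `cells` proportional to its tokens
    (largest-remainder rounding); the rest of the grid is free space."""
    total = sum(usage.values())
    used_cells = min(cells, round(total * cells / window))
    alloc = {name: 0 for name in usage}
    if total:
        exact = {name: tokens * used_cells / total for name, tokens in usage.items()}
        alloc = {name: int(share) for name, share in exact.items()}
        leftover = used_cells - sum(alloc.values())
        # hand the remaining cells to the largest fractional parts
        by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
        for name in by_remainder[:leftover]:
            alloc[name] += 1
    alloc["free space"] = cells - used_cells
    return alloc


def render_grid(alloc: Dict[str, int], width: int = 10, utf8: bool = True) -> List[str]:
    """Lay the cells out row by row; '#' / '.' is the non-UTF-8 fallback."""
    used_glyph, free_glyph = ("■", "□") if utf8 else ("#", ".")
    grid: List[str] = []
    for name, count in alloc.items():
        grid.extend([free_glyph if name == "free space" else used_glyph] * count)
    return [" ".join(grid[i:i + width]) for i in range(0, len(grid), width)]
```

For example, 20k used tokens in a 200k window yields 10 used cells and 90 free ones, regardless of how the 20k splits across categories.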
What Changed
cb282bf - June 4, 2026 (v3.05.81) (latest): Claude-Code-style quiet output — hide tool execution, show one summary line per turn. Long analysis turns used to scroll the terminal with a `⚙ Bash(...)` line and a `✓ → N lines (... chars)` line for every tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line `python3 << 'PYEOF'` heredoc). A new quiet mode (on by default) suppresses the per-tool lines — the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (`Read 2 files, ran 3 shell commands`), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (`Run: python3 << 'PYEOF' ... (+59 行)`, where 行 means "lines") instead of printing the whole script. `/verbose` overrides quiet (full per-tool lines + inputs + token counts); toggle with `/quiet`, or launch with `--show-tools` (alias `--no-quiet`). The startup banner gains an `Output: quiet` / `Output: full` line so the active mode is visible at a glance. Live status line: the spinner now shows elapsed time plus a running output-token estimate (`Thinking... (7s · ↓ 435 tokens)`) — char-based, since providers only report real usage at the end — and each quiet turn closes with a real-usage footer `✻ Worked for 7.2s · ↑ 1.2k · ↓ 435` built from the true `TurnDone` counts. Implemented in `ui/render.py` (turn-level tool accumulator + `turn_summary_line()`, spinner token meter, `print_turn_stats()`), wired through the REPL event loop in `cheetahclaws.py`, with the `/quiet` toggle in `commands/config_cmd.py`. See docs/guides/features.md.
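The quiet-mode strings above (per-turn summary, collapsed permission prompt, usage footer) can be produced by a few formatting helpers. A hypothetical sketch: the phrase table, the helper names, and the English "lines" wording are assumptions; the real logic is `turn_summary_line()` and `print_turn_stats()` in `ui/render.py`.

```python
from collections import Counter
from typing import Iterable

# Hypothetical wording table; the real phrases live in ui/render.py.
_PHRASES = {
    "Read": ("Read {n} file", "Read {n} files"),
    "Bash": ("ran {n} shell command", "ran {n} shell commands"),
    "Grep": ("searched {n} pattern", "searched {n} patterns"),
}


def turn_summary(tool_calls: Iterable[str]) -> str:
    """One line per turn, tools grouped in first-seen order."""
    parts = []
    for name, n in Counter(tool_calls).items():
        one, many = _PHRASES.get(name, (name + " x{n}", name + " x{n}"))
        parts.append((one if n == 1 else many).format(n=n))
    return ", ".join(parts)


def collapse_command(cmd: str) -> str:
    """Permission-prompt form of a multi-line command: first line + count."""
    lines = cmd.splitlines()
    if len(lines) <= 1:
        return cmd
    return f"{lines[0]} ... (+{len(lines) - 1} lines)"


def fmt_tokens(n: int) -> str:
    return f"{n / 1000:.1f}k" if n >= 1000 else str(n)


def turn_footer(seconds: float, tokens_in: int, tokens_out: int) -> str:
    """Real-usage footer built from the provider-reported counts."""
    return f"✻ Worked for {seconds:.1f}s · ↑ {fmt_tokens(tokens_in)} · ↓ {fmt_tokens(tokens_out)}"
```

`Counter` keeps first-seen order, so `["Read", "Bash", "Read", "Bash", "Bash"]` summarizes as `Read 2 files, ran 3 shell commands`.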
June 4, 2026: Context-window override — the prompt % and compaction now follow a settable context length. The prompt's context-usage `%` (and the compaction trigger) derive from the model's context window, which previously could only be a hardcoded provider default — and `max_tokens` (the OUTPUT cap) doesn't change it, so `/config max_tokens=...` left the `%` unchanged (a common point of confusion). New per-session key `context_window` (`/config context_window=<N>`, `0` = model default) overrides it, kept deliberately distinct from `max_tokens`. A single parser (`providers.context_window_override`) feeds the prompt `%`, `/context`, the compaction trigger, and the per-call output-token cap, so all four stay consistent; it is bidirectional — a smaller value forces earlier compaction, a larger value corrects a stale default. The value is read live each prompt, so switching model or `context_window` updates the `%` with no restart. `/config` warns when the value exceeds the model's real window (which would disable compaction and let the API reject oversized prompts). No-op when unset, so existing behavior is unchanged. See docs/guides/reference.md.
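The "single parser" design can be illustrated with one function that every consumer calls, so the prompt `%`, `/context` and the compaction trigger cannot disagree. A minimal sketch under assumed names (`context_window_for`, `usage_percent`, `override_warning`); the real entry point is `providers.context_window_override`, whose signature is not documented here.

```python
from typing import Any, Mapping, Optional


def context_window_for(config: Mapping[str, Any], model_default: int) -> int:
    """Hypothetical single source of truth for the context window.
    context_window unset or 0 -> model default. max_tokens is ignored on
    purpose: it caps the OUTPUT, not the window."""
    try:
        value = int(config.get("context_window", 0) or 0)
    except (TypeError, ValueError):
        return model_default
    return value if value > 0 else model_default


def usage_percent(prompt_tokens: int, window: int) -> int:
    return round(100 * prompt_tokens / window)


def override_warning(value: int, real_window: int) -> Optional[str]:
    """Mirror of the /config warning: a value above the real window disables
    compaction and lets the API reject oversized prompts."""
    if value > real_window:
        return f"context_window={value} exceeds the model's real window ({real_window})"
    return None
```

Because the value is looked up on every call rather than cached at startup, changing `context_window` takes effect at the next prompt.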
June 4, 2026: Rich Live streaming — long responses stay live via a bounded tail window. Large streamed responses that would overflow the terminal's redraw area could leave duplicate or stale frames behind on some emulators (macOS Terminal, etc.), because Rich Live redraws the whole accumulated output in place and the cursor can't reach content that has scrolled into the scrollback. Building on the per-response fallback from PR #133, Rich Live now keeps the live region bounded to the viewport: a short response is shown in full, but once it would overflow, only the last screenful of rendered lines (a tail window) is redrawn — so the Live region can never exceed the terminal and cannot leave stale frames. The complete output is committed once when the response finishes (including on Ctrl-C, since the REPL flushes on interrupt), so the head that scrolled out of the window is never lost. Plain streaming is kept only as a safety net (precise render failed, or the terminal is too small to bound a window). A cheap per-line wrap estimate short-circuits the expensive full `render_lines()` measurement while a response stays well under the limit, so normal responses pay no extra Markdown re-render per chunk. Adds focused tests covering full-frame streaming, the full→tail transition, tail-window commit-on-flush, real `Segments` rendering, and both safety-net fallbacks. See docs/guides/features.md.
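The tail window and the cheap pre-check can be sketched as follows. The helper names, the one-line `reserve`, and the 2x safety factor are assumptions for illustration, not the renderer's actual constants.

```python
from typing import List


def visible_frame(rendered: List[str], terminal_height: int, reserve: int = 1) -> List[str]:
    """Lines handed to the Live region: the whole response while it fits,
    otherwise only the last screenful (the tail window)."""
    limit = max(1, terminal_height - reserve)
    return rendered if len(rendered) <= limit else rendered[-limit:]


def wrap_estimate(text: str, width: int) -> int:
    """Cheap wrapped-line count: ceil(len / width) per source line."""
    return sum(max(1, -(-len(line) // width)) for line in text.split("\n"))


def needs_precise_render(text: str, width: int, terminal_height: int) -> bool:
    """Only pay for the full render once the estimate gets close to the limit.
    The 2x margin covers double-width glyphs: a line of them wraps at most
    twice as often as the character count suggests."""
    return wrap_estimate(text, width) * 2 >= terminal_height
```

Markdown rendering can still add lines (headings, padding), so a real implementation would switch to the precise measurement with some margin to spare; the point of the estimate is only to skip that work for responses that are obviously small.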
May 31, 2026: QQ bot bridge — `/qq` connects cheetahclaws to QQ groups + C2C private chats (PR #121). Uses the official `qq-botpy` WebSocket + HTTP SDK (`pip install "cheetahclaws[qq]"`). botpy's async client runs on a dedicated asyncio event loop inside a daemon thread, bridged to the synchronous main thread via thread-safe queues. Handles `on_group_at_message_create` (group @-mentions, prefix stripped) and `on_c2c_message_create` (private). Since QQ has no message-edit API, replies stream as new messages every ~2 s (2000-char chunking) instead of updating a placeholder; passive replies reference the original `msg_id` / `event_id` within QQ's 5-minute window, then fall back to active pushes. Per-target FIFO job queues, slash-command passthrough, `!jobs` / `!retry` / `!cancel` remote control, image input, and permission prompts scoped to the originating chat (no cross-chat approvals). A supervisor reconnects with exponential backoff (2 s → 120 s). Secret handling matches the hardening standard below: `$QQ_SECRET` (recommended) > REPL arg (deprecated, warns + scrubs history) > config; env-supplied secrets never touch `~/.cheetahclaws/config.json`. Commands: `/qq <appid>`, `/qq`, `/qq stop|status|logout`. Two follow-up fixes over the original PR: image downloads moved off the event loop into `loop.run_in_executor` (a blocking `urlopen` would freeze the WebSocket heartbeat for up to 30 s), and the secret no longer gets written to disk unconditionally. See docs/guides/bridges.md.
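Since QQ cannot edit a message, each flush has to go out as one or more fresh messages under the 2000-character limit. A hypothetical splitter is sketched below; the newline-preferring rule is an assumption, and the ~2 s flush cadence and passive/active reply logic are not modeled.

```python
from typing import List


def chunk_reply(text: str, limit: int = 2000) -> List[str]:
    """Split a reply into messages of at most `limit` characters, preferring
    to break at the last newline inside each window."""
    chunks: List[str] = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:                 # no usable newline: hard split at the limit
            chunks.append(text[:limit])
            text = text[limit:]
        else:
            chunks.append(text[:cut])
            text = text[cut + 1:]    # drop the newline we split on
    if text:
        chunks.append(text)
    return chunks
```

Breaking at newlines keeps Markdown lines and code intact in most cases; the hard split only triggers for a single line longer than the limit.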
What Changed
60e1eff - May 12, 2026 (v3.05.80) (latest, security-hardening branch): Two-round security hardening sweep — CRITICAL + HIGH findings from the in-repo code review. Lands a cluster of fixes that close real attack surfaces opened by the recent rapid feature growth. Zero regressions across the full 2347-test suite.
**Bot tokens off `argv` / readline history.** `cmd_telegram` and `cmd_slack` now accept a single-arg form (`/telegram <chat_id>` / `/slack <channel_id>`) and read the bot token from `$TELEGRAM_BOT_TOKEN` / `$SLACK_BOT_TOKEN`. Env-supplied tokens never get persisted to `~/.cheetahclaws/config.json`; only tokens that actually came in via the deprecated REPL-arg path are saved on disk. New `bridges.scrub_token_from_history(token)` walks `readline.get_history_item` backwards and removes any in-memory entry that embeds the token the moment we know its value. Bridge supervisors get a `token=` / `channel=` kwarg so the env-sourced token can flow to the worker thread without ever sitting on the config dict — `_slack_start_bridge(config, *, token, channel)`. Telegram already passed the token explicitly to `_tg_supervisor`. WeChat is unaffected (QR-scan token, never in argv).

**Web UI CSRF — double-submit cookie.** Server mints `ccsrf=<24B>; Path=/; SameSite=Strict; Max-Age=86400` (non-HttpOnly) on every connection that arrives without one. `_handle_connection` gates POST/PUT/PATCH/DELETE on a matching `X-CSRF-Token` request header (rejection: `403 csrf token mismatch`). Exempt: `/api/auth/{bootstrap,register,login,logout,api/auth}` — they establish the session that later carries the cookie. New `web/static/js/csrf.js` monkey-patches `window.fetch` so every state-changing request automatically echoes the cookie value; loaded as the first script in `chat.html`, the inline terminal script in `_build_html`, and `lab.html`. Test harness (`tests/test_web_api.py:_client`) gains an `httpx` event hook that mirrors the browser behaviour. `SameSite=Strict` on the JWT cookie remains the first-line defence; CSRF is the second line.

**Web terminal session ownership.** `_PtySession(owner_uid=...)` records the creator's JWT `sub` at `/api/session` time. `_check_pty_owner(session, cookie)` is consulted at `/api/stream` / `/api/input` / `/api/resize` — any other authenticated user trying to reach a known `sid` gets `403 not session owner`. Password-only mode (no JWT) keeps `owner_uid=None` and skips the check, preserving the shared-secret model. Closes the trivial-sid-hijack hole in multi-user web deployments.

**Bash hard-denylist.** Eight regexes in `tools/shell.py:_BASH_HARD_DENY` refuse host-destroying patterns regardless of `permission_mode` — `rm -rf /` and its `--recursive` / `--force` variants, `rm -rf /*`, `mkfs.*`, `dd of=/dev/{sd,hd,nvme,vd,mmcblk,xvd}`, `> /dev/{sd,hd,...}`, `chmod -R 777 /`, `chown -R <user> /`, and the classic `:(){ :|:& };:` fork bomb. Hits the Bash tool, the REPL `!cmd` escape, and all three bridges' `!cmd` paths. Plus NUL-byte + control-char + 64 KB length rejection on every Bash invocation.

**Filesystem credential denylist.** `tools/security.py:_check_path_allowed` now refuses access to a small denylist by default — SSH private keys (`~/.ssh/id_*`), `~/.aws`, `~/.gnupg`, `~/.kube`, `~/.docker`, `~/.netrc`, `~/.pgpass`, `/etc/shadow`, `/etc/gshadow`, `/etc/sudoers*`, `/root`. Public-by-convention SSH files (`config`, `known_hosts`, `authorized_keys`) remain readable. Set `CHEETAHCLAWS_FS_NO_SANDBOX=1` to bypass when intentionally auditing your own secrets. Independent of `allowed_root`, which still works as the strict-mode toggle for multi-user daemon deployments.

**Plugin loader hardening.** Two new env switches in `plugin/loader.py`: `CHEETAHCLAWS_DISABLE_PLUGINS=1` (kill switch) and `CHEETAHCLAWS_PLUGIN_ALLOWLIST=a,b,c` (whitelist). EXTERNAL-scope plugins (loaded via `$CHEETAHCLAWS_PLUGIN_PATH`) print a one-time stderr warning on first load so a stolen env-var-set doesn't silently execute. Module path resolution now uses `Path.resolve()` + `relative_to(install_dir)` to confine a malicious manifest's `"tools": ["../../etc/passwd_loader"]`-style entry.

**MCP env sanitisation.** `cc_mcp/client.py:_sanitized_mcp_env` strips a fixed set of process-hijack keys (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `PYTHONPATH`, `PYTHONSTARTUP`, `PYTHONHOME`, `PYTHONEXECUTABLE`, `NODE_OPTIONS`, `NODE_PATH`, `BASH_ENV`, `ENV`) from any `env` map an `.mcp.json` config supplies. Dropped keys print a one-line stderr notice. Bypass: `CHEETAHCLAWS_MCP_TRUST_ENV=1`. Closes a real local-priv-esc path on a host with multiple MCP server configs of varying trust.

**macOS daemon peer-cred.** `cc_daemon/auth.py:get_peer_uid` now branches on `sys.platform`: Linux keeps `SO_PEERCRED`, macOS / \*BSD goes through ctypes-loaded `getpeereid(2)`. Closes a long-standing TODO that effectively reduced macOS Unix-socket auth to token-only (a stolen daemon token implied full RCE without peer-uid validation).

**Smaller fixes folded in.** Web JWT secret loader rewritten with `O_CREAT | O_EXCL` + 0o600 + post-write mode verification (refuses to read a world-readable secret file; auto-falls-back to an in-memory secret if chmod can't be enforced; override with `CHEETAHCLAWS_WEB_SECRET`). Terminal one-time password moved from `secrets.token_urlsafe(6)[:6]` (~36 bits, online-bruteable) to `secrets.token_urlsafe(32)` (256 bits). `cc_config.save_config` strips `permission_mode=accept-all` before persisting — once-confirmed escape hatches no longer outlive the session that set them. `session_store.save_session` is wrapped in a module-level `Lock` + explicit `BEGIN IMMEDIATE` / `ROLLBACK` so two threads writing the same `session_id` no longer silently drop one set of changes. In `agent_runner.py`, `err_msg` is initialised before the try block (defends against a `NameError` on first iteration if `_handle_permission_request` returns `"error"`); `quota.QuotaExceeded` is matched by `isinstance` instead of class-name string. `compaction.compact_messages` wraps `stream_auxiliary` in try/except + falls back to the original messages instead of crashing the agent loop. `providers._recover_args_from_text` caps the regex scan window to the last 32 KB of accumulated text (was scanning ~100 KB+ on every tool call). `context.get_git_info` + `get_claude_md` get TTL caches (30 s / 10 s, keyed by cwd) so the per-turn `git rev-parse / status / log` and CLAUDE.md re-read stop showing up in profiles. `cc_mcp/client.py` reader loops use `dict.pop()` instead of `in` + index so a late response after a timeout doesn't race the request side. `tool_registry._cache_key` adds a `session_id` dimension so a `Read(/etc/...)` cached for one session never leaks to another. The `session_store.search_sessions` LIKE-fallback path escapes `%` / `_` / `\` before interpolation.

**Frontend XSS audit.** Existing `_esc` (textContent → innerHTML) and `_renderMd` (HTML-tag-strip → marked) cover all user/model content paths. One deep-trust hole closed: `web/static/js/settings.js:_renderModels` previously injected server-supplied model names directly into an `onclick="app.selectModel('${full}')"` attribute — now uses `data-model` + a delegated click handler, so a malicious model registry entry cannot break out of the JS string literal.

**Defaults you can flip.** `CHEETAHCLAWS_BRIDGE_TERMINAL=0` hard-disables the bridge `!cmd` shell entirely (default `1`, owner-bound by `chat_id` whitelist anyway). `CHEETAHCLAWS_FS_NO_SANDBOX=1` lifts the credential denylist. `CHEETAHCLAWS_DISABLE_PLUGINS=1` / `CHEETAHCLAWS_PLUGIN_ALLOWLIST=...` / `CHEETAHCLAWS_MCP_TRUST_ENV=1` control plugin + MCP behaviour. Full reference in docs/guides/security.md. All 12 CRITICAL + 10 HIGH items from the review are now closed. Four of those 22 turned out to be review misjudgements: the `_all_errors` init, the permission double-answer race, the `_broadcast` iteration race, and the "shell injection in REPL `!command`" finding, which was reclassified as user-typed input rather than RCE (the `QuotaExceeded` class-name check flagged alongside it was a real fix). Architecture refactor items (`cheetahclaws.py` / `providers.py` God-object split, sentinel state machine) were deliberately left for a separate decision — they're shape changes, not bug fixes.
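The Bash hard-denylist boils down to "validate the raw string, then regex-match it before any permission check runs". A hypothetical sketch follows; the patterns are illustrative only and cover fewer cases than the eight in `_BASH_HARD_DENY` (for example, the `--recursive` / `--force` spellings and `rm -r -f /` are not handled here), and the function name `reject_reason` is an assumption.

```python
import re
from typing import Optional

# Illustrative patterns only, NOT the real regexes in tools/shell.py.
_HARD_DENY = [
    # rm with combined -rf / -fr flags aimed at / or /*
    re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*|-[a-zA-Z]*f[a-zA-Z]*r[a-zA-Z]*)\s+/(\*)?(\s|$)"),
    re.compile(r"\bmkfs(\.\w+)?\b"),
    re.compile(r"\bdd\b.*\bof=/dev/(sd|hd|nvme|vd|mmcblk|xvd)"),
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),          # classic fork bomb
]
MAX_LEN = 64 * 1024


def reject_reason(cmd: str) -> Optional[str]:
    """Return why a command is refused, or None if it may proceed to the
    normal permission flow. Runs regardless of permission_mode."""
    if len(cmd) > MAX_LEN:
        return "command too long"
    if "\x00" in cmd:
        return "NUL byte"
    if any(ord(c) < 32 and c not in "\t\n\r" for c in cmd):
        return "control character"
    for pat in _HARD_DENY:
        if pat.search(cmd):
            return f"hard-denied pattern: {pat.pattern}"
    return None
```

Anchoring the `rm` pattern on `/` followed by whitespace, `*` or end of string is what lets `rm -rf /tmp/build` through while still catching `rm -rf /`.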
May 12, 2026 (`daemon/f-4-followups-f-6-9` branch): Daemon foundation roadmap finished — all nine F-1...F-9 items in RFC 0002 now LANDED. Closes the remaining four scope items end-to-end (≈1500 LoC of code + ≈900 LoC of tests + docs). Drilldown:

**F-4 #2 — Bridge `notify` forwarding.** The subprocess-runner reader loop's `notify` IPC branch used to drop the payload on the floor (F-6/7/8 didn't exist yet). Now it routes through `cc_daemon.bridge_supervisor.notify(kind, text)`. The runner can target a specific bridge via `msg["bridge"]` (e.g. `"telegram"`) or omit it for a `"*"` broadcast. `agent_runner_notify` events on the bus carry `{name, run_id, bridge, delivered, text[:500]}` so observers can audit deliveries. Empty-text frames are silently dropped (common during agent shutdown).

**F-4 #3 — Restart policy.** New `RestartPolicy` dataclass: `mode` (`none` | `on-crash`), `max_restarts`, `backoff_base_s`, `backoff_cap_s`, `backoff_jitter_s`. Frozen + a pure `next_delay(restart_count)` so the decision matrix is unit-testable. `agent.start` accepts the five fields flat (validation rejects `cap < base`, which would clamp every attempt down to a useless ceiling). On a crash the reader's `finally` arms a `threading.Timer(delay, _do_restart, ...)`; the Timer respawns via a swappable spawner hook (`_RESTART_SPAWNER` for tests) and carries `restart_count` forward. `stop()` cancels the Timer before the kill ladder, and the same `_unregister(name, expected=handle)` identity check protects against a Timer-fired respawn racing past a deliberate stop. Bus events: `agent_runner_restart_scheduled`, `agent_runner_restart`, agent_runner_restart_fai...
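A frozen policy object with a side-effect-free `next_delay` is easy to unit-test. The sketch below mirrors the five documented fields, but the backoff formula (doubling from `backoff_base_s`, capped at `backoff_cap_s`, plus uniform jitter) and the default values are assumptions, since the notes do not give them; the optional `rng` parameter is added here so the jitter stays deterministic in tests.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestartPolicy:
    """Sketch of the documented fields; the formula below is assumed."""
    mode: str = "none"              # "none" | "on-crash"
    max_restarts: int = 5
    backoff_base_s: float = 1.0
    backoff_cap_s: float = 60.0
    backoff_jitter_s: float = 0.0

    def __post_init__(self) -> None:
        if self.mode not in ("none", "on-crash"):
            raise ValueError(f"bad mode {self.mode!r}")
        if self.backoff_cap_s < self.backoff_base_s:
            raise ValueError("backoff_cap_s must be >= backoff_base_s")

    def next_delay(self, restart_count: int,
                   rng: Optional[random.Random] = None) -> Optional[float]:
        """Seconds before the next restart, or None to give up."""
        if self.mode != "on-crash" or restart_count >= self.max_restarts:
            return None
        delay = min(self.backoff_cap_s, self.backoff_base_s * (2 ** restart_count))
        if self.backoff_jitter_s > 0:
            delay += (rng or random).uniform(0, self.backoff_jitter_s)
        return delay
```

A crash handler would then do something like `d = policy.next_delay(count)` and, when `d` is not None, arm `threading.Timer(d, respawn)`, keeping the timing decision separate from the thread plumbing.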
What Changed
d7fc51a - May 10, 2026 (latest, v3.05.79): Web Chat UI session organization + headless-bridges slash handler + stale-session reaper crash fix. Three threads of work merged into a single release.

**Bridges / headless deploys (#84 follow-up):** Telegram / Slack / WeChat `/help`, `/monitor`, `/model`, `/status` produced zero response in Docker / `--web` deploys because `_start_headless_bridges()` only wired `run_query` and `agent_state` on the shared `session_ctx` — never `handle_slash`. The bridge poll loops gate on `if slash_cb:` and fell through to `continue` before the `📩 Telegram:` log line, so the failure was invisible in `docker compose logs -f`. Fix: extracted the slash handler (originally inlined in `repl()`) into a module-level factory `_make_bridge_slash_handler(state, config, run_query)`; both REPL and headless paths now use it (single source of truth, no future drift between modes).

**Stale-session reaper crash:** `web/api.py:reap_stale_chat_sessions()` called `remove_chat_session(sid)` without the `user_id` the function now requires for ownership-check parity — every reaper tick raised `TypeError`, killing the daemon thread, so stale `ChatSession` objects accumulated forever in the in-memory cache. Fix: capture `(sid, user_id)` pairs from the cached `ChatSession` objects under `_chat_lock`, then apply outside the lock.

**Web UI session organization:** five-feature bundle layered on top — folders + drag-drop + Move-to context menu, ChatGPT-style active-folder context (click a folder name → `+ New` and direct typing both drop new sessions into that folder, with a `Chat · in <Folder>` topbar breadcrumb), batch select with Select-all-respecting-search-filter, batch delete + combined-Markdown export (`chats-N-sessions.md`), and a 4-px draggable sidebar divider with localStorage persistence. Backend adds a `folders` table, a `chat_sessions.folder_id` nullable FK, an in-place `PRAGMA table_info` + `ALTER TABLE` migration in `init_db()`, and 5 new HTTP endpoints (`GET/POST /api/folders`, `PATCH/DELETE /api/folders/{id}`, `PATCH /api/sessions/{id}/folder`). Also rolled in: issue #111 (`handle_slash_sync` / `handle_slash_stream` no longer double-broadcast to WS) and `--web --model X` persistence. Tests: +16 new across `test_web_api.py` (folder CRUD, batch ops, reaper regression) and the new `test_bridge_slash_handler.py` (5 cases pinning the headless handler contract). Full suite: 2154 / 2154 passing, zero regressions. User-side guide: `docs/guides/web-ui.md`.
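The in-place migration pattern (inspect columns with `PRAGMA table_info`, add the column only if missing) keeps `init_db()` safe to run on every start. A minimal sketch using the standard `sqlite3` module; the `folders` columns and the function name `migrate_chat_sessions` are assumptions, not the project's schema.

```python
import sqlite3


def migrate_chat_sessions(conn: sqlite3.Connection) -> None:
    """Idempotent migration: create `folders`, then add the nullable
    chat_sessions.folder_id foreign key only if it is not there yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS folders ("
        " id INTEGER PRIMARY KEY, user_id TEXT, name TEXT NOT NULL)"
    )
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(chat_sessions)")}
    if "folder_id" not in cols:
        conn.execute(
            "ALTER TABLE chat_sessions ADD COLUMN folder_id INTEGER "
            "REFERENCES folders(id) ON DELETE SET NULL"
        )
    conn.commit()
```

SQLite only allows `ADD COLUMN` with a `REFERENCES` clause when the new column defaults to NULL, which is exactly the "nullable FK" shape described above.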
May 10, 2026: Web Chat UI fixes — slash commands no longer reply twice; `--web --model X` actually applies the model. Two related issues that surfaced when wiring a self-hosted vLLM endpoint into the Chat UI. (1) Issue #111 — slash commands duplicated in Chat UI but not in terminal. `web/api.py:handle_slash_sync` was both returning events inline in the HTTP response and broadcasting the same events to the WS subscribers of the same client; `chat.js` then iterated `data.events` AND fired `_handleEvent` from `ws.onmessage`, rendering every reply twice. Same bug in `handle_slash_stream` for SSE-streamed long commands (`/brainstorm`, `/worker`, `/agent`, `/plan`). Both helpers now deliver events through a single channel — HTTP/SSE only — so `_handleEvent` runs exactly once per event. Background-thread events (sentinel flows, agent runs) are unaffected: by the time the worker thread emits, `_broadcast` is already restored to the live WS broadcaster in `finally`. (2) `--web --model X` was silently ignored. The CLI override branch only ran in the interactive-REPL path; the `if args.web:` branch loaded config straight from disk and started the server, so `python cheetahclaws.py --web --model custom/qwen2.5-72b` would happily boot but every request handler reloaded `~/.cheetahclaws/config.json` with the previous model name (e.g. `gemma-4-31B-it`), producing a confusing `404: model does not exist` against the new endpoint. Fix: `cheetahclaws.py` now persists `args.model` to config before calling `start_web_server`, matching the documented behavior; `provider:model` → `provider/model` normalization is identical to the REPL path. User-side guide: `docs/guides/web-ui.md` (Troubleshooting + Architecture notes updated).
May 10, 2026: Small-context local models survive large workloads — 4-part fix: ctx cap, auto-fanout, stagnation-stop, output paths under `~/.cheetahclaws/`. Repro that motivated the work: running `/agent → 1 (Research Assistant)` on a 6.6 MB PDF (`AutoRedTeamer.pdf`, ~70k tokens of extracted text) with `custom/qwen2.5-72b` (32k ctx). Old behavior: 400 BadRequest "context length 32768"; the agent_runner kept polling the template every 2 s; the model produced 1500+ identical "task complete" summaries before anything stopped it. New behavior, four cooperating layers:

(1) Per-model context-window registry + dynamic max_tokens cap (`providers._MODEL_CONTEXT_LIMITS` + `get_model_context_window` + `dynamic_cap_max_tokens`) — covers Qwen 2.5/3, Llama 3.x, Mistral/Mixtral, Phi, Gemma, DeepSeek local variants; `_fetch_custom_model_limit` now backfills `PROVIDERS["custom"]["context_limit"]` so compaction sees the live `/v1/models` value; per-call shrink based on actual prompt size keeps `input + output + 1024 safety ≤ ctx`. `compaction.get_context_limit` gains an optional `config` arg so custom-endpoint detection works on the very first turn.

(2) Auto-fanout for oversize tool outputs (`multi_agent/fanout.py`) — when a single tool result (Read on a huge PDF, Grep over a giant tree, WebFetch of a long article) exceeds 0.4 × ctx_window, split into chunks at paragraph boundaries with token overlap, dispatch parallel sub-LLM map calls (one per chunk, default cap 5 subagents), merge with a single reduce call; substitutes the merged summary in conversation history instead of letting the next API call overflow. Hooked at the tool-result append site in `agent.py`; transparent UX prints `[Auto-fanout: <Tool> returned ~N chars (>threshold) → dispatching K parallel sub-summaries]`. Configurable: `auto_fanout_enabled` / `_threshold` / `_max_subagents` / `_chunk_overlap_tokens`.

(3) Stagnation-stop in `agent_runner.py` — when the model emits the same summary N iterations in a row (default 3, whitespace/case-normalized), stop the loop with a clear notification instead of burning thousands of API calls; configurable via `auto_agent_dup_summary_limit` (`0` disables).

(4) Agent output paths under `~/.cheetahclaws/` — the `/agent` wizard now resolves relative output filenames (e.g. `research_notes.md`) to absolute paths under `~/.cheetahclaws/agents/<name>/output/` instead of CWD; `AgentRunner` exposes `runner.output_dir`, eagerly mkdir'd; Summary block + post-start info show the resolved path in green; absolute paths pass through unchanged. Tests: +47 new (fanout 23, ctx cap 18, dup-stop 13, output paths 8). Full suite: 2139 passing, zero regressions. User-side guide: `docs/guides/extensions.md`.
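The stagnation-stop in layer (3) amounts to counting consecutive equal summaries after normalization. A hypothetical sketch (the class name `StagnationGuard` and method `should_stop` are assumptions); `limit` plays the role of `auto_agent_dup_summary_limit`, with `0` disabling the check.

```python
import re
from typing import Optional


class StagnationGuard:
    """Stop an agent loop after `limit` consecutive identical summaries,
    compared whitespace- and case-insensitively. limit=0 disables it."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self._last: Optional[str] = None
        self._streak = 0

    @staticmethod
    def _norm(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip().lower()

    def should_stop(self, summary: str) -> bool:
        if self.limit <= 0:
            return False
        key = self._norm(summary)
        if key == self._last:
            self._streak += 1
        else:
            self._last, self._streak = key, 1   # a new summary resets the streak
        return self._streak >= self.limit
```

With the default of 3, the loop from the repro above would have stopped on the third identical "task complete" summary instead of the 1500th.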
May 9, 2026: Read tool auto-redirects on overflow — defense-in-depth for the case where the model ignores the template instruction. Re-running the same `/agent` + `autodan.pdf` failure showed two real-world problems with the prior fix: (1) The user was running the pip-installed binary (`/home/shangdinggu/anaconda3/bin/cheetahclaws`), not the source tree, so new tools / templates added to source had no effect. (2) Even if the user reinstalled, qwen2.5-72b would likely still call `Read` instead of `SummarizeLargeFile` — models default to familiar tools no matter what the template says. The fix moves the routing decision into the Read tool itself.

(a) New `_maybe_redirect_to_summarize` helper (`tools/files.py`). When `Read` or `ReadPDF` would return content too large to safely fit in the next API call, it instead returns a short redirect message like `[ReadTooLarge: file is too large — call SummarizeLargeFile with file_path='X' instead] PREVIEW: ...`. The model sees the redirect, calls `SummarizeLargeFile`, and gets a chunked-and-merged summary back. The raw content never enters the API call.

(b) CJK-aware token estimation. CJK content tokenizes at ~1 token per character (vs ~2.8 chars/token for English). New `_is_cjk_heavy()` heuristic: ≥20% CJK characters → use a 1:1 char-to-token estimate. A 24K-char Chinese file is 24K tokens, not 8.6K, and now triggers a redirect on a 32K-context model.

(c) Conservative ceiling for unreliable provider declarations. The `custom/<model>` provider declares 128K context by default but the underlying model is often 32K (qwen2.5-72b, Llama 3 8B, etc.). New `safe_ctx = min(declared_ctx, 30000)` caps the threshold at 30K tokens regardless of provider claims — the redirect now fires on the user's exact ~25K-token PDF case (it would NOT have fired with the unconditional 128K ceiling, which is exactly the bug).

(d) Wrapped Read registration (`tools/__init__.py`). New `_read_with_overflow_check` lambda calls `_maybe_redirect_to_summarize` after `_read` returns; results under 8 KB skip the check (not worth it). ReadPDF gets the same treatment inline in `_read_pdf`.
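Points (b) to (d) combine into a small decision function. The sketch below is hypothetical: the Unicode ranges used to detect CJK, the 0.5 fraction of `safe_ctx` at which a redirect fires, and all names are assumptions (the notes give the 20% ratio, the ~2.8 chars/token figure, the 30K ceiling and the 8 KB skip, but not the exact threshold).

```python
def _is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def is_cjk_heavy(text: str, threshold: float = 0.20) -> bool:
    """True when at least `threshold` of the characters are CJK."""
    if not text:
        return False
    return sum(map(_is_cjk, text)) / len(text) >= threshold


def estimate_tokens(text: str) -> int:
    """~1 token per char for CJK-heavy text, ~2.8 chars per token otherwise."""
    return len(text) if is_cjk_heavy(text) else int(len(text) / 2.8)


def should_redirect(text: str, declared_ctx: int, fraction: float = 0.5,
                    ceiling: int = 30_000) -> bool:
    """Redirect a Read result to a summarizer when it would crowd the window."""
    if len(text.encode("utf-8")) < 8 * 1024:
        return False                        # small results skip the check
    safe_ctx = min(declared_ctx, ceiling)   # distrust optimistic provider claims
    return estimate_tokens(text) > fraction * safe_ctx
```

Under these assumptions a 24K-character Chinese file (24K estimated tokens) is redirected on a 32K model while the same number of English characters (about 8.6K tokens) is not, matching the behavior described in (b).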
Why this works even on the old install: as soon as the user updates `tools/files.py` and `tools/__init__.py`, the redirect fires regardless of whether the SummarizeLargeFile / template changes are present. The redirect's prose tells the model exactly which tool to call and with what args. Tests: 14 new pytest cases (`tests/test_read_overflow_redirect.py`) — CJK detection (English / Chinese / Japanese / mixed-minority / empty), threshold logic (small file → no redirect; user's exact failure case → redirect with right pointer; CJK at lower char count triggers vs same chars in English; conservative ceiling protects against overconfident provider; preview included for context). Plus 2 integration tests via `execute_tool("Read", ...)` confirming the wrapper applies the redirect ...
What Changed
61e618d - May 8, 2026 (v3.05.78): F-2/F-3 follow-ups + CI unblock (`feature/fix-f2`). Main has been red since `9c01237d` (the trading-agent #99 merge) because `tests/test_packaging.py::test_required_module_imports[modular.trading.ml]` (issue #97 regression test) caught that `modular/trading/ml/features.py` and `modular/trading/portfolio.py` import numpy at module top while numpy is in the `[trading]` extra — `pip install .` shipped a broken wheel and #100 / #101 inherited the red. Two-commit fix on top of #101:

(a) `fix(ci)` — drop the dead numpy import from `features.py`; defer numpy to inside `stacker.py:train()` / `predict_proba()` past their early-return paths; gate `portfolio.py`'s numpy behind `try/except`; add `pytest.mark.skipif` on the optimizer / managed-portfolio / ML-training / factor-scan tests so lean-install CI skips them cleanly. Verified: clean venv with only `[web,autosuggest]` (the exact CI install) 1075 passed, 11 skipped; with full extras 1086 passed, no regressions.

(b) `fix(daemon)` — five F-2/F-3 follow-ups: move `monitor.scheduler.start(...)` past the listener bind in `cc_daemon/cli.py:cmd_serve` (so a misconfigured fetch/deliver can't fail before the daemon is reachable); add a `_foreign_daemon_running()` step-aside check at every scheduler loop tick to close the race where REPL `/monitor start` fires before the daemon writes its discovery file (both schedulers would otherwise race on `last_run_at`); flip `cc_daemon/schema.py` to `PRAGMA synchronous=NORMAL` (safe under WAL; `EventBus.publish` drops from 305 μs/event to 39 μs/event, important for streaming agent output); clarify in `jobs.py` / `monitor/store.py` / `docs/architecture.md` that the JSON→SQLite migration is one-way (PR #101's wording implied a fallback read path that doesn't exist); update `docs/RFC/0002-daemon-foundation-roadmap.md` F-2/F-3 status from OPEN → MERGED. Branch: `feature/fix-f2`.
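The `fix(ci)` pattern, keeping an optional dependency out of module import time and only needing it on the code path that uses it, looks roughly like this. `weights_sum` and `HAS_NUMPY` are hypothetical names for illustration, not code from `portfolio.py` or `stacker.py`.

```python
# Optional heavy dependency gated at import time, so a lean install
# (no [trading] extra) can still import this module.
try:
    import numpy as np
except ImportError:
    np = None

HAS_NUMPY = np is not None


def weights_sum(weights):
    """Example function that needs numpy only when it actually does work."""
    if not weights:
        return 0.0              # early-return path never touches numpy
    if np is None:
        raise RuntimeError("install cheetahclaws[trading] for this feature")
    return float(np.asarray(weights, dtype=float).sum())
```

Tests that exercise the numpy path can then be gated with `@pytest.mark.skipif(not HAS_NUMPY, reason="needs the [trading] extra")`, which is how a lean-install CI run reports them as skipped rather than failed.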
Research lab Phase A — autonomous multi-day research; WeChat smart-reply +
/draftsemi-auto reply; reliability + UX hardening across the lab pipeline. Two big surfaces shipped together: (a) the research lab is no longer single-shot —/lab resume <run_id> [<stage>]reconstructsLabStatefrom SQLite to continue or rewind a run;/lab iterate <run_id>runs a 3-reviewer self-review on the final report (novelty / rigor / clarity / evidence, 1-10), routes the lowest-scoring dimension to the corresponding stage (novelty→QUESTIONING, rigor→IMPLEMENTATION, clarity→DRAFTING, evidence→EXPERIMENT), rewinds + re-runs, loops untiltarget_score/max_iterations/ plateau / budget;/lab backlog add <topic> --iterate --target=N --max=N --prio=Nqueues many topics,/lab daemon startruns them 24/7 in a single-worker loop with crash-recovery (reset_running_backlogunsticks stale rows on next start);/lab modelsprints the effective per-role model + which API key drove each pick + warns when reviewers span <N families (homogeneous review = no meta-loop signal);/lab migrate-paths [--apply]renames legacylab_xxx/output dirs to the human-readable<date>_<time>_<topic-slug>_<run_id_short>form (e.g.2026年05月08日_14-30_post-transformer-architectures-survey_b16036de/). (b) WeChat smart-reply panel — when a whitelisted contact sends an inbound message, an auxiliary cheap model drafts 3 candidate replies and pushes them as a panel to yourfilehelper(文件传输助手); reply with1/2/3/AA 1to send, freeform text to customise,xto skip,qfor queue. SQLite-persisted at~/.cheetahclaws/wx_smart_reply.db(in-memory fallback on init failure); contacts JSON at~/.cheetahclaws/wx_contacts.jsonis mtime-hot-reloaded; bot-owner self-uid is auto-recorded on first inbound and excluded from smart-reply unconditionally, so your own messages always reach the agent regardless of whitelist contents. (c)/draft <message>slash command — semi-automatic reply suggestion path for cases where the bot can't intercept the inbound directly (bot account ≠ user main account on iLink ClawBot). 
Three candidates are drafted via the auxiliary model, optionally tone-conditioned via `@<contact_uid_or_label>` against `wx_contacts.json`. When `/draft` is invoked from a bridge channel (WeChat / Telegram / Slack), the candidates are also echoed back to the originating uid and stashed in `bridges.draft_cache`. A digit-only reply (`1`/`2`/`3`) then consumes the chosen text one-shot, with no agent invocation and no smart-reply panel.

Reliability hardening on top of #88's MCP work:
- `research/http.py` now uses 429-aware backoff (10/30/60/120 s for 429 vs 0.5/1/2/4 s for 5xx) and honours `Retry-After` headers (capped at 180 s).
- The lab surveyor stage grounds itself in real `research.aggregator.research()` hits before invoking the LLM. The top-30 academic + tech results are passed as context and persisted as a `survey_search_hits` artifact for replay, and the fabricated-citation rate drops sharply on tested topics.
- `_dedupe_self_repeat()` trims degenerate cheap-model sampling (`text == text + text`) before storage, so reviewer prompts don't see doubled inputs.
- `_extract_numbered` dedupes by content (a questioner emitting `1..5\n1..5` keeps 5 items, not 10).
- The citation verifier now has a hard 30 s per-citation wall-clock limit via `concurrent.futures`, which kills slow-loris sockets that urllib's socket timeout ignores. It also has a 5-minute stage-level cap, with progress callbacks surfaced to `/lab logs`. The 11-minute hang seen in the field is gone.

REPL ergonomics:
- `/lab daemon start` and `/lab start` now print the eventual `report.md` path up front and live-stream stage transitions to the terminal as they happen.
- `/lab status <run_id>` shows both new and legacy paths, so older reports can still be found.
- `/config` parses JSON-style values (lists, dicts, signed numbers, quoted strings), so `/config wechat_smart_reply_whitelist=["wxid_..."]` is no longer silently saved as a literal string.
- Leading whitespace before `/` is now stripped before slash dispatch, so a paste with a stray space still hits the dispatcher, not the agent.
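The backoff policy above can be sketched as a pure function that maps a status code and retry attempt to a delay. This is an illustration, not the code in `research/http.py`. The name `retry_delay`, the 0-based `attempt` index, treating statuses other than 429/5xx as non-retryable, and giving up after the fourth retry are all assumptions. Only the two schedules and the 180 s `Retry-After` cap come from the notes.

```python
from typing import Optional

# Schedules and cap quoted in the release notes; everything else is an assumption.
RATE_LIMIT_DELAYS = (10.0, 30.0, 60.0, 120.0)  # seconds, used for HTTP 429
SERVER_ERROR_DELAYS = (0.5, 1.0, 2.0, 4.0)     # seconds, used for HTTP 5xx
RETRY_AFTER_CAP = 180.0                        # never sleep longer than this


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before retry `attempt` (0-based), or None to stop retrying."""
    if status == 429:
        schedule = RATE_LIMIT_DELAYS
    elif 500 <= status < 600:
        schedule = SERVER_ERROR_DELAYS
    else:
        return None  # other statuses are treated as non-retryable in this sketch
    if attempt >= len(schedule):
        return None  # schedule exhausted
    delay = schedule[attempt]
    if retry_after is not None:
        try:
            # Honour a numeric Retry-After value, clamped to [0, RETRY_AFTER_CAP].
            delay = min(max(float(retry_after), 0.0), RETRY_AFTER_CAP)
        except ValueError:
            pass  # the HTTP-date form is not parsed here; keep the schedule value
    return delay


# Usage: a 429 carrying "Retry-After: 600" waits 180 s, not 600 s.
print(retry_delay(429, 0, "600"))
```

Because the policy has no side effects, it can be unit-tested separately from the HTTP client that actually sleeps and retries.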
Tests: 884 passing (842 unit/integration + 22 e2e), zero regressions. About 80 new pytest cases cover iteration scoring, state reconstruction, backlog atomicity, the verifier hard timeout, slug edge cases, dedupe patterns, and the self-uid bypass.
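This minimal sketch illustrates the `/lab iterate` routing step: find the weakest review dimension, rewind to its stage, and check whether to stop. The stage mapping is the one listed in (a). The function names, averaging the three reviewers' scores, the tie-breaking order, and the `min_gain` plateau threshold are assumptions. The budget stop condition is left out.

```python
from statistics import mean

# Dimension -> stage routing as listed in the release notes.
STAGE_FOR_DIMENSION = {
    "novelty": "QUESTIONING",
    "rigor": "IMPLEMENTATION",
    "clarity": "DRAFTING",
    "evidence": "EXPERIMENT",
}


def pick_rewind_stage(reviews: list[dict[str, float]]) -> tuple[str, str]:
    """Average each 1-10 dimension over all reviewers; return (weakest dimension, stage)."""
    averaged = {dim: mean(r[dim] for r in reviews) for dim in STAGE_FOR_DIMENSION}
    # On ties, min() returns the first key in dict order (novelty, rigor, clarity, evidence).
    weakest = min(averaged, key=averaged.get)
    return weakest, STAGE_FOR_DIMENSION[weakest]


def should_stop(history: list[float], target: float, max_iterations: int,
                min_gain: float = 0.1) -> bool:
    """Stop on reaching the target, running out of iterations, or a plateau."""
    if history and history[-1] >= target:
        return True
    if len(history) >= max_iterations:
        return True
    # Plateau: the latest iteration improved by less than min_gain.
    return len(history) >= 2 and history[-1] - history[-2] < min_gain


reviews = [
    {"novelty": 7, "rigor": 5, "clarity": 8, "evidence": 6},
    {"novelty": 8, "rigor": 6, "clarity": 7, "evidence": 6},
    {"novelty": 7, "rigor": 4, "clarity": 9, "evidence": 7},
]
print(pick_rewind_stage(reviews))  # ('rigor', 'IMPLEMENTATION')
```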
What Changed
530dbe7 - May 7, 2026 (v3.05.77): MCP HTTP/SSE transport + OAuth 2.0 PKCE, `.env` loader, `ANTHROPIC_ENDPOINT` corporate-proxy override, AskUserQuestion UI polish (#88, #89)

MCP transport: `cc_mcp/client.py` now speaks Streamable HTTP (POST → `text/event-stream` reply) in addition to stdio and pure SSE. It sends the `Accept: application/json, text/event-stream` header that servers like sap-jira require (without it they respond with 406).

OAuth 2.0: the new `cc_mcp/oauth.py` implements the full MCP Authorization spec: RFC 9728 resource-server discovery → RFC 8414 authorization-server metadata → RFC 7591 dynamic client registration → Authorization Code + PKCE (S256) flow with browser redirect → automatic refresh-token rotation.
- Tokens persist atomically to `~/.cheetahclaws/mcp_oauth.json` with mode `0600`, and the parent directory is locked to `0700`.
- The redirect-URI port is picked once and reused for both registration and the local callback server.
- The OAuth scope comes from the authorization server's advertised `scopes_supported`. It prefers `mcp` if listed, otherwise the first entry, otherwise the scope is omitted entirely, so servers without an `mcp` scope no longer reject the request with `invalid_scope`.
- `_ensure_oauth()` is guarded by a dedicated lock, so concurrent 401 retries can't race on the httpx client rebuild.

REPL: `/mcp add <name> --transport http <url>` and `/mcp add <name> --transport sse <url>` register an HTTP server in one line, and an explicit `/mcp list` subcommand shows full-width tool descriptions wrapped at 72 columns.

Server-name sanitization: hyphenated names like `github-tools` now resolve correctly through the `mcp__server__tool` qualified-name path.

`.env` loader: `_load_env()` runs at the very top of `cheetahclaws.py`, before any other import reads `os.environ`. As a result `.env` keys are visible to every module, while existing shell variables keep precedence (`os.environ.setdefault`).
MCP HTTP `headers` values are passed through `os.path.expandvars`, so `"Authorization": "Bearer $GITHUB_TOKEN"` works out of the box.

`ANTHROPIC_ENDPOINT`: the env var (also settable via `.env`) overrides the persisted `anthropic_endpoint` config. Both the streaming Anthropic client (`providers.py` passes `base_url=...` to `anthropic.Anthropic`) and the connectivity probes in `/doctor` and the setup wizard use it, letting corporate proxies replace `api.anthropic.com` cleanly.

UI:
- `AskUserQuestion` is auto-approved alongside `EnterPlanMode`/`ExitPlanMode`. It is interactive by definition, so a permission prompt was redundant.
- The spinner and result line are suppressed in `print_tool_start/end`.
- The question text is rendered through `clr()` with Markdown stripped (`**bold**`, `` `code` ``, `*italic*`), and option indices and descriptions are colorized.
- The REPL prompt now prints a full-width `─` rule before each input, sized via `os.get_terminal_size()` with an 80-char fallback. This matches Claude Code's visual rhythm.

- May 5, 2026: Telegram bridge file round-trip + cross-channel pickable permission prompts (#84):
`bridges/telegram.py` previously only had `_tg_send` (text via `sendMessage`). When the model claimed it had "sent a file", it had only sent text, and the `[approve]` `[reject]` text in permission prompts only looked like buttons.
  - Added `_tg_send_document`: a multipart/form-data upload with a 49 MB cap and explicit error reporting for oversize, empty, missing-file, network, and API-rejection cases.
  - Added an inbound `document` handler that saves uploads to `/workspace` (or `tempfile.gettempdir()` outside Docker) with sanitized filenames and a path-aware prompt.
  - Added a `!sendfile <path>` user command for explicit on-demand sends.
  - Added an auto-send hook in `_bg_runner` that mails any file written by the `Write` tool. It is FIFO-paired with the in-flight `file_path`, skipped on `Error:`/`Denied:` results, and de-duplicated per turn so parallel writes don't double-mail.

  Cross-channel permission UX: `ask_input_interactive(options=[(label, value), ...])` now renders an interactive picker on every bridge.
  - Telegram gets a real `inline_keyboard` (`callback_data="cc:<prompt_id>:<value>"`). `_handle_callback_query` does auth, drops stale prompt ids, and calls `answerCallbackQuery` + `editMessageText "✓ Selected: y"`.
  - Slack and WeChat get a numbered menu in the message body. The user can reply with a digit, the canonical letter, or the label word, all resolved via `_resolve_choice`.
  - The terminal prints the same numbered menu before the input cursor.
  - `ask_permission_interactive` passes `[(✅ Approve, y), (❌ Reject, n), (✅✅ Accept all, a)]`.
  - The change is backward-compatible: every existing `ask_input_interactive` call site (no `options=`) keeps free-text behavior.

  Tests: 49 new pytest cases (`tests/test_telegram_bridge.py` + `tests/test_options_menu.py`), with no real network calls. 718 passed, with zero regressions on the 669 pre-existing tests. `--accept-all` was a red herring; the bridge simply lacked the upload code path.

- May 2, 2026: Docker chat UI assets 404 follow-up (#73):
`web/server.py` now resolves `_WEB_DIR` via `importlib.resources.files("web")` instead of `Path(__file__).parent`, so static files are found whether the package is installed editable or non-editable.
  - The dotfile guard in the static-file branch now only inspects path segments inside `_WEB_DIR`. Installs sitting under `.venv/`, `.local/`, etc. no longer 404 on every asset.
  - `[tool.setuptools.package-data]` for `web` is widened to `static/**/*`, so non-editable wheels reliably ship the full `web/static/` subtree.
  - A new "Custom Dockerfile pitfalls" section in `docs/guides/docker.md` covers the editable-install requirement and the most common 404 root cause for users rolling their own image.

- Apr 30, 2026: Docker / home-server support (#73):
`Dockerfile`, `docker-compose.yml`, `.env.example`, host Ollama via `host.docker.internal`, and a workspace bind-mount for Samba sharing. `--web` mode now auto-starts the configured Telegram / WeChat / Slack bridges in the same process, so a single container delivers both the browser UI and the phone bridge. Plus two terminal/agent fixes:
  - `AskUserQuestion` no longer deadlocks the terminal (#69). It now renders and reads synchronously instead of using a queue/event that the agent thread can't drain.
  - `messages_to_openai` emits `content: ""` instead of `null` for tool-only assistant turns, so Ollama's OpenAI-compatible endpoint stops returning 400 with `invalid message content type: <nil>`. In addition, 400 / `BadRequestError` is reclassified as a non-retryable `INVALID_REQUEST`, so a malformed body no longer trips the circuit breaker (#71).

- Apr 24, 2026: Support DeepSeek V4 models, multi-model prompt adaptation: a single shared
`default.md` baseline + tiny per-family overlays (Anthropic XML tags · Gemini 3 explicit Agentic Mode · OpenAI o-series no-narration).
  - Routing is by model family, not provider/runtime. The same Qwen prompt is used whether the model is served via DashScope, Ollama, or OpenRouter.
  - Each overlay must cite a vendor prompting guide and stay at or under 20 lines (enforced by tests).
  - DeepSeek V4 thinking-mode protocol: `reasoning_content` round-trip, with `thinking: ON` by default.
  - fix(setup-wizard): tolerate `api_key_env=None` for ollama/lmstudio (#59).
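The PKCE (S256) step and the scope-selection rule from the v3.05.77 entry can be sketched with the standard library alone. `make_pkce_pair` and `s256_challenge` follow RFC 7636, and `choose_scope` mirrors the preference order described above. All three names are hypothetical and are not taken from `cc_mcp/oauth.py`.

```python
import base64
import hashlib
import secrets
from typing import Optional, Sequence


def s256_challenge(verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(ASCII(verifier))) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for one authorization request."""
    # 32 random bytes encode to 43 URL-safe characters, the RFC 7636 minimum length.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)


def choose_scope(scopes_supported: Optional[Sequence[str]]) -> Optional[str]:
    """Prefer 'mcp', else the first advertised scope, else None (omit the parameter)."""
    if not scopes_supported:
        return None
    return "mcp" if "mcp" in scopes_supported else scopes_supported[0]


verifier, challenge = make_pkce_pair()
# Only the challenge (with code_challenge_method=S256) goes into the authorization
# URL; the verifier stays local until the token exchange.
```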
v3.05.76.5: Merge pull request #74 from mxh1999/daemon-design-note
What Changed
00ab4c5 - Apr 20, 2026 (v3.05.76): Research pipeline: 20 sources, time-range filter, cross-platform heat table, citation analysis, saved reports, Chinese platforms (Bilibili B站 · Weibo 微博 · Xiaohongshu 小红书 · Zhihu 知乎), `/monitor` trend tracking, one-click `/ssj` wizard, entity extraction, multi-query expansion, side-by-side compare
- `/research <topic>`: fans out to 20 sources in parallel: arXiv · Semantic Scholar · OpenAlex · HuggingFace Papers · alphaXiv · Google Scholar · HackerNews · GitHub · Reddit · StackOverflow · Google News · Polymarket · SEC EDGAR · Tavily · Brave · Twitter/X · Zhihu · Bilibili · Weibo · Xiaohongshu. 13 sources work zero-config; the other 7 are optional and need keys or cookies.
- Engagement-weighted ranking: each source's native signal (HN points, GitHub stars, Reddit upvotes, citations, HF upvotes, Bilibili plays, Weibo likes, Xiaohongshu likes, Twitter likes, Polymarket USD volume) is log-normalized against a per-source calibration onto a shared 0-1 scale. The result is blended with a recency bonus that has a 14-day half-life. Cross-source dedup by URL keeps the highest-engagement entry among duplicates.
- Time range filter: `--range 1d|3d|7d|14d|30d|60d|90d|6m|1y|2y|5y|all` (or natural forms such as `30days`, `6months`, `2years`) and explicit `--since YYYY-MM-DD --until YYYY-MM-DD`. Each source translates the window to its native filter:
  - arXiv `submittedDate:[...]`
  - Semantic Scholar `year=LO-HI`
  - OpenAlex `from_publication_date:...`
  - HN `numericFilters=created_at_i>...`
  - GitHub `pushed:>...`
  - Reddit `t=hour|day|week|month|year|all`
  - StackOverflow `fromdate=`/`todate=`
  - Google News `after:`/`before:`
  - SEC EDGAR `dateRange=custom`
  - Tavily `start_published_date`
  - Brave `freshness=pd|pw|pm|py`
  - Twitter v2 `start_time`/`end_time`
  - Google Scholar: a client-side year filter
  - HuggingFace / Bilibili / Weibo: client-side filtering

  Polymarket and Zhihu have no date-filter API and are documented as exceptions.
- Cross-platform attention table: every brief renders a Markdown table with per-platform result count · top engagement label · median result age · domain. Skipped or failed sources appear too, with clear reasons. The LLM synthesis prompt copies this table verbatim and adds 2-3 sentences comparing the attention distribution (academic-heavy vs. social-heavy vs. news-heavy).
- Publication trend sparkline + 12-month bar chart: a compact Unicode sparkline (`▁▂▃▄▅▆▇█`) across the last 24 months appears in the brief header, and a full per-month bar chart appears lower down. Both are built from ALL dated results across academic, news, and social sources, giving a single-glance view of where the buzz has moved.
- Notable-citer analysis (`--citations`): secondary Semantic Scholar calls on the top academic results pull citing-paper authors and keep those with ≥10k total citations (configurable via `--citation-threshold`). The output is a table with name · affiliation · total cites · h-index · which papers they cited. This adds 2-10 API calls per run. Pairing it with `SEMANTIC_SCHOLAR_API_KEY` is recommended to escape the anonymous limit of 100 requests per 5 minutes.
- Entity extraction: offline, zero-LLM pattern matching that scans every pulled result for frequent named entities in four categories:
  - models (GPT-5, Claude-Opus-5, Llama-4, Gemini-2.5-Pro, GLM-5.1, Qwen-3, DeepSeek-V3, Grok, Mistral, Phi, Yi, Kimi, ...)
  - benchmarks (MMLU, MMLU-Pro, GSM8K, MATH, HumanEval, HumanEval+, SWE-bench, LiveCodeBench, MMMU, MathVista, GAIA, AgentBench, WebArena, Arena-Hard, FrontierMath, ARC-AGI, GPQA-Diamond, HLE, C-Eval, CMMLU, RULER, LongBench, ...)
  - orgs (OpenAI, Anthropic, Google DeepMind, Meta, xAI, Mistral AI, DeepSeek, Moonshot, Alibaba, Zhipu, Tencent, ByteDance, Hugging Face, NVIDIA, 01.AI, AI2, Mila, Stanford, MIT, Berkeley, CMU, Tsinghua, ...)
  - people (from academic result author fields)

  Counts are deduplicated within a single result, so one spammy abstract doesn't skew the ranking. The results render as a "Top mentioned entities" section directly beneath the heat table. It answers "what's everyone talking about?" at a glance, without an LLM round-trip.
- Multi-query expansion (`--expand` or `--expand N`): asks the active model to propose 2-6 sibling subqueries covering different angles (theory vs. tooling vs. industry deployment vs. controversy), not paraphrases. Each subquery then runs in parallel across all sources with proportionally reduced per-source limits, and the results merge into the main pipeline (dedup + rank + synth). Example: `/research --expand "frontier LLM benchmarks"` auto-expands to `LLM evaluation methodology`, `benchmark saturation and contamination`, `capability measurement frontier models`, and `human preference benchmarks evaluation`. Coverage jumps several-fold for broad topics.
- Side-by-side compare: `/research compare "topic A" vs "topic B" [vs "topic C"]` runs 2 or 3 independent research queries in parallel and produces a unified comparative brief: verdict at a glance · side-by-side heat tables · shared themes · unique strengths per topic · open questions. Citations use prefixed `[A-N]`/`[B-N]`/`[C-N]` markers, so readers can trace every claim back to the right topic's evidence pool. When no model is configured, it falls back to a deterministic no-LLM rendering with all heat tables and entity tables.
- Auto-save to `~/.cheetahclaws/research_reports/`: every `/research` and `/research compare` run writes two files, `<YYYY-MM-DD_HHMMSS>-<slug>.md` (the rendered brief) and a `.json` sidecar (serialized Brief + notable citers + entities). Opt out with `--no-save`, or export explicitly with `--save-as PATH`. A new `/reports` command offers `list` (50 most recent) · `open <id>` (print) · `path <id>` (print file path) · `delete <id>`.
- Weekly trend tracking via `/monitor`: a new topic prefix `research:<query>` (or `research:<range>:<query>`, e.g. `research:30d:RLHF`) dispatches to the full 20-source pipeline on each scheduled run. It supports `daily`/`weekly`/`12h`/... schedules and `--telegram`/`--slack`/console channels. Each invocation pulls all 20 sources · filters by the subscription's time window · renders the cross-platform heat table + sparkline as the first digest item · writes a full report · pushes to the configured channels. Subscribe via `/subscribe research:<topic> weekly` or the new "Trend tracker" option in the `/monitor` wizard.
- `/ssj` wizard integration: 3 new menu items for zero-flag operation:
  - `16. 🔍 Research`: asks topic → time range (1-5) → citations y/N, then runs `/research` with the right flags
  - `17. 📊 Trend Track`: asks topic → tracking window → frequency, then creates the `/subscribe research:<range>:<topic>` subscription
  - `18. 📁 Reports`: opens the `/reports` browser
- Chinese platform sources (4 of them):
  - Bilibili (B站): zero-config search-all endpoint. Returns video + article results with engagement as plays/likes/danmaku (bullet comments)/comments (播放/点赞/弹幕/评论), e.g. `[video · 11:55] 彻底搞懂 Transformer · 54,209 播放 · 2,430 赞 · 78 弹幕` ("Thoroughly understand Transformer": 54,209 plays · 2,430 likes · 78 danmaku).
  - 知乎 Zhihu: v4 search_v3 API, requires `ZHIHU_COOKIE` (browser-extracted `d_c0; z_c0`). Returns answers / articles / questions with engagement as likes/comments/follows (赞/评论/关注).
  - 微博 Weibo: m.weibo.cn getIndex endpoint, requires `WEIBO_COOKIE` (browser-extracted `SUB; SUBP`). Returns posts with engagement as likes/reposts/comments (赞/转/评). Parses relative Chinese time forms: 刚刚 ("just now"), 5分钟前 ("5 minutes ago"), 2小时前 ("2 hours ago"), 今天 HH:MM ("today HH:MM"), and MM-DD.
  - 小红书 Xiaohongshu: edith.xiaohongshu.com notes search, requires `XHS_COOKIE` (and often `XHS_X_S`). Returns notes with engagement as likes/comments/saves (赞/评/收藏). Note: Xiaohongshu's anti-bot protection is aggressive, and cookies may expire hourly. Fallback: use `--sources tavily` with `<query> site:xiaohongshu.com`.
- Architecture:
  - `research/` package: `__init__.py`, `types.py`, `time_range.py`, `http.py`, `cache.py` (24h SQLite cache at `~/.cheetahclaws/research_cache.db`, keyed on source + query + limit + time range), `classifier.py` (keyword-based topic→domain routing, zero latency, zero LLM), `ranker.py`, `aggregator.py`, `synthesizer.py`, `citations.py`, `entities.py`, `reports.py`, `sources/` (20 modules).
  - `tools/research.py`: exposes the `Research` tool to the agent (13 parameters: topic, domains, sources, limit, time_range, since, until, analyze_citations, citation_threshold, expand, save_as, auto_save, synthesize, use_cache).
  - `commands/research_cmd.py`: `/research` (with the compare subcommand) and `/reports`.
  - `monitor/fetchers.py`: `fetch_research()` bridges `/monitor` subscriptions to the research pipeline.
  - `commands/advanced.py`: SSJ menu entries 16/17/18 delegate to the right `/research` / `/subscribe` / `/reports` command line.
- Tests (`tests/test_research.py`): 88 tests across 23 sections covering:
  - types, classifier routing, the engagement ranker, and cross-source dedup
  - SQLite cache round-trip + TTL expiry
  - each of the 20 sources (happy path + schema-shift resilience + missing-key skip behavior)
  - aggregator parallel fan-out + failure isolation + cache integration
  - the synthesizer LLM path + the deterministic no-LLM fallback
  - heat table + sparkline + trend rendering, and the citations helper
  - time-range presets + ISO parsing + per-source native mapping
  - reports save/load/delete/path
  - Chinese platform parsing (including Zhihu answer/article/question shapes, the Weibo relative-date parser, and Xiaohongshu localized count parsing)
  - monitor `research:` prefix dispatch + the range-prefix form
  - entity extraction across all four categories + the dedup-within-result guarantee
  - multi-query expansion producing distinct cache keys
  - compare mode running 2-3 parallel queries with correct prefixed citation markers
- Packaging: `pyproject.toml` adds `research` and `research.sources` to the editable packages list, so installed binaries can import the new module.
- Version bumped to 3.05.76.
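This sketch approximates the engagement-weighted ranking and URL dedup described above. Only the log normalization, the 14-day half-life, and "keep the most-engaged duplicate" come from the notes. The `CALIBRATION` values, the `log1p` ratio used as the normalization, and the 0.8/0.2 blend weights are illustrative assumptions, not the values in `research/ranker.py`.

```python
import math
from dataclasses import dataclass

HALF_LIFE_DAYS = 14.0  # from the release notes

# Hypothetical per-source calibration: the native count that maps to 1.0.
CALIBRATION = {"hackernews": 500.0, "github": 5000.0, "reddit": 2000.0}
DEFAULT_CALIBRATION = 100.0


@dataclass
class Result:
    source: str
    url: str
    engagement: float  # native signal: HN points, GitHub stars, Reddit upvotes, ...
    age_days: float


def engagement_score(r: Result) -> float:
    """Log-normalize a native engagement count onto a shared 0-1 scale."""
    ref = CALIBRATION.get(r.source, DEFAULT_CALIBRATION)
    return min(math.log1p(max(r.engagement, 0.0)) / math.log1p(ref), 1.0)


def recency_bonus(age_days: float) -> float:
    """Exponential decay: 1.0 for a fresh result, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)


def rank(results: list[Result], w_engagement: float = 0.8) -> list[Result]:
    """Dedup by URL (keep the most-engaged copy), then sort by the blended score."""
    best: dict[str, Result] = {}
    for r in results:
        kept = best.get(r.url)
        if kept is None or engagement_score(r) > engagement_score(kept):
            best[r.url] = r

    def blended(r: Result) -> float:
        return w_engagement * engagement_score(r) + (1 - w_engagement) * recency_bonus(r.age_days)

    return sorted(best.values(), key=blended, reverse=True)
```

Log normalization keeps a single viral item from drowning out every other source, and the per-source reference puts 500 HN points and 5,000 GitHub stars on a comparable footing.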
What Changed
- Apr 18, 2026 (v3.05.75): External plugin discovery via `CHEETAHCLAWS_PLUGIN_PATH` + safer dependency management; end-to-end prompt-cache token tracking across providers
- `PluginScope.EXTERNAL`: a new scope for plugins discovered in place (never copied to `~/.cheetahclaws/plugins/`). It complements the existing `USER` and `PROJECT` scopes. Use case: shared team/company plugin directories mounted at a common path.
- `CHEETAHCLAWS_PLUGIN_PATH` env var: a list of directories, separated by `os.pathsep` (`:` on POSIX, `;` on Windows), that are scanned for plugin subdirectories.
  - Each immediate subdirectory that has a `plugin.json` or `PLUGIN.md` is surfaced as an external plugin.
  - There is no new manifest format; discovery reuses the existing `PluginManifest.from_plugin_dir()` loader.
  - Missing or empty path segments are ignored, and hidden entries (`.git`, `.DS_Store`, etc.) are skipped.
- Default disabled: external plugins appear in `/plugin list` as `[external] disabled`. The user must run `/plugin enable <name>` once to activate each one. Enable state persists to `~/.cheetahclaws/plugins.json` under a new `external_enabled: {name: bool}` map, so it survives restarts without the plugin being installed.
- No silent pip install: unlike the original proposal in #49, cheetahclaws never installs plugin dependencies from an import-failure fallback. Dependencies are installed only at explicit user-consent points: `/plugin install` (the existing flow), or the first `/plugin enable` of an external plugin that declares `dependencies`. The model cannot trick the runtime into mutating the Python environment.
- Dependency check uses `importlib.metadata.distribution()`: the new `_missing_dependencies(deps)` helper keys off the PyPI distribution name, not `find_spec(name)`. This fixes the PyPI-vs-import-name trap that breaks common packages: `Pillow` (imports as `PIL`), `PyYAML` (`yaml`), `opencv-python` (`cv2`), `scikit-learn` (`sklearn`), `beautifulsoup4` (`bs4`). The old `find_spec("pillow")` approach returned `None` even when Pillow was installed, and would loop-install forever.
- Safety guards:
  - `uninstall_plugin` on an `EXTERNAL` entry only drops the enable-state record; it never `shutil.rmtree`s the user's source directory.
  - `update_plugin` refuses external plugins with "update the source directory directly" instead of attempting `git pull`.
  - Malformed `plugin.json` files are logged to stderr and skipped, so one bad manifest can't crash `/plugin list`.
- Dedupe on name collision: if a plugin name exists in both an installed scope (`USER`/`PROJECT`) and an external scope, the installed entry wins. Among external scopes, the earliest directory in `CHEETAHCLAWS_PLUGIN_PATH` wins, consistent with `$PATH` semantics.
- Tests (`tests/test_plugin_external.py`): 16 tests covering:
  - env var parsing with empty/nonexistent segments
  - `plugin.json` and `PLUGIN.md` discovery, hidden-directory skip, and malformed-JSON resilience
  - path-order priority and installed-shadows-external dedupe
  - the enable/disable persistence round-trip
  - PEP 508 requirement parsing (`package[extra]>=1.0` → `package`)
  - a regression test for the PyPI-vs-import-name bug
- New public export: `from plugin import PLUGIN_PATH_ENV` gives the env var name for use in tooling/docs.
- Not changed: the existing `USER`/`PROJECT` install flow, the `plugin.json`/`PLUGIN.md` manifest format, and the `/plugin` subcommands. The change is fully backward compatible: with `CHEETAHCLAWS_PLUGIN_PATH` unset, the system behaves exactly as before.
- Fix (tool-history integrity for OpenAI-compatible providers), resolves #57: after long sessions, DeepSeek and other OpenAI-compatible endpoints started rejecting requests with `"Messages with role 'tool' must be a response to a preceding message with 'tool_calls'"` (HTTP 400). The only recovery was a restart, which lost all context. Root cause: `compaction.find_split_point()` chose a split index by token count alone. A split could therefore land between an `assistant` (`tool_calls`) message and its `tool` response messages, leaving orphaned `tool` entries in the kept half. The fix is a three-layer defense:
  - `compaction._respect_tool_pairs(messages, split)` post-processes the split index. If the last message in the old half is an `assistant` with `tool_calls`, it advances the split past all consecutive `tool` responses. It also skips any standalone `tool` message the split would land on. If no safe split exists, it returns `0` (skip compaction this turn), and the threshold re-triggers on the next turn.
  - `compaction.sanitize_history(messages)` is a single-pass O(n) invariant enforcer.
    - It tracks pending `tool_call_id`s from the most recent `assistant` (`tool_calls`) message in a rolling set.
    - It drops any `tool` message whose `tool_call_id` is not in the set (an orphan).
    - It strips unanswered `tool_calls` entries from assistant messages when a non-tool message intervenes.
    - If all `tool_calls` on an assistant message are stripped, it removes the `tool_calls` key entirely and normalizes `content` to a non-null string, as the OpenAI schema requires.
    - It does not mutate its input.
  - `agent.run()` calls `sanitize_history` after every `maybe_compact` and before each `stream()` call. Any divergence (from compaction, crashed tool execution, checkpoint restore, or future code paths) is caught before it reaches the provider. A `history_sanitized` warn-log records the number of messages removed, so regressions are visible.
  - Why three layers instead of one: the split-point fix removes the primary source of orphans. The sanitizer is a defense-in-depth net that keeps the invariant regardless of where history corruption originates, and the agent-loop wiring ensures the net is actually applied. Well-formed histories see no user-visible behavior change; `test_well_formed_history_unchanged` pins this.
  - Tests (`tests/test_compaction.py`): 15 new tests across three classes (`TestFindSplitPoint.test_split_never_splits_tool_pair`, `TestRespectToolPairs` × 4, `TestSanitizeHistory` × 7).
    - Split-boundary edge cases: a split at every ratio from 0.2 to 0.5, multi-tool-call blocks, and a standalone orphan tool at the split.
    - Sanitizer correctness: well-formed history unchanged, orphan drop, partial and full stripping of unanswered `tool_calls`, unanswered calls at the end of the list, and wrong-`tool_call_id` drop.
    - An input-immutability guarantee.
- End-to-end prompt-cache token tracking (closes #43): cache hit/miss counters now flow from provider → `AgentState` → checkpoint snapshots across every supported provider family.
  - Fields: `AssistantTurn` gains two new fields that default to 0, `cache_read_tokens` and `cache_write_tokens`. `AgentState.total_cache_read_tokens` and `total_cache_write_tokens` accumulate via `getattr(..., 0)`, so providers that never set the fields still work.
  - Extraction: this is centralized in two helpers in `providers.py`. `_anthropic_cache_tokens(usage)` reads `cache_read_input_tokens` + `cache_creation_input_tokens`, and `_openai_cached_read_tokens(usage)` walks `prompt_tokens_details.cached_tokens`. Both coerce missing or `None` values to `0`, so older SDKs, non-cached calls, and Bedrock-over-litellm wrappers all fall through instead of raising `AttributeError`.

  Provider coverage:

  | Family | Cache read | Cache write | Mechanism |
  |---|---|---|---|
  | Anthropic (`stream_anthropic`) | ✓ | ✓ | Both fields on `final.usage` when the prompt-caching beta is active |
  | OpenAI-schema (`stream_openai_compat`: OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, Groq, xAI, any compatible endpoint) | ✓ | 0 (by design) | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
  | Ollama (`stream_ollama`) | 0 | 0 | No prompt caching in Ollama today |
  | Any future / custom provider | 0 (default) | 0 (default) | `getattr(event, "cache_read_tokens", 0)` no-op fallback |

  - Persistence: `checkpoint/store.make_snapshot` writes `token_snapshot["cache_read"]` and `["cache_write"]`. `/checkpoint <id>` (and `/rewind`) restores them alongside the input/output totals, so the counters stay in lock-step with whatever snapshot the user rewound to.
  - Structured logging: `api_call_done` records now include `cache_read_tokens` / `cache_write_tokens` alongside `in_tokens` / `out_tokens`.
  - Note: the counters are not yet surfaced in `/cost` or `/status` output. The tracking layer landed first, and a follow-up will expose it in the user-facing commands.
- Tests (`tests/test_cache_tokens.py`): 14 tests across 5 layers:
  - `AssistantTurn` field defaults + explicit values
  - `AgentState` accumulation across increments
  - a real `make_snapshot` on `tmp_path` with all four token fields
  - Anthropic + OpenAI extraction helpers against synthetic usage objects (populated / missing / None)
  - end-to-end `agent.run` with a scripted stream (single-turn propagation and multi-turn accumulation)
  - a `test_rewind_restores_cache_tokens_from_snapshot` regression test that asserts the round-trip

  `tests/e2e_checkpoint.py` is updated to keep the scripted rewind path in sync with production code.
- Version bumped to 3.05.75.
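This is a minimal sketch of the single-pass invariant described for `compaction.sanitize_history`, written against plain OpenAI-style message dicts. It follows the rules listed above: drop orphan `tool` messages, strip unanswered `tool_calls`, normalize `content` to a string when all calls are stripped, and never mutate the input. It is an illustration, not the cheetahclaws implementation, and the message shape (`tool_calls[i]["id"]`) is assumed.

```python
import copy
from typing import Any, Optional

Message = dict[str, Any]


def sanitize_history(messages: list[Message]) -> list[Message]:
    """Single pass: drop orphan tool messages and strip unanswered tool_calls.

    Works on deep copies, so the caller's list is never mutated.
    """
    out: list[Message] = []
    pending: set[str] = set()       # ids from the open assistant(tool_calls) not yet answered
    answered: set[str] = set()      # ids from that assistant that did get a tool reply
    open_idx: Optional[int] = None  # position in `out` of that assistant message

    def close_block() -> None:
        # Keep only answered tool_calls; drop the key entirely if none survive.
        nonlocal open_idx
        if open_idx is None:
            return
        msg = out[open_idx]
        kept = [tc for tc in msg["tool_calls"] if tc["id"] in answered]
        if kept:
            msg["tool_calls"] = kept
        else:
            del msg["tool_calls"]
            msg["content"] = msg.get("content") or ""  # OpenAI schema: non-null string
        open_idx = None

    for msg in messages:
        if msg.get("role") == "tool":
            tc_id = msg.get("tool_call_id")
            if tc_id in pending:
                pending.discard(tc_id)
                answered.add(tc_id)
                out.append(copy.deepcopy(msg))
            continue  # any other tool message is an orphan and is dropped
        close_block()  # a non-tool message ends the previous tool-call block
        pending, answered = set(), set()
        new_msg = copy.deepcopy(msg)
        out.append(new_msg)
        if new_msg.get("role") == "assistant" and new_msg.get("tool_calls"):
            pending = {tc["id"] for tc in new_msg["tool_calls"]}
            open_idx = len(out) - 1
    close_block()  # unanswered calls at the very end of the list
    return out
```

The function only ever keeps ids that were announced by the immediately preceding assistant message. This is what makes a single left-to-right pass sufficient.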
What Changed
cf696e2 - Apr 16, 2026 (v3.05.74): Web UI production hardening: persistence, multi-user auth, ops endpoints, JS module split, pytest suite
- SQLite persistence (`web/db.py`, `web/models.py`): a SQLAlchemy-backed store with 4 tables: `users`, `chat_sessions`, `messages`, `api_credentials`. Sessions and message history now survive server restarts; previously they were in-memory only and lost on restart. The DB file lives at `~/.cheetahclaws/web.db` (mode `0600`), and the config key `CHEETAHCLAWS_WEB_DB` overrides the path.
- Multi-user auth (`web/auth.py`): the single generated password is replaced with full accounts.
  - Passwords are hashed with bcrypt (passlib), and sessions use stateless JWT cookies (PyJWT, HS256, 7-day TTL).
  - The JWT signing secret is persisted to `~/.cheetahclaws/web_secret` (`0600`), so logins survive restarts.
  - New endpoints: `POST /api/auth/register` (the first user becomes admin), `POST /api/auth/login`, `POST /api/auth/logout`, `GET /api/auth/whoami`, `GET /api/auth/bootstrap` (first-run routing).
  - The legacy `POST /api/auth` is kept for the terminal password page.
- Session CRUD: new `PATCH /api/sessions/{id}` to rename, `DELETE /api/sessions/{id}` to remove, and `GET /api/sessions/{id}/export` to download a conversation as Markdown. Sessions are auto-titled from the first user message. Cross-user isolation is enforced even on in-memory cache hits; one leak on a session cache hit was patched after a smoke test revealed it.
- Structured JSON logging (`web/logging_setup.py`): `logging` + a custom JSON formatter emits one record per line to stderr, e.g. `{"ts":..., "level":"info", "logger":"web.server", "msg":"req", "method":"POST", "path":"/api/auth/login", "status":200, "dur_ms":259, "user_id":1}`. Every HTTP response auto-logs method/path/status/dur_ms/user_id/peer. The level is controlled by the `CHEETAHCLAWS_LOG_LEVEL` env var (default INFO).
- Ops endpoints:
  - `GET /health` returns `{ok, db, uptime_s}`, or 503 if the DB is unreachable.
  - `GET /metrics` returns Prometheus v0.0.4 text with `cheetahclaws_{uptime_seconds, requests_total, requests_4xx, requests_5xx, auth_logins_total, auth_logins_failed, auth_registrations_total, users_total, ws_connections_total}`.
  - Both are unauthenticated, so Prometheus/k8s probes can hit them.
- JS module split (`web/static/js/`): the monolithic 1813-line `chat.html` became 552 lines of HTML + 9 vanilla JS modules (`chat.js` core class, `util.js`, `auth.js`, `sidebar.js`, `tools.js`, `approval.js`, `settings.js`, `welcome.js`, `init.js`) loaded via plain `<script src>` tags. A prototype-mixin pattern (`Object.assign(ChatApp.prototype, {...})`) keeps `app.foo()` call sites unchanged. There is no bundler and no build step.
- ETag + conditional caching:
  - JS/CSS/HTML are served with `Cache-Control: no-cache, must-revalidate` plus a weak ETag (mtime-size).
  - The browser gets 304 when a file is unchanged, and fresh content after any edit.
  - Binary assets keep a 24h cache.
  - Path traversal is blocked by a resolved-path `is_relative_to` check.
- pytest suite (`tests/test_web_api.py`): 21 end-to-end HTTP tests using httpx, covering:
  - bootstrap/register/login/whoami/logout
  - sessions CRUD + export + Markdown
  - cross-user isolation and persistence after a cache clear
  - `/health` and `/metrics` counter deltas
  - CORS preflight and auth gating of every endpoint

  The suite spins up the real server in a thread on a random port, truncates the DB between tests, and runs in about 5 s: `pytest tests/test_web_api.py`.
- Sidebar UX: chat sessions now show title + relative time ("just now", "12m ago", "3d ago") + message count + a busy dot. A search box filters by title/id on the client. Right-click (or long-press) opens a context menu: Rename / Export Markdown / Delete. The footer shows the current username and a Sign out link.
- Register-or-login on first visit: the chat UI now calls `/api/auth/bootstrap` on load. If no user exists, it shows a "Create your first account" form (the first registration becomes admin); otherwise it shows the "Sign in" form. Username + password replace the single server-generated password.
- Theme: light default + system auto:
  - `:root` now carries the light palette, and `@media (prefers-color-scheme: dark)` swaps in the dark palette when the user hasn't explicitly chosen a theme.
  - The toggle button cycles system → light → dark → system. Its icon reflects the effective theme, and its tooltip spells out the current mode.
  - An inline pre-paint script in `<head>` sets `data-theme` before first paint to avoid a flash of unstyled content (FOUC).
- Auto port selection: `cheetahclaws --web` without `--port` now tries 8080 first. On `EADDRINUSE` it binds `:0` and lets the kernel pick a free port, and the banner reports the real URL. An explicit `--port N` binds exactly N or fails loudly, preserving user intent. The `--port` argparse default changed from `8080` to `None` as a sentinel.
- Favicon + MIME polish: `web/static/favicon.{png,ico}` is cropped from `docs/logo-5.png` (leaping cheetah, transparent background, multi-size ICO 16/32/48) and served from the root as `/favicon.ico` for browser defaults. The MIME table is extended with `.ico` (`image/vnd.microsoft.icon`), `.svg`, `.jpg`, `.woff`, and `.woff2`.
- Welcome dashboard rebalanced: the old 5-card "Bridges & Media" row, which sat raggedly in the card grid, is split into two 4-card sections: Bridges (Telegram · WeChat · Slack · Monitor) and Multi-Modal Media (Voice Input · Vision · Copy Output · Export). `/cwd` is added to Development Tools, and the tagline changed to "Personal AI Assistant · Support Any Model · Autonomous 24/7".
- Bridge commands in the Chat UI: `/telegram`, `/wechat` (+ `/weixin` alias), `/slack`, and `/voice` are now registered in `web/api.py`'s slash registry. Previously only the terminal REPL had them; now clicking the dashboard cards actually runs the command.
- New extras: `pip install 'cheetahclaws[web]'` installs `sqlalchemy>=2.0`, `passlib[bcrypt]>=1.7.4`, and `PyJWT>=2.8.0`. CLI-only installs remain dependency-free, and the `[all]` extra is updated. Web UI demos were added.
- Version bumped to 3.05.74.
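The conditional-caching and traversal rules from the v3.05.74 entry can be illustrated with a small, framework-free sketch. The `W/"<mtime_ns>-<size>"` ETag layout, the function names, and returning 404 for escaped paths are assumptions. Only the weak ETag built from mtime and size, the `Cache-Control` value, 304 on a match, and the resolved-path `is_relative_to` check come from the notes. The sketch requires Python 3.9+ for `str.removeprefix` and `Path.is_relative_to`.

```python
import os
from pathlib import Path
from typing import Optional


def weak_etag(path: Path) -> str:
    """Weak validator built from modification time and size."""
    st = os.stat(path)
    return f'W/"{st.st_mtime_ns}-{st.st_size}"'


def etag_matches(if_none_match: Optional[str], etag: str) -> bool:
    """Weak comparison for If-None-Match: ignore the W/ prefix and accept '*'."""
    if not if_none_match:
        return False
    # Splitting on commas is safe here because our own tags never contain one.
    tags = {t.strip().removeprefix("W/") for t in if_none_match.split(",")}
    return "*" in tags or etag.removeprefix("W/") in tags


def resolve_static(root: Path, rel: str) -> Optional[Path]:
    """Reject paths that escape `root` once symlinks and '..' are resolved."""
    base = root.resolve()
    target = (base / rel).resolve()
    return target if target.is_relative_to(base) else None


def serve(root: Path, rel: str, if_none_match: Optional[str]) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a text-asset request; the body is omitted here."""
    target = resolve_static(root, rel)
    if target is None or not target.is_file():
        return 404, {}
    etag = weak_etag(target)
    headers = {"ETag": etag, "Cache-Control": "no-cache, must-revalidate"}
    return (304 if etag_matches(if_none_match, etag) else 200), headers
```

`no-cache` still lets the browser store the file but forces a revalidation on every use. That round-trip is cheap when the answer is a bodiless 304.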
Apr 16, 2026 (v3.05.73): Web UI: browser-based Chat UI + structured event API
- Web Chat UI (`web/chat.html`): `cheetahclaws --web` now serves a rich browser-based chat interface at `/chat`, alongside the existing PTY terminal at `/`. Features:
  - real-time streaming via Server-Sent Events (SSE)
  - collapsible tool cards with status badges
  - inline permission approval buttons (Allow/Deny)
  - an activity indicator (spinner + state labels for Thinking/Running/Processing)
  - Markdown rendering with XSS sanitization (`marked.js` bundled)
  - a dark/light theme toggle with `localStorage` persistence
  - a mobile-responsive layout with a sidebar overlay
- Structured event API (`web/api.py`): a new `ChatSession` class bridges the `agent.run()` generator to WebSocket/SSE event streams, following the same pattern as the Telegram/Slack/WeChat bridges. Events: `text_chunk`, `thinking_chunk`, `tool_start`, `tool_end`, `permission_request`, `permission_response`, `turn_done`, `command_result`, `interactive_menu`, `input_request`, `status`, `error`. An event buffer with replay serves late-joining subscribers.
- 8 new API endpoints:
  - `POST /api/prompt`: submit a prompt or slash command
  - `WS /api/events`: real-time event stream
  - `POST /api/approve`: permission response
  - `GET /api/sessions`: list sessions
  - `GET /api/sessions/{id}`: session details + message history
  - `GET/PATCH /api/config`: read/write config
  - `GET /api/models`: list all 11 providers and their models
  - `POST /api/auth`: login; sets an HttpOnly cookie
- Settings panel: click ⚙ to open it. It contains:
  - a model selector grouped by 11 providers (Anthropic, OpenAI, Gemini, Ollama, DeepSeek, Qwen, etc.)
  - a permission-mode dropdown, thinking/verbose toggles, and a max-tokens input
  - per-provider API key management with status indicators
  - quick action buttons (Compact/Status/Cost/Context)
  - a terminal link as a fallback
- Slash command support in Chat UI — all 45+ commands work. Quick commands (/status, /help, /model, /context) return results instantly in the POST response. Long-running commands (/brainstorm, /worker, /plan, /agent) stream events in real time via SSE (the server keeps the HTTP connection open). /ssj renders a clickable 12-item interactive menu. /brainstorm (no args) shows a topic input box before starting.
- SSJ sub-commands — /ssj debate, /ssj commit, /ssj readme, /ssj scan, /ssj propose, and /ssj review now run directly as agent queries without showing the interactive menu. The menu only appears for /ssj (no args).
- Feature dashboard — the welcome page shows 24 feature cards organized in 6 categories (Core, Agent Features, Session & Memory, Multi-Model, Development Tools, Bridges & Media) with 7 clickable quick-command chips.
- Security hardening — hmac.compare_digest() for timing-safe token comparison, XSS sanitization (HTML tags escaped before Markdown rendering), CORS restricted to echoing the request Origin (no wildcard), HttpOnly + SameSite=Strict cookies, auth checked before the WebSocket upgrade, and a _BufferedSocket wrapper replacing fragile sock.recv monkey-patching.
- Session management — chat sessions with an idle timeout (30 min), a background reaper for orphaned sessions, and a sidebar session list showing message count and a busy indicator; click a session to switch, or "+" to create a new one.
- Web bridge integration — RuntimeContext extended with web_input_event, web_input_value, and in_web_turn fields. tools/interaction.py routes permission prompts to the web bridge via threading.Event synchronization. commands/advanced.py detects web turns and skips interactive prompts (using defaults, as the Telegram bridge does).
- Thread-safe stdout streaming — _ThreadLocalStdout intercepts print() only from the target command thread and broadcasts the output as text_chunk events. Other threads are unaffected.
- pyproject.toml packaging — the web package added to the packages list; *.js, *.css, and *.html added to package-data. Static assets (xterm.min.js, marked.min.js, chat.html) are now correctly included in pip install distributions.
- Docs — new Web UI Guide (304 lines): quick start, full feature list, s...
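The "event buffer with replay for late-joining subscribers" idea can be illustrated with a small thread-safe fan-out. This is a hypothetical sketch: the class name ReplayEventBuffer, the queue-per-subscriber design, and the maxlen bound are assumptions, not details of web/api.py.

```python
import queue
import threading
from collections import deque


class ReplayEventBuffer:
    """Fan events out to subscribers, replaying recent history to late joiners.

    Hypothetical sketch; names and the maxlen default are illustrative.
    """

    def __init__(self, maxlen=1000):
        self._history = deque(maxlen=maxlen)  # oldest events are evicted first
        self._subscribers = []
        self._lock = threading.Lock()

    def publish(self, event_type, **data):
        event = {"type": event_type, **data}
        with self._lock:
            self._history.append(event)
            # Deliver under the lock so every subscriber sees publish order.
            for q in self._subscribers:
                q.put(event)

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            # Replay buffered events first, then register for live ones.
            for event in self._history:
                q.put(event)
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)
```

Replay and registration happen under the same lock as publishing, so a subscriber that joins mid-turn receives each still-buffered event exactly once: either from history or live, never both.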
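A thread-filtered stdout proxy in the spirit of _ThreadLocalStdout might look like the following. The class name ThreadLocalStdout, its constructor arguments, and the on_text callback (standing in for broadcasting text_chunk events) are assumptions for illustration only.

```python
import io
import threading


class ThreadLocalStdout(io.TextIOBase):
    """Send writes from one target thread to a callback; pass others through.

    Hypothetical sketch: on_text stands in for emitting text_chunk events.
    """

    def __init__(self, target_thread_id, on_text, passthrough):
        self._target = target_thread_id
        self._on_text = on_text
        self._passthrough = passthrough

    def writable(self):
        return True

    def write(self, s):
        if threading.get_ident() == self._target:
            # Output from the command thread is captured, not printed.
            if s:
                self._on_text(s)
            return len(s)
        # Every other thread writes to the original stream unchanged.
        return self._passthrough.write(s)

    def flush(self):
        self._passthrough.flush()
```

Installed as sys.stdout for the duration of a command, such a proxy would capture print() output from the command thread only, while every other thread keeps writing to the original stream (for example sys.__stdout__).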