Catch silent Claude-coder failures: per-model preflight + bundled-CLI doctor check (0.1.2) #17

Merged: ofer-mendelevitch-thenvoi merged 2 commits into main from fix/claude-zombie-turn-detection on Jun 17, 2026
ofer-mendelevitch-thenvoi commented Jun 17, 2026

Collaborator

Problem

A Claude coder could connect to its Band room, receive its dispatch, and then silently do nothing — no crash, no chat message, nothing the Watchdog could catch — so an assigned subtask sat pending forever.

Root cause (confirmed by live repro)

Reproduced with cb run --agent coder-claude_sdk-0 --debug + a human @mention; the debug log showed:

band.adapters.claude_sdk: Sending query to Claude SDK (first_msg=True, parts=3)
band.adapters.claude_sdk: Text: API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"\"thinking.type.en...
band.adapters.claude_sdk: Complete - 348ms, $0.0000
band.adapters.claude_sdk: Message ... processed successfully
  • The coder ran an Opus model through the stale Claude Code CLI bundled inside an old claude-agent-sdk (bundled 2.1.81 vs system 2.1.179).
  • That CLI emits the legacy thinking.type.enabled request shape, which current models reject with a 400 (thinking.type.enabled is not supported for this model. Use thinking.type.adaptive).
  • band's claude_sdk adapter swallows the error (logs at DEBUG, $0.0000, marks the message processed) and emits nothing → a healthy-looking zombie.
  • It was invisible because preflight only probed a default Sonnet model, never the coder's Opus; Sonnet still accepts the legacy shape, so the auth probe passed.
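
The blind spot in the last bullet can be illustrated with a toy simulation. This is only a sketch: `fake_probe`, the short model names, and the set of models that reject the legacy shape are hypothetical stand-ins for the API behaviour described above, not code from this PR.

```python
# Toy model of the preflight blind spot. All names and model IDs here are
# illustrative assumptions, not identifiers from codeband or band.
LEGACY_SHAPE_REJECTED = {"opus"}  # models that answer 400 to thinking.type.enabled


def fake_probe(model: str) -> bool:
    """Return True when a legacy-shaped request to `model` would succeed."""
    return model not in LEGACY_SHAPE_REJECTED


def default_only_preflight(configured: list[str], default: str = "sonnet") -> bool:
    # Old behaviour: the configured models are never consulted.
    return fake_probe(default)


def per_model_preflight(configured: list[str]) -> bool:
    # New behaviour: every distinct configured model must pass the probe.
    return all(fake_probe(model) for model in sorted(set(configured)))


configured_models = ["sonnet", "opus", "opus"]
old_result = default_only_preflight(configured_models)  # passes: only Sonnet is probed
new_result = per_model_preflight(configured_models)     # fails: Opus rejects the shape
```

Under these assumptions the default-only probe passes while the per-model probe fails, which matches the observed symptom: preflight was green while the Opus coder could not complete a single turn.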

Changes

  • preflight (preflight.py): probe each distinct configured Claude model (not a hardcoded default), threading model through check_claude_auth. Adds classified error patterns for the invalid_request_error / "is not supported for this model" 400, with a remediation pointing at pip install -U claude-agent-sdk. A model that rejects the request shape now fails cb run fast instead of spawning a silent do-nothing agent.
  • doctor (doctor.py): new "Bundled Claude CLI" check — warns when the CLI bundled in claude_agent_sdk is older than the system claude (agents use the bundled one), with the same remediation.
  • version: bump to 0.1.2.
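
The error classification in the preflight change could look roughly like the sketch below. The `Classified` type, the `_PATTERNS` table, and `classify_probe_output` are hypothetical names chosen for illustration; the actual structure in preflight.py may differ. The two sample strings are the truncated debug-log line and the API message quoted in the root-cause section.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classified:
    kind: str
    remediation: str


# Hypothetical result shared by both patterns named in this PR.
_STALE_CLI = Classified(
    kind="unsupported_request_shape",
    remediation="pip install -U claude-agent-sdk",
)

# Hypothetical pattern table; only the two matched phrases come from the PR text.
_PATTERNS = [
    (re.compile(r"invalid_request_error"), _STALE_CLI),
    (re.compile(r"is not supported for this model"), _STALE_CLI),
]


def classify_probe_output(text: str) -> Optional[Classified]:
    """Return the first matching classification, or None if nothing matches."""
    for pattern, result in _PATTERNS:
        if pattern.search(text):
            return result
    return None


# Truncated exactly as it appears in the debug log.
log_line = r'''API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"\"thinking.type.en...'''
api_message = "thinking.type.enabled is not supported for this model. Use thinking.type.adaptive"

from_log = classify_probe_output(log_line)
from_message = classify_probe_output(api_message)
```

Matching on two independent phrases means the check still fires when the log line is truncated before the human-readable message, as it is in the repro above.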

Deliberately out of scope

Per-turn "zombie" detection at runtime is not included: the 400 is swallowed inside band's adapter, which codeband can't intercept without band SDK hooks (player_claude.py only exposes .adapter to Agent.create()). The per-model preflight prevents this specific failure before agents spawn, which is the higher-leverage fix.

Tests

  • preflight: detects the thinking/invalid_request 400; verifies the probe uses the configured model; verifies run_preflight probes each distinct model (Opus and Sonnet).
  • doctor: bundled-vs-system version comparison (WARN when older, OK when current, INFO when absent) + version parsing.
  • Full suite: 580 passed, ruff clean.
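
The bundled-vs-system comparison that the doctor tests exercise can be sketched as follows. `parse_version` and `bundled_cli_status` are hypothetical names, and treating any missing or unparseable version as INFO is an assumption about what "absent" means here. The version numbers are the ones from the repro (bundled 2.1.81, system 2.1.179).

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted numeric version from `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def bundled_cli_status(bundled: Optional[str], system: Optional[str]) -> str:
    """Return "WARN" when the bundled CLI is older, "OK" when not, else "INFO"."""
    bundled_v = parse_version(bundled) if bundled else None
    system_v = parse_version(system) if system else None
    if bundled_v is None or system_v is None:
        return "INFO"  # assumption: nothing to compare against
    return "WARN" if bundled_v < system_v else "OK"


status = bundled_cli_status("2.1.81", "2.1.179")
```

Comparing integer tuples rather than raw strings matters for exactly this pair: as strings, "2.1.81" sorts after "2.1.179", so a naive comparison would report the stale bundled CLI as newer.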

🤖 Generated with Claude Code

ofer-mendelevitch-thenvoi and others added 2 commits June 16, 2026 22:29
A Claude coder could connect to its room, receive its dispatch, and then
silently do nothing (no crash, no chat, nothing the watchdog could catch),
so a subtask sat "pending" forever.

Root cause: the coder ran an opus model through the stale Claude Code CLI
bundled inside an old claude-agent-sdk. That CLI emits the legacy
`thinking.type.enabled` request shape, which current models reject with a
400; band's adapter swallows the error (logs at DEBUG, marks the message
processed) and emits nothing. The failure was invisible because preflight
only probed a default Sonnet model, never the coder's opus.

Changes:
- preflight: probe each *distinct* configured Claude model (not a hardcoded
  default), and classify the `invalid_request_error` / "is not supported for
  this model" 400 with a remediation pointing at `pip install -U
  claude-agent-sdk`. A model that rejects the request shape now fails `cb run`
  fast instead of producing a silent do-nothing agent.
- doctor: new "Bundled Claude CLI" check warns when the CLI bundled in
  claude_agent_sdk is older than the system `claude` (the SDK uses the bundled
  one).
- bump version to 0.1.2.

Per-turn "zombie" detection at runtime is intentionally not included: the
error is swallowed inside band's adapter, which codeband can't intercept
without band SDK hooks. The per-model preflight prevents this specific failure
before agents spawn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ofer Mendelevitch <ofermend@gmail.com>

_claude_models hand-listed conductor/mergemaster + the 4 pools and duplicated
the spawner's per-role default-model logic, so a newly added role would be
silently skipped by the preflight probe. Discover models by introspecting the
AgentsConfig fields (duck-typing the singleton vs pool shapes the rest of the
code already uses); only the one non-Sonnet default (coders -> Opus) stays in
_POOL_MODEL_DEFAULT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ofer Mendelevitch <ofermend@gmail.com>
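
The field-introspection approach from the second commit might be sketched like this. The `Singleton` and `Pool` shapes, the `reviewers` pool, the short model names, and the rule that empty pools are skipped are all assumptions made for illustration; only `AgentsConfig`, `_POOL_MODEL_DEFAULT`, the conductor/mergemaster/coders roles, and the coders -> Opus default come from the commit message.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

DEFAULT_MODEL = "sonnet"                  # placeholder model name
_POOL_MODEL_DEFAULT = {"coders": "opus"}  # the one non-Sonnet default


@dataclass
class Singleton:  # hypothetical single-agent role shape
    model: Optional[str] = None


@dataclass
class Pool:  # hypothetical pool shape: a Singleton plus a count
    count: int = 0
    model: Optional[str] = None


@dataclass
class AgentsConfig:  # stand-in with two of the real roles' worth of variety
    conductor: Singleton = field(default_factory=Singleton)
    mergemaster: Singleton = field(default_factory=Singleton)
    coders: Pool = field(default_factory=Pool)
    reviewers: Pool = field(default_factory=Pool)  # hypothetical pool name


def claude_models(cfg: AgentsConfig) -> List[str]:
    """Collect distinct models by walking the config's fields, in field order."""
    models: List[str] = []
    for f in fields(cfg):
        role = getattr(cfg, f.name)
        if not hasattr(role, "model"):
            continue  # not an agent role
        count = getattr(role, "count", None)
        if count is not None and count <= 0:
            continue  # assumption: empty pools spawn nothing, so skip them
        model = role.model or _POOL_MODEL_DEFAULT.get(f.name, DEFAULT_MODEL)
        if model not in models:
            models.append(model)
    return models


discovered = claude_models(AgentsConfig(coders=Pool(count=2)))
```

Because the loop walks whatever fields the config declares, a role added later is picked up without touching the preflight code, which is the failure the hand-written list was prone to.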