test-spec: thinking-sanitize efficacy (real-API replay, resolves #162 OQ1)#165

Open

vsits-team-lead-agent[bot] wants to merge 3 commits into

main from

test-spec/thinking-sanitize-efficacy

Open

test-spec: thinking-sanitize efficacy (real-API replay, resolves #162 OQ1) #165
vsits-team-lead-agent[bot] wants to merge 3 commits into
main from
test-spec/thinking-sanitize-efficacy

Conversation

@vsits-team-lead-agent

@vsits-team-lead-agent vsits-team-lead-agent Bot commented May 28, 2026

Copy link

ばつ scope → cleared?} table posted to #162. **Requires your go-ahead before execution** — it makes real API calls (small N, throwaway context). This PR is the spec; running it is a separate authorized step. Ref #162. — AI Team Lead" data-view-component="true"> Copy Markdown

Contributor

Spec for the Proxy Test Agent to empirically resolve #162's Open Question 1: does dropping prior-turn omitted thinking actually clear a 400 that names the latest assistant message, and what's the exact which-turns-to-drop rule?

Why real-API: the 400 is real-Anthropic-API validation, and reproducing it needs genuine server-signed thinking blocks (can't be forged). The fake-upstream docker smoke can't answer it.

Shape: Phase 0 capture genuine signed blocks → Phase 1 reproduce the 400 across trigger variants (completed-latest / mid-continuation / ordering) → Phase 2 A/B the transform scopes (prior-turn-only vs also-latest-completed) → Phase 3 regression + DISABLE_INTERLEAVED_THINKING=1 fallback. Deliverable: a {variant ×ばつ scope → cleared?} table posted to #162.

Requires your go-ahead before execution — it makes real API calls (small N, throwaway context). This PR is the spec; running it is a separate authorized step.

Ref #162. — AI Team Lead

@vsits-team-lead-agent


 test-spec: thinking-sanitize efficacy (real-API replay) — resolves  #162 ...

312785c

... OQ1
Spec for the Proxy Test Agent: reproduce a real 400 from genuine server-signed
thinking blocks, then A/B the #162 transform (prior-turn-only vs also-latest-completed)
against the real API to determine the exact which-turns-to-drop rule. Requires Chris
go-ahead before execution (real API calls). Ref #162.

@vsits-team-lead-agent vsits-team-lead-agent Bot mentioned this pull request

May 29, 2026

directive: prior-turn thinking-block sanitize (mitigate #63147) #162

Merged

vsits-team-lead-agent Bot added 2 commits

May 29, 2026 10:41

@vsits-team-lead-agent


 test-spec: fold in #63147 overnight evidence — env-lever determinatio...

98b8a57

...n, latest-turn removal, tool-pairing, detection signal

@vsits-team-lead-agent


 test-spec: env-lever now statically CONFIRMED via 2.1.148 binary (DIS...

931b852

...ABLE_THINKING / MAX_THINKING_TOKENS=0)

vsits-proxy-builder Bot pushed a commit that referenced this pull request

May 29, 2026

@vsits-team-lead-agent @claude


 docs(thinking-sanitize): correct user-side env lever + carry the valu...

6051f85

...e-prop (AI Team Lead #162 review)
AI Team Lead settled this from the not-stripped 2.1.148 binary (#165):
DISABLE_INTERLEAVED_THINKING=1 — which we cited as the user-side answer for the
uncoverable active-tool-continuation case — only drops the interleaved beta;
thinking still emits and the 400 still fires. The only env levers that stop the
wedge (CLAUDE_CODE_DISABLE_THINKING=1 / MAX_THINKING_TOKENS=0) do so by disabling
thinking ENTIRELY (lossy). So there is no env var that both preserves thinking and
avoids the wedge.
Corrected all five references (directive Goal/Behavior #3/Out-of-scope, the
extension comment, README, CHANGELOG) and carried the resulting value-prop: this
proxy mitigation is the only path that keeps thinking AND avoids the wedge for the
history-replay paths it covers; for the uncoverable continuation case the answer
is don't-resume + heal/retire. Docs/comment-only; suite 906 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>