test-spec: thinking-sanitize efficacy (real-API replay, resolves #162 OQ1) #165
Open
vsits-team-lead-agent[bot] wants to merge 3 commits into
... OQ1 Spec for the Proxy Test Agent: reproduce a real 400 from genuine server-signed thinking blocks, then A/B the #162 transform (prior-turn-only vs also-latest-completed) against the real API to determine the exact which-turns-to-drop rule. Requires Chris go-ahead before execution (real API calls). Ref #162.
...n, latest-turn removal, tool-pairing, detection signal
...ABLE_THINKING / MAX_THINKING_TOKENS=0)
vsits-proxy-builder Bot pushed a commit that referenced this pull request on May 29, 2026
...e-prop (AI Team Lead #162 review)
AI Team Lead settled this from the unstripped 2.1.148 binary (#165): DISABLE_INTERLEAVED_THINKING=1, which we had cited as the user-side answer for the uncoverable active-tool-continuation case, only drops the interleaved beta; thinking is still emitted and the 400 still fires. The only env levers that stop the wedge (CLAUDE_CODE_DISABLE_THINKING=1 / MAX_THINKING_TOKENS=0) do so by disabling thinking ENTIRELY, which is lossy. So no env var both preserves thinking and avoids the wedge.
Corrected all five references (directive Goal/Behavior #3/Out-of-scope, the extension comment, README, CHANGELOG) and carried through the resulting value-prop: for the history-replay paths it covers, this proxy mitigation is the only path that keeps thinking AND avoids the wedge; for the uncoverable continuation case, the answer is don't-resume + heal/retire.
Docs/comment-only; suite 906 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
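The env-lever finding above can be restated as a small decision helper. This is a minimal sketch that only encodes the commit's conclusion; it assumes the levers are matched by the exact strings "1" and "0" (how Claude Code actually parses these values is not stated here), and the function names are hypothetical.

```python
from collections.abc import Mapping


def thinking_disabled(env: Mapping[str, str]) -> bool:
    """True if one of the two levers that stop the wedge is set.

    Assumption: exact string match on "1" / "0"; the real parser may differ.
    """
    if env.get("CLAUDE_CODE_DISABLE_THINKING") == "1":
        return True
    return env.get("MAX_THINKING_TOKENS") == "0"


def assess(env: Mapping[str, str]) -> dict[str, bool]:
    """Summarize what a given env achieves without the proxy mitigation.

    DISABLE_INTERLEAVED_THINKING is deliberately not read: per the commit,
    it only drops the interleaved beta, so thinking still emits and the 400
    still fires.
    """
    off = thinking_disabled(env)
    return {
        "keeps_thinking": not off,
        # The wedge is avoided only by turning thinking off entirely.
        "avoids_wedge_without_proxy": off,
    }
```

Under these assumptions no env combination returns both flags as True, which is the gap the proxy mitigation is meant to fill for the history-replay paths.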
Spec for the Proxy Test Agent to empirically resolve #162's Open Question 1: does dropping prior-turn omitted thinking actually clear a 400 that names the latest assistant message, and what's the exact which-turns-to-drop rule?
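The two transform scopes under test (prior-turn-only vs also-latest-completed) can be sketched as a pure function over a Messages-API-style `messages` list. This is a hypothetical illustration, not the proxy's implementation: it assumes every `thinking` / `redacted_thinking` block is droppable (whether only omitted thinking should be targeted is part of what the spec resolves), and it uses a guessed heuristic for "mid-continuation" (the final user turn carries a `tool_result`). Only thinking blocks are removed, so `tool_use` / `tool_result` pairing is left intact.

```python
from copy import deepcopy
from typing import Any

Message = dict[str, Any]

# Assumption: both block types count as droppable thinking.
THINKING_TYPES = {"thinking", "redacted_thinking"}


def _is_tool_continuation(messages: list[Message]) -> bool:
    """Hypothetical heuristic: the request is mid tool-continuation when the
    final message is a user turn whose content includes a tool_result."""
    if not messages or messages[-1].get("role") != "user":
        return False
    content = messages[-1].get("content")
    if not isinstance(content, list):
        return False
    return any(block.get("type") == "tool_result" for block in content)


def strip_thinking(messages: list[Message], scope: str) -> list[Message]:
    """Return a copy of `messages` with thinking blocks dropped.

    scope="prior-turn-only": strip every assistant turn except the latest.
    scope="also-latest-completed": also strip the latest assistant turn,
    unless the request is mid tool-continuation.
    An assistant turn left with an empty content list is returned as-is;
    handling that case is out of scope for this sketch.
    """
    if scope not in ("prior-turn-only", "also-latest-completed"):
        raise ValueError(f"unknown scope: {scope}")
    out = deepcopy(messages)  # never mutate the caller's request
    assistant_idx = [i for i, m in enumerate(out) if m.get("role") == "assistant"]
    if not assistant_idx:
        return out
    targets = set(assistant_idx[:-1])  # all prior assistant turns
    if scope == "also-latest-completed" and not _is_tool_continuation(out):
        targets.add(assistant_idx[-1])
    for i in targets:
        content = out[i].get("content")
        if isinstance(content, list):  # string content has no thinking blocks
            out[i]["content"] = [
                b for b in content if b.get("type") not in THINKING_TYPES
            ]
    return out
```

Keeping the function pure makes the A/B straightforward: the same captured request can be replayed once per scope and the resulting 400 / 200 compared.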
Why real-API: the 400 comes from the real Anthropic API's validation, and reproducing it requires genuine server-signed thinking blocks, which can't be forged. The fake-upstream Docker smoke test can't answer it.
Shape: Phase 0 capture genuine signed blocks → Phase 1 reproduce the 400 across trigger variants (completed-latest / mid-continuation / ordering) → Phase 2 A/B the transform scopes (prior-turn-only vs also-latest-completed) → Phase 3 regression + DISABLE_INTERLEAVED_THINKING=1 fallback.
Deliverable: a {variant × scope → cleared?} table posted to #162.
Requires your go-ahead before execution: it makes real API calls (small N, throwaway context). This PR is the spec; running it is a separate authorized step.
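The deliverable table can be kept as a mapping from (variant, scope) to an outcome and rendered as Markdown for the #162 comment. A minimal sketch, assuming the three trigger variants and two scopes named in this spec; `render_table` is a hypothetical helper, and no results are filled in, since they only exist after the authorized run.

```python
from itertools import product
from typing import Optional

VARIANTS = ("completed-latest", "mid-continuation", "ordering")
SCOPES = ("prior-turn-only", "also-latest-completed")


def render_table(results: dict[tuple[str, str], Optional[bool]]) -> str:
    """One row per variant, one column per scope.

    Cell values: "yes" = 400 cleared, "no" = 400 still fires,
    "?" = not yet run (missing key or None).
    """
    def cell(value: Optional[bool]) -> str:
        return "?" if value is None else ("yes" if value else "no")

    lines = [
        "| variant | " + " | ".join(SCOPES) + " |",
        "|---|" + "---|" * len(SCOPES),
    ]
    for variant in VARIANTS:
        cells = [cell(results.get((variant, scope))) for scope in SCOPES]
        lines.append(f"| {variant} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Every cell starts as "not yet run" until the real-API phases report back.
pending = dict.fromkeys(product(VARIANTS, SCOPES))
```

Calling `render_table(pending)` yields the empty grid; each Phase 2 replay then sets one cell to True or False.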
Ref #162. — AI Team Lead