Opus 4.8 kills budget_tokens — here's what else moved

DEV Community

Claude Code access: Requires a paid Anthropic subscription (Pro, Max, Teams, or Enterprise) or API console credits — the free tier does not include it (video: Tech With Tim).

Fast mode enrollment: Research preview, gated separately. Enroll via the Anthropic console before adding speed: "fast" to any request. Sending the parameter without prior enrollment returns an error.
Platform and context window: Claude API, Amazon Bedrock, and Vertex AI each provide a 1 million token context window. Microsoft Foundry caps at 200k tokens, so verify your target platform before designing long-context pipelines.
Output limits: Maximum synchronous output is 128k tokens. The Message Batches API raises this to 300k tokens per call with the beta header output-300k-2026-03-24.
Knowledge cutoff: January 2026.
SDK version: Confirm anthropic>=0.51 is installed before deploying.
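The platform and output limits above can be turned into a small pre-flight check. The following is a minimal sketch, not part of any SDK: the platform keys, the constant names, and the `check_limits` helper are hypothetical, and it assumes "200k", "128k", and "300k" mean 200,000, 128,000, and 300,000 tokens respectively.

```python
# Limits as stated in this article. The dictionary keys are hypothetical labels.
CONTEXT_WINDOW_TOKENS = {
    "claude_api": 1_000_000,
    "bedrock": 1_000_000,
    "vertex_ai": 1_000_000,
    "foundry": 200_000,
}
SYNC_MAX_OUTPUT_TOKENS = 128_000
BATCH_BETA_MAX_OUTPUT_TOKENS = 300_000  # assumes the 300k beta header is sent


def check_limits(platform, prompt_tokens, max_tokens, batch_beta=False):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    window = CONTEXT_WINDOW_TOKENS[platform]
    output_cap = BATCH_BETA_MAX_OUTPUT_TOKENS if batch_beta else SYNC_MAX_OUTPUT_TOKENS
    if prompt_tokens > window:
        problems.append(
            f"prompt of {prompt_tokens} tokens exceeds the {window}-token window on {platform}"
        )
    if max_tokens > output_cap:
        problems.append(f"max_tokens {max_tokens} exceeds the output cap of {output_cap}")
    return problems


# A 500k-token prompt fits on the Claude API but not on Foundry.
print(check_limits("claude_api", 500_000, 1024))
print(check_limits("foundry", 500_000, 1024))
```

Running a check like this in CI is one way to catch a pipeline that was sized for the 1M window before it is pointed at a platform with the smaller limit.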

How to Call Each Capability in Opus 4.8

Five call patterns cover everything new or changed in Opus 4.8. All examples use the Python SDK (anthropic>=0.51). The fourth and fifth patterns (adaptive thinking and stop_details) are the most migration-sensitive; read the "400 Errors and Other Failure Points" section before deploying either.

Basic call — update the model ID

Change model to claude-opus-4-8. All other request structure from 4.7 carries over unchanged.

 import anthropic

 client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
 response = client.messages.create(
     model="claude-opus-4-8",
     max_tokens=1024,
     messages=[{"role": "user", "content": "Explain the migration in one sentence."}],
 )
 print(response.content[0].text)

Fast mode — add speed="fast"

Pass speed="fast" as a top-level parameter (not inside the model string). This doubles per-token cost to $10 input / $50 output per million tokens. Confirm console enrollment before deploying; unenrolled requests return an error.

 response = client.messages.create(
     model="claude-opus-4-8",
     speed="fast",  # top-level parameter; requires prior console enrollment
     max_tokens=1024,
     messages=[{"role": "user", "content": "Generate a 500-word summary."}],
 )

Mid-session system messages

Insert {"role": "system", "content": "..."} into the messages array immediately after any user turn. No beta header required. Earlier turns stay cached, so you pay only for the injected delta.

 messages = [
     {"role": "user", "content": "Analyze this codebase."},
     {"role": "assistant", "content": "Starting with the entry points..."},
     # Inject updated instruction mid-session; no beta header needed
     {"role": "system", "content": "Focus only on security-critical paths from here."},
     {"role": "user", "content": "Continue with the auth module."},
 ]
 response = client.messages.create(
     model="claude-opus-4-8",
     max_tokens=2048,
     messages=messages,
 )

Adaptive thinking — omit budget_tokens

Pass thinking={"type": "adaptive"} and set effort separately. Accepted values: "low", "high" (default), "max". The model decides per turn whether extended reasoning is warranted (video: Skill Leap AI). Passing budget_tokens returns a 400.

 response = client.messages.create(
     model="claude-opus-4-8",
     thinking={"type": "adaptive"},
     output_config={"effort": "high"},  # or "low", "max"
     max_tokens=4096,
     messages=[{"role": "user", "content": "Audit this function for security issues."}],
 )
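If older request payloads are built as plain dictionaries, the thinking change can be applied mechanically. The sketch below is one possible approach under that assumption; `migrate_thinking` is a hypothetical helper, not an SDK function, and it only rewrites the `thinking` and `output_config` keys described above.

```python
import copy


def migrate_thinking(request, effort="high"):
    """Return a copy of a request dict with budget-style thinking made adaptive.

    Hypothetical helper: requests that do not use {"type": "enabled"} thinking
    are returned unchanged (as a copy).
    """
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # Dropping the old dict also drops budget_tokens.
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort level the caller already set, otherwise use the argument.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 32000},
}
new = migrate_thinking(old)
print(new["thinking"], new["output_config"])
```

The helper deliberately leaves the model ID alone so that the thinking change and the model switch can be rolled out and tested separately.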

Reading stop_details on refusals

Check response.stop_details.type to branch by category rather than parsing free-text content. The following snippet is illustrative — see the Opus 4.8 changelog for all documented type values.

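 # Hypothetical wrapper so the return statements are valid;
 # reuses the client created in the first example.
 def classify_response(user_input):
     """Return an error payload if the model refused, otherwise None."""
     response = client.messages.create(
         model="claude-opus-4-8",
         max_tokens=1024,
         messages=[{"role": "user", "content": user_input}],
     )
     if response.stop_reason == "refusal":
         # Branch on the structured category instead of parsing free text
         refusal_type = response.stop_details.type
         if refusal_type == "harmful_content":
             return {"error": "content_policy", "action": "revise_prompt"}
         elif refusal_type == "privacy":
             return {"error": "privacy_policy", "action": "remove_pii"}
         else:
             return {"error": "refusal", "type": refusal_type}
     return None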

The following verified migration script summarizes the full diff from Opus 4.6-style requests to Opus 4.8. It was executed and confirmed (exit 0):

import json

# Opus 4.6-style request fields
old = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 32000},
}
# Equivalent Opus 4.8 request fields
new = {
    "model": "claude-opus-4-8",
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "high"},
    "speed": "fast",
}
print(json.dumps({
    "budget_tokens": "removed: use adaptive thinking + effort instead",
    "before": old,
    "after": new,
    "also_moved": {
        "effort_default": "high",
        "sampling": "omit temperature/top_p/top_k non-defaults",
        "prompt_cache_min_tokens": 1024,
        "max_output_tokens": 128000,
    },
}, indent=2))

Output from the above script:

{
  "budget_tokens": "removed: use adaptive thinking + effort instead",
  "before": {
    "model": "claude-opus-4-6",
    "thinking": {
      "type": "enabled",
      "budget_tokens": 32000
    }
  },
  "after": {
    "model": "claude-opus-4-8",
    "thinking": {
      "type": "adaptive"
    },
    "output_config": {
      "effort": "high"
    },
    "speed": "fast"
  },
  "also_moved": {
    "effort_default": "high",
    "sampling": "omit temperature/top_p/top_k non-defaults",
    "prompt_cache_min_tokens": 1024,
    "max_output_tokens": 128000
  }
}

400 Errors and Other Failure Points

Opus 4.8 inherits the same sampling constraints as Opus 4.7. These four failure points account for the majority of migration bugs, particularly when upgrading from Opus 3 or Sonnet:

thinking: {"type": "enabled", "budget_tokens": N} → 400. Deprecated since Opus 4.7, still unsupported in 4.8. Replace with thinking={"type": "adaptive"} plus output_config={"effort": "..."}. This is the single most common mistake when switching from extended-thinking-era code.
Any non-default temperature, top_p, or top_k → 400. Omit these parameters entirely. Steer output style through prompting techniques instead — this constraint is unchanged from 4.7.
speed: "fast" without enrollment → API error. This is not a 400 but a distinct enrollment-check error. Confirm console access before shipping any fast-mode code path.
Microsoft Foundry: 200k token context, not 1M. If your pipeline was designed around the 1M window on the Claude API, Bedrock, or Vertex AI, it behaves differently on Foundry. Verify your target platform before committing to a retrieval or long-document architecture.
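The request-level failure points above lend themselves to a lint pass over request payloads before they are sent. This is a minimal sketch assuming requests are plain dictionaries; `find_migration_issues` is a hypothetical helper. It flags any sampling parameter that is present, which is stricter than the "non-default" rule but matches the advice to omit them entirely.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def find_migration_issues(request):
    """Return human-readable warnings for an Opus 4.8 request dict."""
    issues = []
    thinking = request.get("thinking") or {}
    if "budget_tokens" in thinking:
        issues.append("thinking.budget_tokens is set; use adaptive thinking plus effort")
    for name in SAMPLING_PARAMS:
        if name in request:
            # Conservative: presence alone is flagged, even at a default value.
            issues.append(f"{name} is set; omit it")
    if request.get("speed") == "fast":
        issues.append("speed='fast' needs console enrollment; confirm before shipping")
    return issues


legacy = {
    "model": "claude-opus-4-8",
    "thinking": {"type": "enabled", "budget_tokens": 32000},
    "temperature": 0.2,
}
for issue in find_migration_issues(legacy):
    print(issue)
```

The fast-mode entry is a reminder rather than an error, since enrollment status cannot be determined from the payload alone.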

On the behavioral side: per TechCrunch's reporting on early enterprise testing, Opus 4.8 is approximately four times less likely than Opus 4.7 to allow code flaws to pass unremarked, and it proactively flags input/output uncertainties in analytical pipelines, a pattern confirmed by Bridgewater Associates during pre-launch evaluation. If your evals measure output volume rather than accuracy, recalibrate before moving to production.

Going Further With Opus 4.8

Three capabilities in Opus 4.8 are worth queuing up for exploration, though two remain in research preview as of June 2026 with no confirmed GA date:

Dynamic Workflows (Claude Code, research preview): Decomposes a large task into a plan, then fans it across hundreds of parallel subagents in a single session. Designed for codebase-scale work: migrations across hundreds of thousands of lines, from kickoff to merge, using your existing test suite as the quality bar. Token consumption is substantially higher than single-agent flows; budget accordingly before enabling.
Message Batches API, 300k output tokens per call: Add the beta header output-300k-2026-03-24 to raise the per-call output ceiling from 128k to 300k tokens. Practical for large-document generation or batch summarization at scale.
Mid-session system messages for cost reduction: In multi-turn agentic loops, inject only the changed instruction slice after each turn while leaving earlier cached turns intact. Savings compound with session length, since there is no need to retransmit the full system prompt on every call.

Frequently Asked Questions

What is the model ID for Claude Opus 4.8?

Use claude-opus-4-8. The model is available on the Claude API, Amazon Bedrock, and Vertex AI with a 1 million token context window, and on Microsoft Foundry with a 200k token limit. Standard pricing is $5 per million input tokens and $25 per million output tokens.

Why does setting `budget_tokens` cause a 400 error on Opus 4.8?

The thinking: {"type": "enabled", "budget_tokens": N} syntax was deprecated starting with Opus 4.7 and is unsupported in 4.8. The correct form is thinking={"type": "adaptive"} with a separate output_config={"effort": "..."} parameter — values are "low", "high", or "max", defaulting to "high". Omit budget_tokens entirely.

What does fast mode cost compared to standard Opus 4.8?

Standard Opus 4.8 costs $5 input / $25 output per million tokens. Fast mode costs $10 input / $50 output per million tokens, exactly double, in exchange for a higher output token rate. Fast mode is a research preview and requires separate enrollment in the Anthropic console before any requests will succeed.
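To make the price difference concrete, here is a small worked example using the per-million-token rates quoted above. The `request_cost_usd` helper and the token counts are illustrative assumptions, and the calculation ignores caching discounts and any other billing adjustments.

```python
# Rates in USD per million tokens, as quoted in this article.
PRICES_USD_PER_MILLION = {
    "standard": {"input": 5.0, "output": 25.0},
    "fast": {"input": 10.0, "output": 50.0},
}


def request_cost_usd(input_tokens, output_tokens, mode="standard"):
    """Estimate the cost of one request, ignoring caching discounts."""
    price = PRICES_USD_PER_MILLION[mode]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example workload: 200k input tokens and 10k output tokens.
standard = request_cost_usd(200_000, 10_000)
fast = request_cost_usd(200_000, 10_000, mode="fast")
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
# prints "standard: $1.25, fast: $2.50"
```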

Can I set `temperature` or `top_p` with Opus 4.8?

No. Any non-default temperature, top_p, or top_k value returns a 400 error — the same constraint as Opus 4.7. Omit these parameters entirely and control output style through prompting instead.

Do I need to change any code to benefit from the lower prompt cache threshold?

No code changes required. Opus 4.8 drops the minimum cacheable prompt from approximately 2,000 tokens to 1,024 tokens automatically. Prompts between 1,024 and 2,000 tokens that previously missed caching now qualify, cutting repeat-call input costs without any migration work on your end.
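To see which of your existing prompts are affected, you can bucket their token counts against the two thresholds. This is a rough sketch: the constant names and `newly_cacheable` are hypothetical, the old threshold is only approximate, and treating exactly 1,024 tokens as cacheable is an assumption.

```python
OLD_MIN_CACHEABLE_TOKENS = 2_000  # approximate, per the article
NEW_MIN_CACHEABLE_TOKENS = 1_024


def newly_cacheable(prompt_tokens):
    """True if a prompt missed the old threshold but meets the new one."""
    return NEW_MIN_CACHEABLE_TOKENS <= prompt_tokens < OLD_MIN_CACHEABLE_TOKENS


# Hypothetical prompt sizes from a usage log.
prompt_sizes = [800, 1_024, 1_500, 4_000]
print([size for size in prompt_sizes if newly_cacheable(size)])
```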

What to Migrate, What to Monitor

The Opus 4.8 upgrade is one line for most codebases: change the model ID. The substantive migration work is removing budget_tokens if you were on extended thinking, and confirming no sampling parameters are set. The four new capabilities — fast mode, mid-session system prompts, the lower cache floor, and stop_details routing — each add a handful of lines, not a structural rearchitecture.

Fast mode and Dynamic Workflows remain research previews with no confirmed GA date. The Mythos-class models expected to outperform Opus 4.8 on coding benchmarks were described as coming "in the coming weeks" as of the release date, with safety reviews still ongoing. For now, claude-opus-4-8 is the production ceiling on every supported platform.

Last updated: 2026-06-01. Based on Anthropic's Opus 4.8 release notes and model documentation published May 28, 2026.