Run your own evals before you flip production traffic. A higher public score does not guarantee a win on your task.
If you are weighing a frontier model against a local one for cost reasons, that tradeoff has not changed much. See running local LLMs in production.
What this means if you ship agents
The trend from 4.7 to 4.8 is clear. Frontier models keep getting better at coding and more honest about their limits. Neither of those removes your job. You still have to wire budgets, rate limits, and failure handling around the model.
A more honest model flags more problems. It does not stop spend. It will not cap your token usage when a dynamic workflow goes sideways at 2 a.m. You still need something watching the money in production. That is the gap AgentGuard fills.
FAQ
Does Opus 4.8 cost more than 4.7?
No. Input is 5ドル per million tokens and output is 25ドル per million, the same as 4.7.
What is the model id?
claude-opus-4-8.
Should I switch from 4.7?
Most likely yes. The price is identical and the scores are higher. Pin the id and test on your own workload first.
What are dynamic workflows?
A research-preview feature in Claude Code that runs many parallel subagents with resumable state for multi-day jobs. It uses far more tokens than a normal session, so watch your spend.
Building agents on Opus 4.8? Put a budget and a rate limit around them before they surprise you with a bill. AgentGuard is a free, open-source runtime guardrail that caps token spend and request rate for AI agents. Get AgentGuard.
Originally published on bmdpat.com. I run a one-person AI agent company and write about what actually works.
Want these in your inbox? Subscribe to the newsletter - no spam, unsubscribe anytime.