The Context Bloat Crisis
Any developer who has built a serious agent with tool-calling knows the pain. Hook up five MCP servers and you're looking at 50,000+ tokens of tool definitions before the first real user message. Anthropic's solution is a three-pronged attack.
1. Tool Search
Lets you mark tools with defer_loading: true. Only a lightweight search tool (~500 tokens) is loaded upfront. Tools are discovered on-demand.
| Approach |
Tokens Before Work |
Context Preserved |
| Traditional (all tools upfront) |
~77K tokens |
~23K |
| Tool Search Tool |
~8.7K tokens |
~91K (95% preserved) |
2. Programmatic Tool Calling
Instead of Claude requesting tools one at a time, Claude writes Python orchestration code that calls your tools from within a code execution sandbox.
| Metric |
Without PTC |
With PTC |
| Average tokens consumed |
43,588 |
27,297 (-37%) |
| General intelligent agent |
46.5% |
51.2% |
3. Advisor Strategy
Pair a fast, cheap executor model (Sonnet 4.6) with a high-intelligence advisor model (Opus 4.8) — all inside a single API call. The advisor produces a plan (400-700 tokens) when the executor hits a complex decision.
Files API and MCP Connector
The Files API lets you upload files and reference by ID instead of base64-encoding. The MCP Connector allows connecting to remote MCP servers directly from the Messages API.
1M Context Window on Sonnet 4
Anthropic also made the 1M-token context window available in beta for Claude Sonnet 4, bringing codebase-scale context to cost-sensitive developers.
What This Means
-
Context is expensive — defer it, filter it, keep it out until needed (Tool Search, PTC)
-
Intelligence is expensive — cheap executor by default, heavy artillery on hard problems (Advisor Strategy)
-
Integration shouldn't be — connect to any MCP server directly from the API (MCP Connector)
Sources: Claude API Docs, Anthropic Blog