This router lets Claude Code use MiniMax M3 (frontier coding, 1M context, native multimodality) and Z.AI GLM-5.1 (ties/beats Opus 4.6 on hardest coding benchmarks) -- while automatically routing to DeepSeek V4 Pro (1M context) when your context approaches the threshold.
v2.0.4 is a ground-up restructure of v1.5.5 with cumulative additions:
v2.0.1 (major restructure):
- Model-centric config -- each Claude model (haiku/sonnet/opus) has its own complete routing block.
- Provider registry -- adding a new provider is 1 file + 1 config block.
- OpenAI format support -- Cerebras and other OpenAI-format providers work natively.
- Per-model context windows -- 114 models across all 15 providers in MODEL_CONTEXT_LIMITS.
- 8 new providers -- Together AI, OpenAI, Alibaba, Tencent, Infini, Baidu, China Telecom, Microsoft Foundry.
v2.0.2:
6. Anthropic passthrough -- set provider: anthropic to route directly to Claude's real API (OAuth + API key dual auth).
7. Quota backoff -- 3 consecutive failures triggers 10-minute cooldown.
8. README restore -- v1.5.5 base with v2.0.1 additions + M3 update.
v2.0.3:
9. Sapiens AI (Agnes) -- agnes-2.0-flash, agnes-1.5-flash (free tier, 256K context).
v2.0.4:
10. NVIDIA NIM -- 7 models incl. Nemotron 3 Ultra with thinking support (free tier).
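The quota backoff rule from item 7 (3 consecutive failures, then a 10-minute cooldown) can be sketched as below. This is a minimal illustration under stated assumptions, not the router's actual code: the class name `QuotaBackoff` and its methods are hypothetical, and only the two numbers come from the changelog.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuotaBackoff:
    """Hypothetical helper: pause a provider after repeated failures."""

    max_failures: int = 3            # consecutive failures before cooldown
    cooldown_seconds: float = 600.0  # 10 minutes
    failures: int = field(default=0, init=False)
    cooldown_until: float = field(default=0.0, init=False)

    def record_failure(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            # Start the cooldown and reset the streak counter.
            self.cooldown_until = now + self.cooldown_seconds
            self.failures = 0

    def record_success(self) -> None:
        # Any success breaks the streak of consecutive failures.
        self.failures = 0

    def is_available(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now >= self.cooldown_until
```

Passing `now` explicitly keeps the helper easy to test; in normal use it would fall back to `time.monotonic()`.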
Real cost breakdown (no hidden maths)
| Plan | Monthly cost | Notes |
|---|---|---|
| Claude Pro (Sonnet + Opus) | $20 | Low usage ceiling |
| MiniMax Starter (M2.7) | $10 | 1,500 requests / 5h |
| Z.AI Lite (GLM-5.1) | $18 | ~80 prompts / 5h |
| Subtotal (replaces Claude Pro) | $28 | Does 10x more work and costs $8 more than Claude Pro |
| DeepSeek V4 Pro (1M context) | $0/month + initial $2 min. | Pay as you use -- typically <$2/month |
| Agnes (Sapiens AI) | $0/month | Free tier available, 256K context |
| NVIDIA NIM | $0/month | Free tier, 7 models incl. Nemotron 3 Ultra |
Updated June 2026: MiniMax M3 is now available. The Plus plan at $20/month gives you 1.7B tokens/month of M3 usage (replaces the $10 Starter). This brings the monthly total to $38 ($20 MiniMax Plus + $18 Z.AI), or $40 if you choose to go with Anthropic ($20 MiniMax Plus + $20 Claude Pro). M3 beats Sonnet 4.6 on SWE-bench and adds 1M context, native multimodality, and MSA sparse attention.
Free-tier alternative: Use Nemotron 3 Ultra as Haiku replacement (128K context, thinking support) → switch to Kimi K2.6 at 50K tokens → switch to DeepSeek V4 Pro at 204.8K tokens. Sonnet uses Agnes 2.0 Flash (256K) → DeepSeek V4 Pro at 204.8K. Opus unchanged (GLM-5.1 → DeepSeek at 165K). Total monthly: $0 with free tiers, or $38 with paid plans.
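The free-tier chains above amount to picking a model by context size. The sketch below shows one way to express that. It is only an illustration: the model ID strings, the `FREE_TIER_CHAINS` table, and `pick_model` are assumptions, and treating "at 50K" as "at or above 50,000 tokens" is a guess about the boundary. The thresholds themselves are the ones quoted above.

```python
from typing import Dict, List, Tuple

# Each chain is a list of (switch_at_tokens, model_id), ascending.
# Model IDs are placeholders, not confirmed provider slugs.
FREE_TIER_CHAINS: Dict[str, List[Tuple[int, str]]] = {
    "haiku": [
        (0, "nemotron-3-ultra"),
        (50_000, "kimi-k2.6"),
        (204_800, "deepseek-v4-pro"),
    ],
    "sonnet": [
        (0, "agnes-2.0-flash"),
        (204_800, "deepseek-v4-pro"),
    ],
    "opus": [
        (0, "glm-5.1"),
        (165_000, "deepseek-v4-pro"),
    ],
}


def pick_model(routing_key: str, context_tokens: int) -> str:
    """Return the last model in the chain whose threshold has been reached."""
    chain = FREE_TIER_CHAINS[routing_key]
    chosen = chain[0][1]
    for switch_at, model_id in chain:
        if context_tokens >= switch_at:
            chosen = model_id
    return chosen
```

For example, `pick_model("haiku", 60_000)` selects the Kimi entry, while `pick_model("opus", 170_000)` selects DeepSeek.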
CC-TPR status line showing routed model and context window
left to right: directory | actual active model | context window | 5hr quota & reset countdown | weekly quota & reset countdown
Claude Pro charges $20/month for barely enough usage to build anything substantial.
| Model | SWE-bench Verified | SWE-bench Pro | API cost (per 1M output) |
|---|---|---|---|
| Claude Sonnet 4.6 | 55% | -- | ~$15.00 |
| MiniMax M3 | -- | 59.0% | ~$1.20 (12.5x cheaper) |
| MiniMax M2.7 | 78% | -- | ~$1.20 (12.5x cheaper) |
| Claude Opus 4.6 | -- | 57.3% | ~$25.00 |
| Z.AI GLM-5.1 | -- | 58.4% (top spot) | ~$3.10 (8x cheaper) |
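The "x cheaper" figures in the table are simple ratios of the approximate per-1M-output prices. A quick way to check them, using only the prices listed above (the function name is made up for illustration):

```python
def times_cheaper(baseline_usd: float, candidate_usd: float) -> float:
    """Ratio of a baseline price to a cheaper candidate price."""
    return baseline_usd / candidate_usd


# Approximate API cost per 1M output tokens, taken from the table.
sonnet_vs_m3 = times_cheaper(15.00, 1.20)
opus_vs_glm = times_cheaper(25.00, 3.10)

print(round(sonnet_vs_m3, 1))  # 12.5
print(round(opus_vs_glm, 1))   # 8.1, shown as "8x" in the table
```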
The router gives you the best of both worlds -- M3 for daily coding, GLM-5.1 for hard planning, plus a 1M-token emergency brake via DeepSeek when context exceeds the threshold.
We don't charge for the router. The only way we afford to maintain it is through referral commissions when you sign up for the required plans.
You pay exactly the same price -- no markup, no fake bonuses. We get a small commission that pays for development. If everyone signs up directly, this project dies. If you find value in CC-TPR, please use the links below.
| Plan | Direct link (supports us) |
|---|---|
| MiniMax Token Plan (M3, M2.7) | https://platform.minimax.io/subscribe/token-plan?code=VaYpkbSg4M |
| Z.AI GLM Token Plan (GLM-5.1) | https://z.ai/subscribe?ic=ER6MB4WO5C |
Already subscribed? You can still help by giving our GitHub repo a star.
| Metric | M3 | Sonnet 4.6 | Real-world impact |
|---|---|---|---|
| SWE-bench Pro | 59.0% | -- | Surpasses GPT-5.5 and Gemini 3.1 Pro |
| Terminal-Bench 2.1 | 66.0% | -- | Reliable autonomous execution |
| Context window | 1M tokens | 200K | 5x more context via MSA sparse attention |
| Multimodal | Native (image + video) | Image only | Understands screenshots, diagrams, UI |
| API cost (per 1M output) | ~$1.20 | ~$15.00 | 12.5x cheaper |
MiniMax Token Plan Plus ($20/month) gives you ~1.7B tokens/month of M3 usage -- enough for full-time daily coding.
Get MiniMax via our referral link
| Metric | GLM-5.1 | Opus 4.6 | Winner |
|---|---|---|---|
| SWE-bench Pro | 58.4% (top spot) | 57.3% | GLM |
| Terminal-Bench 2.0 | 69.0 | 65.4 | GLM |
| AIME 2026 (math) | 95.3 | ~88% | GLM |
| GPQA (science) | 86.2 | 91.3 | Opus (rarely used) |
| Max autonomous steps | 1,200+ | -- | GLM |
Conclusion: For 94.6% of coding tasks, GLM-5.1 is indistinguishable from Opus -- and it actually leads on the hardest engineering benchmark.
The GLM Coding Lite ($18) gives ~3x the prompts of Claude Pro.
Get Z.AI GLM via our referral link
- M3 supports up to 1M tokens (MSA sparse attention). M2.7 and GLM both have a 200K token context window.
- When your conversation reaches the configured threshold, the router pre-emptively switches to DeepSeek V4 Pro (1M context).
- DeepSeek is pay-as-you-go -- you make a minimum first-time payment of $2 to unlock the model. After that, you add credits as needed (no monthly fee).
- Most users spend less than $2/month on DeepSeek, because large contexts are rare.
| # | Provider | Format | Plan | Notes |
|---|---|---|---|---|
| 1 | MiniMax | Anthropic | $10-120/mo Token Plan | M3, M2.7 |
| 2 | Z.AI | Anthropic | GLM Coding Plan | GLM-5.1 |
| 3 | DeepSeek | Anthropic | Pay-per-token | V4 Pro (1M context) |
| 4 | OpenRouter | Anthropic | Pay-per-token | Slug-transformed failover |
| 5 | Anthropic | Anthropic | Claude Pro/Max | Direct passthrough (OAuth + API key) |
| # | Provider | Format | Plan | Notes |
|---|---|---|---|---|
| 6 | Cerebras | OpenAI | Pay-per-token | gpt-oss-120b |
| 7 | Xiaomi MiMo | Anthropic | Pay-per-token | mimo-v2.5-pro |
| 8 | Moonshot Kimi | Anthropic | Pay-per-token | kimi-k2.6 |
| 9 | Together AI | OpenAI | $25 free credits | MiniMax, qwen3-coder |
| 10 | OpenAI | OpenAI | Pay-per-token | gpt-5.x |
| 11 | Alibaba Cloud | Anthropic | $50/mo Pro | qwen3.5-plus |
| 12 | Tencent Cloud | Anthropic | ¥40-200/mo | Hunyuan |
| 13 | Infini | Anthropic | ¥40-200/mo | kimi-k2.5, glm-5 |
| 14 | Baidu Qianfan | Anthropic | Multiple tiers | ERNIE-5.0 |
| 15 | China Telecom | Anthropic | ¥29-699/mo | GLM-5 |
| 16 | Microsoft Foundry | Anthropic | Azure Enterprise | Real Claude 4.x |
| 17 | Sapiens AI (Agnes) | OpenAI | Free tier | agnes-2.0-flash, agnes-1.5-flash (256K) |
| 18 | NVIDIA NIM | OpenAI | Free tier | 7 models incl. Nemotron 3 Ultra (thinking) |
- Clone the repo
- Double-click `CC-TPR_Win_Start.bat` -- a CMD window opens with the router running.
- Start Claude Code as usual -- it will automatically route through the proxy.
- Close the CMD window or press `Ctrl+C` when done. Or run `stop-router_Win.bat`.
- Clone the repo
- Make scripts executable (first time only):
chmod +x CC-TPR_Mac_Start.sh stop-router_Mac.sh
- Run the launcher:
./CC-TPR_Mac_Start.sh
- Start Claude Code as usual -- it will automatically route through the proxy.
- Stop the router: press `Ctrl+C` or run `./stop-router_Mac.sh` from another terminal.
Note: Closing the terminal window on macOS stops the router automatically. No lingering processes.
CC-TPR status line showing routed model and context window
For editors without statusLine.command support (Zed, VS Code), CC-TPR provides a standalone Python script that displays real-time routing status inline.
- Open Zed to your CC-TPR project
- Toggle the terminal: press ``Ctrl+` `` (or ``Cmd+` `` on Mac)
- Run the status line:
Windows:
python statusline\manual_statusline_start.py
macOS/Linux:
python3 statusline/manual_statusline_start.py
myproject | MiniMax-M3 | [####....] 45% 210k | [##......] 3% 2h32m | [##......] 2% 5d05h
model context bar 5hr quota weekly quota
| Usage | Color |
|---|---|
| < 70% | Green |
| 70-79% | Orange |
| >= 80% | Red |
| Waiting | DIM |
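The color table can be read as a small threshold function. The sketch below is illustrative only: `usage_color` is a hypothetical name, and representing the "Waiting" state as `None` is an assumption about how missing data would be passed in.

```python
from typing import Optional


def usage_color(percent: Optional[float]) -> str:
    """Map a usage percentage to the status line color from the table."""
    if percent is None:
        return "dim"      # "Waiting": no usage data yet (assumed encoding)
    if percent >= 80:
        return "red"
    if percent >= 70:
        return "orange"
    return "green"
```

With the sample status line above, 45% context usage and 3% quota usage would both render green.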
Press Ctrl+C in the terminal running the script.
| File | Purpose |
|---|---|
| `config.yaml` | Model routing rules, context thresholds, server threads |
| `.env` (copy from `.env.example`) | API keys |
MINIMAX_API_KEY=your_key
ZAI_API_KEY=your_key
DEEPSEEK_API_KEY=your_key
OPENROUTER_API_KEY=your_key

CEREBRAS_API_KEY=
XIAOMI_API_KEY=
MOONSHOT_API_KEY=
TOGETHER_API_KEY=
OPENAI_API_KEY=
ALIBABA_API_KEY=
TENCENT_API_KEY=
INFINI_API_KEY=
BAIDU_API_KEY=
CTYUN_API_KEY=
AZURE_FOUNDRY_API_KEY=
ANTHROPIC_API_KEY=
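Before starting the router it can help to check which keys are still empty or left at the `your_key` placeholder. The snippet below is a rough sketch, not part of CC-TPR: `parse_env`, `missing_keys`, and `CORE_KEYS` are hypothetical names, and treating the four keys shown with placeholder values as the core set is an assumption.

```python
from typing import Dict, List

# Assumed core set: the four keys shown with placeholder values above.
CORE_KEYS = [
    "MINIMAX_API_KEY",
    "ZAI_API_KEY",
    "DEEPSEEK_API_KEY",
    "OPENROUTER_API_KEY",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: List[str] = CORE_KEYS) -> List[str]:
    """Return required keys that are absent, empty, or still 'your_key'."""
    return [k for k in required if env.get(k, "") in ("", "your_key")]
```

A typical use would be `missing_keys(parse_env(open(".env").read()))`, which returns an empty list when every core key has a real value.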
See docs/ROUTING.md for the full config schema, and docs/ARCHITECTURE.md for how the router works internally.
In config.yaml, server.threads controls how many concurrent requests the router handles:
server:
  threads: 3  # default: +3 per concurrent Claude Code session
| Concurrent Sessions | Recommended threads |
|---|---|
| 1 | 3 (default) |
| 2 | 6 |
| 3+ | 9+ |
If you see WARNING:waitress.queue:Task queue depth is X (X >= 3 under normal load), increase threads in config.yaml to match your workload.
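The table follows a "3 threads per concurrent session" rule. A tiny helper makes that explicit. The function name is made up, and for 3 or more sessions the result should be read as a lower bound, matching the "9+" entry.

```python
def recommended_threads(concurrent_sessions: int, per_session: int = 3) -> int:
    """Suggested server.threads value: 3 per concurrent Claude Code session."""
    if concurrent_sessions < 1:
        raise ValueError("concurrent_sessions must be at least 1")
    return per_session * concurrent_sessions


for sessions in (1, 2, 3):
    print(sessions, recommended_threads(sessions))
```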
The router uses Waitress (production WSGI server) instead of Flask's dev server -- stable for overnight long-running tasks.
v2.0.1 has a breaking config schema. v1.5.5 stays as a tag for users who need it.
Manual migration steps:
- `routing.models.<key>` --> `routing.<key>.provider`
- `providers.<name>.model` --> `routing.<key>.model` (same value across keys using that provider)
- Global `smart_switch` + `context_threshold` --> per-key `smart_switch`
- Remove `failover.final_fallback` (if present) -- v2.0.1 chain ends at OpenRouter
- Remove `providers.<name>.context_limit` (if present) -- now lives in the `MODEL_CONTEXT_LIMITS` map
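The migration steps above could be applied to an already-loaded config dictionary roughly as follows. This is a sketch under heavy assumptions, not an official migration tool: the exact v1.5.5 and v2.0.1 layouts are inferred from the step descriptions, the shape of the per-key `smart_switch` block (`enabled` plus `context_threshold`) is a guess, and `migrate_config` is a hypothetical name. Check docs/ROUTING.md for the real schema before relying on it.

```python
import copy
from typing import Any, Dict


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with the five manual migration steps applied."""
    new = copy.deepcopy(old)  # leave the caller's config untouched
    routing = new.setdefault("routing", {})
    old_models = routing.pop("models", {})   # assumed: key -> provider name
    providers = new.get("providers", {})
    smart_switch = new.pop("smart_switch", None)
    threshold = new.pop("context_threshold", None)

    for key, provider_name in old_models.items():
        block: Dict[str, Any] = {"provider": provider_name}
        model = providers.get(provider_name, {}).get("model")
        if model is not None:
            block["model"] = model
        if smart_switch is not None or threshold is not None:
            # Assumed shape for the per-key smart_switch block.
            block["smart_switch"] = {
                "enabled": bool(smart_switch),
                "context_threshold": threshold,
            }
        routing[key] = block

    for provider in providers.values():
        provider.pop("model", None)          # moved into routing.<key>.model
        provider.pop("context_limit", None)  # now in MODEL_CONTEXT_LIMITS
    new.get("failover", {}).pop("final_fallback", None)
    return new
```

Working on a deep copy means the original config can be kept for comparison or for rolling back to the v1.5.5 tag.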
python -m venv .venv
.venv\Scripts\python.exe -m pip install -e ".[dev]"
pytest
pyright
python -m src.main
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
pytest
pyright
.venv/bin/python -m src.main
MIT -- free for any use, including commercial.
Q: Do I really need both plans?
A: You could use only MiniMax (replaces Sonnet & Haiku) and skip GLM. But GLM is only $18 and gives Opus-class reasoning -- worth it for planning/architecture/review.
Q: What if I never hit the context threshold?
A: Then you never pay DeepSeek beyond the initial $2 deposit. Your monthly total stays at $28 (Starter) or $38 (M3 Plus).
Q: Why not just use OpenRouter directly?
A: OpenRouter doesn't give you token-plan pricing. Our router uses monthly subscription plans (MiniMax, Z.AI) which are ~10x cheaper than pay-as-you-go API.
Q: What's new in v2.0.1?
A: Model-centric config, provider registry, OpenAI format support, 114-model context window map, and 8 new providers. See the "What's new" section above.
Q: Do I need all 18 providers?
A: No. Most users need 2-3: MiniMax + Z.AI + DeepSeek. The others are optional and only require API keys if you want to use them.
Q: Can I use a custom model?
A: Yes. Edit config.yaml to point any routing key at any provider/model combo. You can also set provider: anthropic to pass requests directly to Anthropic's API without remapping.
Q: What if my provider isn't listed?
A: If the provider supports the Anthropic Messages API format, you can add it by creating a provider entry in config.yaml. See docs/ARCHITECTURE.md for how providers work.