Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

scott-tan-ai/CC-TPR

Repository files navigation

CC-TPR v2 — Claude Code Token Plan Router

Make Claude Code Work 10x More, Nonstop, Same Intelligence.

This router lets Claude Code use MiniMax M3 (frontier coding, 1M context, native multimodality) and Z.AI GLM-5.1 (ties/beats Opus 4.6 on hardest coding benchmarks) -- while automatically routing to DeepSeek V4 Pro (1M context) when your context approaches the threshold.


What's new in v2.0.4

v2.0.4 is a ground-up restructure of v1.5.5 with cumulative additions:

v2.0.1 (major restructure):

  1. Model-centric config -- each Claude model (haiku/sonnet/opus) has its own complete routing block.
  2. Provider registry -- adding a new provider is 1 file + 1 config block.
  3. OpenAI format support -- Cerebras and other OpenAI-format providers work natively.
  4. Per-model context windows -- 114 models across all 15 providers in MODEL_CONTEXT_LIMITS.
  5. 8 new providers -- Together AI, OpenAI, Alibaba, Tencent, Infini, Baidu, China Telecom, Microsoft Foundry.

v2.0.2: 6. Anthropic passthrough -- set provider: anthropic to route directly to Claude's real API (OAuth + API key dual auth). 7. Quota backoff -- 3 consecutive failures triggers 10-minute cooldown. 8. README restore -- v1.5.5 base with v2.0.1 additions + M3 update.

v2.0.3: 9. Sapiens AI (Agnes) -- agnes-2.0-flash, agnes-1.5-flash (free tier, 256K context).

v2.0.4: 10. NVIDIA NIM -- 7 models incl. Nemotron 3 Ultra with thinking support (free tier).


Real cost breakdown (no hidden maths)

Plan Monthly cost Notes
Claude Pro (Sonnet + Opus) 20ドル Low usage ceiling
MiniMax Starter (M2.7) 10ドル 1,500 requests / 5h
Z.AI Lite (GLM-5.1) 18ドル ~80 prompts / 5h
Subtotal (replaces Claude Pro) 28ドル Does 10x more work and costs 8ドル more than Claude Pro
DeepSeek V4 Pro (1M context) 0ドル/month + initial 2ドル min. Pay as you use -- typical <2ドル/month
Agnes (Sapiens AI) 0ドル/month Free tier available, 256K context
NVIDIA NIM 0ドル/month Free tier, 7 models incl. Nemotron 3 Ultra

Updated June 2026: MiniMax M3 is now available. The Plus plan at 20ドル/month gives you 1.7B tokens/month of M3 usage (replaces the 10ドル Starter). This brings the monthly total to 38ドル (20ドル MiniMax Plus + 18ドル Z.AI) OR 40ドル if you choose to go with Anthropic (20ドル MiniMax Plus + 20ドル Claude Pro). M3 beats Sonnet 4.6 on SWE-bench and adds 1M context, native multimodality, and MSA sparse attention.

Free-tier alternative: Use Nemotron 3 Ultra as Haiku replacement (128K context, thinking support) → switch to Kimi K2.6 at 50K tokens → switch to DeepSeek V4 Pro at 204.8K tokens. Sonnet uses Agnes 2.0 Flash (256K) → DeepSeek V4 Pro at 204.8K. Opus unchanged (GLM-5.1 → DeepSeek at 165K). Total monthly: 0ドル with free tiers, or 38ドル with paid plans.


Real-time status line (what you see while using Claude Code)

CC-TPR status line showing routed model and context window CC-TPR status line showing routed model and context window

left to right: directory | actual active model | context window | 5hr quota & reset countdown | weekly quota & reset countdown


Why this router exists

Claude Pro charges 20ドル/month for barely enough usage to build anything substantial.

Model SWE-bench Verified SWE-bench Pro API cost (per 1M output)
Claude Sonnet 4.6 55% -- ~15ドル.00
MiniMax M3 -- 59.0% ~1ドル.20 (12.5x cheaper)
MiniMax M2.7 78% -- ~1ドル.20 (12.5x cheaper)
Claude Opus 4.6 -- 57.3% ~25ドル.00
Z.AI GLM-5.1 -- 58.4% (top spot) ~3ドル.10 (8x cheaper)

The router gives you the best of both worlds -- M3 for daily coding, GLM-5.1 for hard planning, plus a 1M-token emergency brake via DeepSeek when context exceeds the threshold.


Referral links -- how you keep this project alive

We don't charge for the router. The only way we afford to maintain it is through referral commissions when you sign up for the required plans.

You pay exactly the same price -- no markup, no fake bonuses. We get a small commission that pays for development. If everyone signs up directly, this project dies. If you find value in CC-TPR, please use the links below.

Plan Direct link (supports us)
MiniMax Token Plan (M3, M2.7) https://platform.minimax.io/subscribe/token-plan?code=VaYpkbSg4M
Z.AI GLM Token Plan (GLM-5.1) https://z.ai/subscribe?ic=ER6MB4WO5C

Already subscribed? You can still help by giving our github repo a star


Why M3 over Sonnet? (MiniMax)

Metric M3 Sonnet 4.6 Real-world impact
SWE-bench Pro 59.0% -- Surpasses GPT-5.5 and Gemini 3.1 Pro
Terminal-Bench 2.1 66.0% -- Reliable autonomous execution
Context window 1M tokens 200K 5x more context via MSA sparse attention
Multimodal Native (image + video) Image only Understands screenshots, diagrams, UI
API cost (per 1M output) ~1ドル.20 ~15ドル.00 12.5x cheaper

MiniMax Token Plan Plus (20ドル/month) gives you ~1.7B tokens/month of M3 usage -- enough for full-time daily coding.

Get MiniMax via our referral link


Why GLM-5.1 over Opus? (Z.AI)

Metric GLM-5.1 Opus 4.6 Winner
SWE-bench Pro 58.4% (top spot) 57.3% GLM
Terminal-Bench 2.0 69.0 65.4 GLM
AIME 2026 (math) 95.3 ~88% GLM
GPQA (science) 86.2 91.3 Opus (rarely used)
Max autonomous steps 1,200+ -- GLM

Conclusion: For 94.6% of coding tasks, GLM-5.1 is indistinguishable from Opus -- and it actually leads on the hardest engineering benchmark.

The GLM Coding Lite (18ドル) gives ~3x the prompts of Claude Pro.

Get Z.AI GLM via our referral link


How the context fallback works (DeepSeek V4 Pro)

  • M3 supports up to 1M tokens (MSA sparse attention). M2.7 and GLM both have a 200K token context window.
  • When your conversation reaches the configured threshold, the router pre-emptively switches to DeepSeek V4 Pro (1M context).
  • DeepSeek is pay-as-you-go -- you make a minimum first-time payment of 2ドル to unlock the model. After that, you add credits as needed (no monthly fee).
  • Most users spend less than 2ドル/month on DeepSeek, because large contexts are rare.

Supported providers

Core providers (recommended)

# Provider Format Plan Notes
1 MiniMax Anthropic 10ドル-120/mo Token Plan M3, M2.7
2 Z.AI Anthropic GLM Coding Plan GLM-5.1
3 DeepSeek Anthropic Pay-per-token V4 Pro (1M context)
4 OpenRouter Anthropic Pay-per-token Slug-transformed failover
5 Anthropic Anthropic Claude Pro/Max Direct passthrough (OAuth + API key)

Also supported

# Provider Format Plan Notes
6 Cerebras OpenAI Pay-per-token gpt-oss-120b
7 Xiaomi MiMo Anthropic Pay-per-token mimo-v2.5-pro
8 Moonshot Kimi Anthropic Pay-per-token kimi-k2.6
9 Together AI OpenAI 25ドル free credits MiniMax, qwen3-coder
10 OpenAI OpenAI Pay-per-token gpt-5.x
11 Alibaba Cloud Anthropic 50ドル/mo Pro qwen3.5-plus
12 Tencent Cloud Anthropic 40円-200/mo Hunyuan
13 Infini Anthropic 40円-200/mo kimi-k2.5, glm-5
14 Baidu Qianfan Anthropic Multiple tiers ERNIE-5.0
15 China Telecom Anthropic 29円-699/mo GLM-5
16 Microsoft Foundry Anthropic Azure Enterprise Real Claude 4.x
17 Sapiens AI (Agnes) OpenAI Free tier agnes-2.0-flash, agnes-1.5-flash (256K)
18 NVIDIA NIM OpenAI Free tier 7 models incl. Nemotron 3 Ultra (thinking)

Quick start

Windows

  1. Clone the repo
  2. Double-click CC-TPR_Win_Start.bat -- a CMD window opens with the router running.
  3. Start Claude Code as usual -- it will automatically route through the proxy.
  4. Close the CMD window or press Ctrl+C when done. Or run stop-router_Win.bat.

macOS / Linux

  1. Clone the repo
  2. Make scripts executable (first time only):
    chmod +x CC-TPR_Mac_Start.sh stop-router_Mac.sh
  3. Run the launcher:
    ./CC-TPR_Mac_Start.sh
  4. Start Claude Code as usual -- it will automatically route through the proxy.
  5. Stop the router: press Ctrl+C or run ./stop-router_Mac.sh from another terminal.

Note: Closing the terminal window on macOS stops the router automatically. No lingering processes.


Manual Status Line (Zed, Terminal, cmd, PowerShell)

CC-TPR status line showing routed model and context window

For editors without statusLine.command support (Zed, VS Code), CC-TPR provides a standalone Python script that displays real-time routing status inline.

Running from Zed

  1. Open Zed to your CC-TPR project

  2. Toggle the terminal: press Ctrl+` (or `Cmd+`` on Mac)

  3. Run the status line:

    Windows:

    python statusline\manual_statusline_start.py

    macOS/Linux:

    python3 statusline/manual_statusline_start.py

Output format

myproject | MiniMax-M3 | [####....] 45% 210k | [##......] 3% 2h32m | [##......] 2% 5d05h
 model context bar 5hr quota weekly quota

Color coding

Usage Color
< 70% Green
70-79% Orange
>= 80% Red
Waiting DIM

Stopping

Press Ctrl+C in the terminal running the script.


Configuration

File Purpose
config.yaml Model routing rules, context thresholds, server threads
.env (copy from .env.example) API keys

Core API keys

MINIMAX_API_KEY=your_key
ZAI_API_KEY=your_key
DEEPSEEK_API_KEY=your_key
OPENROUTER_API_KEY=your_key

Additional providers (optional)

CEREBRAS_API_KEY=
XIAOMI_API_KEY=
MOONSHOT_API_KEY=
TOGETHER_API_KEY=
OPENAI_API_KEY=
ALIBABA_API_KEY=
TENCENT_API_KEY=
INFINI_API_KEY=
BAIDU_API_KEY=
CTYUN_API_KEY=
AZURE_FOUNDRY_API_KEY=
ANTHROPIC_API_KEY=

See docs/ROUTING.md for the full config schema, and docs/ARCHITECTURE.md for how the router works internally.

Thread & Concurrency Settings

In config.yaml, server.threads controls how many concurrent requests the router handles:

server:
 threads: 3 # default: +3 per concurrent Claude Code session
Concurrent Sessions Recommended threads
1 3 (default)
2 6
3+ 9+

If you see WARNING:waitress.queue:Task queue depth is X (X >= 3 under normal load), increase threads in config.yaml to match your workload.

The router uses Waitress (production WSGI server) instead of Flask's dev server -- stable for overnight long-running tasks.


Migrating from v1.5.5

v2.0.1 has a breaking config schema. v1.5.5 stays as a tag for users who need it.

Manual migration steps:

  1. routing.models.<key> --> routing.<key>.provider
  2. providers.<name>.model --> routing.<key>.model (same value across keys using that provider)
  3. Global smart_switch + context_threshold --> per-key smart_switch
  4. Remove failover.final_fallback (if present) -- v2.0.1 chain ends at OpenRouter
  5. Remove providers.<name>.context_limit (if present) -- now lives in MODEL_CONTEXT_LIMITS map

Development

Windows

python -m venv .venv
.venv\Scripts\python.exe -m pip install -e ".[dev]"
pytest
pyright
python -m src.main

macOS / Linux

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
pytest
pyright
.venv/bin/python -m src.main

License

MIT -- free for any use, including commercial.


FAQ

Q: Do I really need both plans? A: You could use only MiniMax (replaces Sonnet & Haiku) and skip GLM. But GLM is only 18ドル and gives Opus-class reasoning -- worth it for planning/architecture/review.

Q: What if I never hit the context threshold? A: Then you never pay DeepSeek beyond the initial 2ドル deposit. Your monthly stays at 28ドル (Starter) or 38ドル (M3 Plus).

Q: Why not just use OpenRouter directly? A: OpenRouter doesn't give you token-plan pricing. Our router uses monthly subscription plans (MiniMax, Z.AI) which are ~10x cheaper than pay-as-you-go API.

Q: What's new in v2.0.1? A: Model-centric config, provider registry, OpenAI format support, 114-model context window map, and 8 new providers. See the "What's new" section above.

Q: Do I need all 16 providers? A: No. Most users need 2-3: MiniMax + Z.AI + DeepSeek. The others are optional and only require API keys if you want to use them.

Q: Can I use a custom model? A: Yes. Edit config.yaml to point any routing key at any provider/model combo. You can also set provider: anthropic to pass requests directly to Anthropic's API without remapping.

Q: What if my provider isn't listed? A: If the provider supports the Anthropic Messages API format, you can add it by creating a provider entry in config.yaml. See docs/ARCHITECTURE.md for how providers work.

About

Get 10x more usage from Claude Code while only paying 8ドル more. Supports dual token plans, context window based model routing to maximize use and minimize cost. Highly recommended to combo with "Everything Claude Code"

Topics

Resources

License

Stars

Watchers

Forks

Packages

Contributors

AltStyle によって変換されたページ (->オリジナル) /