Name	Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows	.github/workflows
docs	docs
scripts	scripts
src	src
statusline	statusline
tests	tests
.env.example	.env.example
.gitattributes	.gitattributes
.gitignore	.gitignore
CC-TPR_Mac_Start.sh	CC-TPR_Mac_Start.sh
CC-TPR_Win_Start.bat	CC-TPR_Win_Start.bat
CHANGELOG.md	CHANGELOG.md
LICENSE	LICENSE
README.md	README.md
_cleanup.ps1	_cleanup.ps1
check_python.ps1	check_python.ps1
config.yaml	config.yaml
pyproject.toml	pyproject.toml
stop-router_Mac.sh	stop-router_Mac.sh
stop-router_Win.bat	stop-router_Win.bat

CC-TPR v2 — Claude Code Token Plan Router

Make Claude Code Work 10x More, Nonstop, Same Intelligence.

This router lets Claude Code use MiniMax M3 (frontier coding, 1M context, native multimodality) and Z.AI GLM-5.1 (ties/beats Opus 4.6 on hardest coding benchmarks) -- while automatically routing to DeepSeek V4 Pro (1M context) when your context approaches the threshold.

What's new in v2.0.4

v2.0.4 is a ground-up restructure of v1.5.5 with cumulative additions:

v2.0.1 (major restructure):

Model-centric config -- each Claude model (haiku/sonnet/opus) has its own complete routing block.
Provider registry -- adding a new provider is 1 file + 1 config block.
OpenAI format support -- Cerebras and other OpenAI-format providers work natively.
Per-model context windows -- 114 models across all 15 providers in MODEL_CONTEXT_LIMITS.
8 new providers -- Together AI, OpenAI, Alibaba, Tencent, Infini, Baidu, China Telecom, Microsoft Foundry.

v2.0.2: 6. Anthropic passthrough -- set provider: anthropic to route directly to Claude's real API (OAuth + API key dual auth). 7. Quota backoff -- 3 consecutive failures triggers 10-minute cooldown. 8. README restore -- v1.5.5 base with v2.0.1 additions + M3 update.

v2.0.3: 9. Sapiens AI (Agnes) -- agnes-2.0-flash, agnes-1.5-flash (free tier, 256K context).

v2.0.4: 10. NVIDIA NIM -- 7 models incl. Nemotron 3 Ultra with thinking support (free tier).

Real cost breakdown (no hidden maths)

Plan	Monthly cost	Notes
Claude Pro (Sonnet + Opus)	20ドル	Low usage ceiling
MiniMax Starter (M2.7)	10ドル	1,500 requests / 5h
Z.AI Lite (GLM-5.1)	18ドル	~80 prompts / 5h
Subtotal (replaces Claude Pro)	28ドル	Does 10x more work and costs 8ドル more than Claude Pro
DeepSeek V4 Pro (1M context)	0ドル/month + initial 2ドル min.	Pay as you use -- typical <2ドル/month
Agnes (Sapiens AI)	0ドル/month	Free tier available, 256K context
NVIDIA NIM	0ドル/month	Free tier, 7 models incl. Nemotron 3 Ultra

Updated June 2026: MiniMax M3 is now available. The Plus plan at 20ドル/month gives you 1.7B tokens/month of M3 usage (replaces the 10ドル Starter). This brings the monthly total to 38ドル (20ドル MiniMax Plus + 18ドル Z.AI) OR 40ドル if you choose to go with Anthropic (20ドル MiniMax Plus + 20ドル Claude Pro). M3 beats Sonnet 4.6 on SWE-bench and adds 1M context, native multimodality, and MSA sparse attention.

Free-tier alternative: Use Nemotron 3 Ultra as Haiku replacement (128K context, thinking support) → switch to Kimi K2.6 at 50K tokens → switch to DeepSeek V4 Pro at 204.8K tokens. Sonnet uses Agnes 2.0 Flash (256K) → DeepSeek V4 Pro at 204.8K. Opus unchanged (GLM-5.1 → DeepSeek at 165K). Total monthly: 0ドル with free tiers, or 38ドル with paid plans.

Real-time status line (what you see while using Claude Code)

CC-TPR status line showing routed model and context window CC-TPR status line showing routed model and context window

left to right: directory | actual active model | context window | 5hr quota & reset countdown | weekly quota & reset countdown

Why this router exists

Claude Pro charges 20ドル/month for barely enough usage to build anything substantial.

Model	SWE-bench Verified	SWE-bench Pro	API cost (per 1M output)
Claude Sonnet 4.6	55%	--	~15ドル.00
MiniMax M3	--	59.0%	~1ドル.20 (12.5x cheaper)
MiniMax M2.7	78%	--	~1ドル.20 (12.5x cheaper)
Claude Opus 4.6	--	57.3%	~25ドル.00
Z.AI GLM-5.1	--	58.4% (top spot)	~3ドル.10 (8x cheaper)

The router gives you the best of both worlds -- M3 for daily coding, GLM-5.1 for hard planning, plus a 1M-token emergency brake via DeepSeek when context exceeds the threshold.

Referral links -- how you keep this project alive

We don't charge for the router. The only way we afford to maintain it is through referral commissions when you sign up for the required plans.

You pay exactly the same price -- no markup, no fake bonuses. We get a small commission that pays for development. If everyone signs up directly, this project dies. If you find value in CC-TPR, please use the links below.

Plan	Direct link (supports us)
MiniMax Token Plan (M3, M2.7)	https://platform.minimax.io/subscribe/token-plan?code=VaYpkbSg4M
Z.AI GLM Token Plan (GLM-5.1)	https://z.ai/subscribe?ic=ER6MB4WO5C

Already subscribed? You can still help by giving our github repo a star

Why M3 over Sonnet? (MiniMax)

Metric	M3	Sonnet 4.6	Real-world impact
SWE-bench Pro	59.0%	--	Surpasses GPT-5.5 and Gemini 3.1 Pro
Terminal-Bench 2.1	66.0%	--	Reliable autonomous execution
Context window	1M tokens	200K	5x more context via MSA sparse attention
Multimodal	Native (image + video)	Image only	Understands screenshots, diagrams, UI
API cost (per 1M output)	~1ドル.20	~15ドル.00	12.5x cheaper

MiniMax Token Plan Plus (20ドル/month) gives you ~1.7B tokens/month of M3 usage -- enough for full-time daily coding.

Get MiniMax via our referral link

Why GLM-5.1 over Opus? (Z.AI)

Metric	GLM-5.1	Opus 4.6	Winner
SWE-bench Pro	58.4% (top spot)	57.3%	GLM
Terminal-Bench 2.0	69.0	65.4	GLM
AIME 2026 (math)	95.3	~88%	GLM
GPQA (science)	86.2	91.3	Opus (rarely used)
Max autonomous steps	1,200+	--	GLM

Conclusion: For 94.6% of coding tasks, GLM-5.1 is indistinguishable from Opus -- and it actually leads on the hardest engineering benchmark.

The GLM Coding Lite (18ドル) gives ~3x the prompts of Claude Pro.

Get Z.AI GLM via our referral link

How the context fallback works (DeepSeek V4 Pro)

M3 supports up to 1M tokens (MSA sparse attention). M2.7 and GLM both have a 200K token context window.
When your conversation reaches the configured threshold, the router pre-emptively switches to DeepSeek V4 Pro (1M context).
DeepSeek is pay-as-you-go -- you make a minimum first-time payment of 2ドル to unlock the model. After that, you add credits as needed (no monthly fee).
Most users spend less than 2ドル/month on DeepSeek, because large contexts are rare.

Supported providers

Core providers (recommended)

#	Provider	Format	Plan	Notes
1	MiniMax	Anthropic	10ドル-120/mo Token Plan	M3, M2.7
2	Z.AI	Anthropic	GLM Coding Plan	GLM-5.1
3	DeepSeek	Anthropic	Pay-per-token	V4 Pro (1M context)
4	OpenRouter	Anthropic	Pay-per-token	Slug-transformed failover
5	Anthropic	Anthropic	Claude Pro/Max	Direct passthrough (OAuth + API key)

Also supported

#	Provider	Format	Plan	Notes
6	Cerebras	OpenAI	Pay-per-token	gpt-oss-120b
7	Xiaomi MiMo	Anthropic	Pay-per-token	mimo-v2.5-pro
8	Moonshot Kimi	Anthropic	Pay-per-token	kimi-k2.6
9	Together AI	OpenAI	25ドル free credits	MiniMax, qwen3-coder
10	OpenAI	OpenAI	Pay-per-token	gpt-5.x
11	Alibaba Cloud	Anthropic	50ドル/mo Pro	qwen3.5-plus
12	Tencent Cloud	Anthropic	40円-200/mo	Hunyuan
13	Infini	Anthropic	40円-200/mo	kimi-k2.5, glm-5
14	Baidu Qianfan	Anthropic	Multiple tiers	ERNIE-5.0
15	China Telecom	Anthropic	29円-699/mo	GLM-5
16	Microsoft Foundry	Anthropic	Azure Enterprise	Real Claude 4.x
17	Sapiens AI (Agnes)	OpenAI	Free tier	agnes-2.0-flash, agnes-1.5-flash (256K)
18	NVIDIA NIM	OpenAI	Free tier	7 models incl. Nemotron 3 Ultra (thinking)

Quick start

Windows

Clone the repo
Double-click CC-TPR_Win_Start.bat -- a CMD window opens with the router running.
Start Claude Code as usual -- it will automatically route through the proxy.
Close the CMD window or press Ctrl+C when done. Or run stop-router_Win.bat.

macOS / Linux

Clone the repo

Make scripts executable (first time only):

chmod +x CC-TPR_Mac_Start.sh stop-router_Mac.sh

Run the launcher:
```
./CC-TPR_Mac_Start.sh
```
Start Claude Code as usual -- it will automatically route through the proxy.
Stop the router: press Ctrl+C or run ./stop-router_Mac.sh from another terminal.

Note: Closing the terminal window on macOS stops the router automatically. No lingering processes.

Manual Status Line (Zed, Terminal, cmd, PowerShell)

CC-TPR status line showing routed model and context window

For editors without statusLine.command support (Zed, VS Code), CC-TPR provides a standalone Python script that displays real-time routing status inline.

Running from Zed

Open Zed to your CC-TPR project
Toggle the terminal: press Ctrl+` (or `Cmd+`` on Mac)

Run the status line:

Windows:

python statusline\manual_statusline_start.py

macOS/Linux:

python3 statusline/manual_statusline_start.py

Output format

myproject | MiniMax-M3 | [####....] 45% 210k | [##......] 3% 2h32m | [##......] 2% 5d05h
 model context bar 5hr quota weekly quota

Color coding

Usage	Color
< 70%	Green
70-79%	Orange
>= 80%	Red
Waiting	DIM

Stopping

Press Ctrl+C in the terminal running the script.

Configuration

File	Purpose
`config.yaml`	Model routing rules, context thresholds, server threads
`.env` (copy from `.env.example`)	API keys

Core API keys

MINIMAX_API_KEY=your_key
ZAI_API_KEY=your_key
DEEPSEEK_API_KEY=your_key
OPENROUTER_API_KEY=your_key

Additional providers (optional)

CEREBRAS_API_KEY=
XIAOMI_API_KEY=
MOONSHOT_API_KEY=
TOGETHER_API_KEY=
OPENAI_API_KEY=
ALIBABA_API_KEY=
TENCENT_API_KEY=
INFINI_API_KEY=
BAIDU_API_KEY=
CTYUN_API_KEY=
AZURE_FOUNDRY_API_KEY=
ANTHROPIC_API_KEY=

See docs/ROUTING.md for the full config schema, and docs/ARCHITECTURE.md for how the router works internally.

Thread & Concurrency Settings

In config.yaml, server.threads controls how many concurrent requests the router handles:

server:
 threads: 3 # default: +3 per concurrent Claude Code session

Concurrent Sessions	Recommended threads
1	3 (default)
2	6
3+	9+

If you see WARNING:waitress.queue:Task queue depth is X (X >= 3 under normal load), increase threads in config.yaml to match your workload.

The router uses Waitress (production WSGI server) instead of Flask's dev server -- stable for overnight long-running tasks.

Migrating from v1.5.5

v2.0.1 has a breaking config schema. v1.5.5 stays as a tag for users who need it.

Manual migration steps:

routing.models.<key> --> routing.<key>.provider
providers.<name>.model --> routing.<key>.model (same value across keys using that provider)
Global smart_switch + context_threshold --> per-key smart_switch
Remove failover.final_fallback (if present) -- v2.0.1 chain ends at OpenRouter
Remove providers.<name>.context_limit (if present) -- now lives in MODEL_CONTEXT_LIMITS map

Development

Windows

python -m venv .venv
.venv\Scripts\python.exe -m pip install -e ".[dev]"
pytest
pyright
python -m src.main

macOS / Linux

python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
pytest
pyright
.venv/bin/python -m src.main

License

MIT -- free for any use, including commercial.

FAQ

Q: Do I really need both plans? A: You could use only MiniMax (replaces Sonnet & Haiku) and skip GLM. But GLM is only 18ドル and gives Opus-class reasoning -- worth it for planning/architecture/review.

Q: What if I never hit the context threshold? A: Then you never pay DeepSeek beyond the initial 2ドル deposit. Your monthly stays at 28ドル (Starter) or 38ドル (M3 Plus).

Q: Why not just use OpenRouter directly? A: OpenRouter doesn't give you token-plan pricing. Our router uses monthly subscription plans (MiniMax, Z.AI) which are ~10x cheaper than pay-as-you-go API.

Q: What's new in v2.0.1? A: Model-centric config, provider registry, OpenAI format support, 114-model context window map, and 8 new providers. See the "What's new" section above.

Q: Do I need all 16 providers? A: No. Most users need 2-3: MiniMax + Z.AI + DeepSeek. The others are optional and only require API keys if you want to use them.

Q: Can I use a custom model? A: Yes. Edit config.yaml to point any routing key at any provider/model combo. You can also set provider: anthropic to pass requests directly to Anthropic's API without remapping.

Q: What if my provider isn't listed? A: If the provider supports the Anthropic Messages API format, you can add it by creating a provider entry in config.yaml. See docs/ARCHITECTURE.md for how providers work.

Folders and files

Latest commit

History

Repository files navigation

CC-TPR v2 — Claude Code Token Plan Router

Make Claude Code Work 10x More, Nonstop, Same Intelligence.

What's new in v2.0.4

Real cost breakdown (no hidden maths)

Real-time status line (what you see while using Claude Code)

Why this router exists

Referral links -- how you keep this project alive

Why M3 over Sonnet? (MiniMax)

Why GLM-5.1 over Opus? (Z.AI)

How the context fallback works (DeepSeek V4 Pro)

Supported providers

Core providers (recommended)

Also supported

Quick start

Windows

macOS / Linux

Manual Status Line (Zed, Terminal, cmd, PowerShell)

Running from Zed

Output format

Color coding

Stopping

Configuration

Core API keys

Additional providers (optional)

Thread & Concurrency Settings

Migrating from v1.5.5

Development

Windows

macOS / Linux

License

FAQ

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages