AI
Hands-on guides to LLMs, agents, prompt engineering, and the AI tools I run every day for real work, not demos.
Cheap AI Tokens Are a Scam Where Your Prompts Are the Product
Cheap AI API resellers undercut official prices by 70 to 97 percent because the discount is not the product: your prompts are. They log every request to resell as training data, route you to weaker models, and run on stolen-card accounts. A CISPA Helmholtz audit caught silent model swapping, but the harvested logs are the real margin.
Key Takeaways
- A 90 percent discount on frontier AI is funded by reselling your prompts.
- Proxies can send an “Opus” request to a cheaper model and relabel it.
- Many reseller accounts come from stolen cards and faked identity checks.
- Pointing a coding agent at an unknown API host hands a stranger your machine.
- Official APIs and zero-retention gateways are cheap enough to skip the scam.
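The risk of pointing a coding agent at an unknown API host can be reduced with a simple allowlist check before the agent starts. This is a minimal sketch, not a complete defense: the `TRUSTED_HOSTS` set and the `is_trusted_base_url` helper are hypothetical names introduced for illustration, and the set should list whichever official or vetted gateway hosts you actually use.

```python
from urllib.parse import urlparse

# Assumption: hosts you have decided to trust. Adjust to your own providers.
TRUSTED_HOSTS = {"api.anthropic.com", "api.openai.com"}


def is_trusted_base_url(base_url: str) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is allowlisted."""
    parsed = urlparse(base_url)
    # Exact match on hostname, so look-alike domains such as
    # "api.anthropic.com.evil.example" are rejected.
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_HOSTS


if __name__ == "__main__":
    for url in (
        "https://api.anthropic.com/v1",
        "https://api.anthropic.com.evil.example/v1",
        "http://api.openai.com/v1",
    ):
        print(url, "->", "ok" if is_trusted_base_url(url) else "refuse")
```

Matching the full hostname rather than a substring is the point of the design: a substring check would accept a reseller domain that merely contains an official name.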
Why is a Claude or GPT API 90% cheaper from a reseller?
A frontier model has a hard cost floor. GPU time per token is a real expense, and the official provider already prices it close to the bone. So a reseller charging one tenth of that loses money on every call, unless something else pays the bill. The discount cannot come from being smarter about compute.
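The cost-floor argument is plain arithmetic. A minimal sketch, assuming a hypothetical official price of $10 per million tokens (an illustrative figure, not a real price list) and a reseller that pays the official rate for every token it resells:

```python
def shortfall_per_million(official_price: float, discount: float) -> float:
    """Money the reseller loses per million tokens if it pays the official rate.

    official_price: hypothetical official price per million tokens, in dollars.
    discount: fraction taken off the official price (0.90 means 90 percent off).
    """
    resale_price = official_price * (1.0 - discount)
    return official_price - resale_price


# Hypothetical official price; the discounts span the 70 to 97 percent range.
OFFICIAL_PRICE = 10.00

if __name__ == "__main__":
    for discount in (0.70, 0.90, 0.97):
        loss = shortfall_per_million(OFFICIAL_PRICE, discount)
        print(f"{discount:.0%} off: short ${loss:.2f} per million tokens")
```

Whatever the real price is, the shortfall scales with the discount, so some other revenue source has to cover it.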
Pinterest's MCP Deployment: 66,000 Monthly Invocations and 7,000 Engineering Hours Saved
Pinterest’s Model Context Protocol rollout hits 66,000 calls per month across 844 active users. It’s the most detailed public case study of MCP at scale. A central registry, two-layer auth, safety reviews, and human checkpoints set this apart from a prototype. The payoff: about 7,000 engineering hours saved each month.
The story comes from Pinterest’s engineering blog post in March 2026 and later coverage by InfoQ. For any team weighing MCP for live use, this rollout is a solid guide.
Claude Code Remote Agents: Dispatch, Scheduled Tasks, and /loop Explained
Claude Code now ships four ways to run agents remotely: Dispatch, Remote Control, Scheduled Tasks, and /loop. Pick the wrong one and you either over-build a simple polling job or under-build something that needs real persistence. Each works at a different layer of the stack. Each has its own lifecycle, infrastructure needs, and rules for what survives a closed terminal or a sleeping laptop.
Dispatch: Send Tasks from Your Phone to Your Desktop
Dispatch launched on March 17, 2026 as a research preview inside Claude Cowork. Open the Claude mobile app, describe a task, and Dispatch routes it to your Claude Desktop instance on your dev machine. Claude Code runs the task locally with your file system, MCP servers, skills, connectors, and any other tools you’ve set up. The result comes back to your phone.
AI Coding Benchmarks in 2026: Why the Leaderboard You Pick Decides the Winner
The SWE-bench Verified leaderboard in June 2026 is led by OpenAI’s GPT-5.5 at 88.7%, with Claude Opus 4.7 a step behind at 87.6% and GPT-5.3-Codex at 85.0%. Anthropic’s June flagships, Opus 4.8 and the new Fable 5, ship as the current top Claude models but have not landed on the public board yet. Pick a different benchmark and the order flips. On SWE-bench Pro, Claude Opus 4.7 leads at 64.3%. On Terminal-Bench 2.0, Codex CLI paired with GPT-5.5 tops the chart at 82.0%, while the cheaper, faster Gemini 3.5 Flash hit 76.2% on the newer 2.1 set with output about 4x faster. LiveCodeBench favors Google. There is no single best AI coding model. There is only a best model for the kind of task you care about, and the agent scaffold around that model can shift scores by several points.
The Chinese Open-Weight Coding Stack in 2026: Is Kimi K2.7 Real?
The Chinese open-weight coding stack leads several benchmarks in 2026, but the rankings disagree. Kimi K2.7-Code just landed, yet auditors call it more honest than capable, not better than K2.6. No single model wins outright, so the smart play is a hybrid: plan with Claude, code with Kimi for about $39 a month.
Key Takeaways
- No single Chinese model wins; the leader depends on your task and budget.
- Kimi K2.7-Code looks more honest than K2.6, not clearly smarter.
- Benchmark lists and real-usage data disagree on who leads.
- Kimi K2.6 burns about twice the thinking tokens of K2.5.
- Most heavy users plan with Claude and code with Kimi to cut cost.
What is the Chinese open-weight coding stack in 2026?
The Chinese open-weight coding stack is the group of open-license models built mainly by Chinese labs for agentic software work. The roster includes Kimi K2.6 and the new K2.7-Code from Moonshot, GLM 5.1 from z.ai, Qwen3-Coder-Next from Alibaba, DeepSeek V4-Pro and V4-Flash, MiniMax M3, and Xiaomi’s MiMo V2.5. All ship under Apache, MIT, or near-equivalent open terms.
Fable 5 vs Opus 4.8: Is It Worth It? The Reddit Verdict
Reddit users who ran both Fable 5 and Opus 4.8 during the free window say Fable feels smarter on first-shot completeness, debugging, and vision, but the gain is uneven and the token burn is real. On the MineBench head-to-head, Fable averaged 18m04s per build versus Opus 4.8’s 24m48s. Across 15 builds it cost $54.93 versus $41.52, only about 30% more despite Fable’s 2x price.
Key Takeaways
- Reddit’s hands-on take: Fable 5 nails the task on the first try more often than Opus 4.8.
- On MineBench, Fable ran faster and used fewer tokens, costing about 30% more despite 2x pricing.
- The loudest complaint isn’t quality, it’s token burn that drains Max and Pro limits fast.
- One user’s Subaru misfire: Opus punted, Fable pulled video frames and audio to find the cause.
- Skeptics note Opus often does the same once you prompt it the way Fable figured out itself.
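The MineBench figures quoted above can be checked with a few lines of arithmetic. This is a rough sketch using only the numbers in the excerpt; the implied token ratio assumes the 2x price applies uniformly to all of Fable's tokens, which the excerpt does not confirm.

```python
# Figures quoted in the excerpt (totals across 15 builds, average time per build).
FABLE_COST, OPUS_COST = 54.93, 41.52
BUILDS = 15
FABLE_SECONDS = 18 * 60 + 4    # 18m04s
OPUS_SECONDS = 24 * 60 + 48    # 24m48s
PRICE_MULTIPLIER = 2.0         # Fable's price relative to Opus 4.8

cost_ratio = FABLE_COST / OPUS_COST
time_ratio = FABLE_SECONDS / OPUS_SECONDS
# Assumption: uniform 2x pricing, so cost ratio = price ratio * token ratio.
implied_token_ratio = cost_ratio / PRICE_MULTIPLIER

if __name__ == "__main__":
    print(f"Cost per build: ${FABLE_COST / BUILDS:.2f} vs ${OPUS_COST / BUILDS:.2f}")
    print(f"Fable cost relative to Opus: {cost_ratio:.2f}x")
    print(f"Fable build time relative to Opus: {time_ratio:.2f}x")
    print(f"Implied token usage relative to Opus: {implied_token_ratio:.2f}x")
```

Under that pricing assumption, Fable would need to use roughly two thirds of the tokens Opus did for the totals to line up, which matches the "fewer tokens" takeaway.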
This verdict comes from seven old.reddit.com threads across r/claude, r/ClaudeAI, and r/ClaudeCode, captured during the launch window. One caveat up front: these are enthusiast subs, and most posters were mid free-trial. So the sentiment skews positive, and single-user stories are anecdotes, not proof. Where the crowd disagreed, the dissent is here too.