What Salesforce's 20,000 AI Agent Deployments Teach a Solo Builder

What are the three anti-patterns that degrade agents?

First: over-reasoning deterministic workflows. If you can flowchart the logic, it belongs in code. Salesforce built Agent Script, a scripting language that mixes deterministic control flow with LLM reasoning, because asking a model to re-derive an if-else chain on every run is slow, expensive, and occasionally wrong. You do not need their tooling. You need the rule: flowchart it, then script it. Save the model for the parts that are genuinely ambiguous.
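
A minimal sketch of "flowchart it, then script it", under stated assumptions: the ticket fields, queue names, and the `ask_model` callable are all hypothetical, not anything from the Salesforce writeup. The known branches run as plain code and the model is only called when no rule matches.

```python
from typing import Callable

# Labels the model is allowed to return (hypothetical queue names).
ALLOWED_LABELS = {"billing", "shipping", "technical"}


def route_ticket(ticket: dict, ask_model: Callable[[str], str]) -> str:
    """Return a queue name; call the model only when no rule applies."""
    intent = ticket.get("intent")

    # Deterministic branches: anything you could draw as a flowchart.
    if intent == "cancel" and ticket.get("order_status") == "not_shipped":
        return "auto_cancel"
    if intent == "address_change":
        return "address_update"

    # Ambiguous free text: the one place the model earns its cost.
    label = ask_model(ticket.get("message", "")).strip().lower()

    # Do not trust the label blindly; unknown answers go to a person.
    return label if label in ALLOWED_LABELS else "human_review"
```

Passing the model in as a callable also means the deterministic branches can be unit tested with no API key and no tokens spent.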

Second: prompting harder instead of encoding policies. Writing NEVER and ALWAYS in caps does not reliably constrain a model. Salesforce found business rules have to execute independently of model reasoning. This one matters most for small shops, because prompting harder is free and feels like progress. If a rule actually matters, enforce it in code that runs whether or not the model cooperates. A refund cap belongs in the payment function, not in paragraph four of the system prompt.
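
Here is roughly what "the cap belongs in the payment function" could look like. This is a sketch, not a real payment integration: `REFUND_CAP`, `PolicyViolation`, and `issue_refund` are made-up names, and the cap value is an assumption. The point is that the check runs no matter what the model decided.

```python
from decimal import Decimal

REFUND_CAP = Decimal("100.00")  # hypothetical policy value


class PolicyViolation(Exception):
    """Raised when a requested action breaks a hard business rule."""


def issue_refund(order_total: Decimal, requested: Decimal) -> Decimal:
    """Validate a refund the agent asked for and return the approved amount."""
    if requested <= 0:
        raise PolicyViolation("refund must be positive")
    if requested > order_total:
        raise PolicyViolation("refund exceeds order total")
    if requested > REFUND_CAP:
        raise PolicyViolation(f"refund exceeds cap of {REFUND_CAP}")
    # A real implementation would call the payment provider here.
    return requested
```

The agent can ask for any amount it likes. Anything over the cap raises before money moves, and the caller decides whether to escalate to a human.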

Third: poor context engineering. One e-commerce team in the writeup cut an order API response from 100K tokens to 2K by returning only the relevant fields. The agent got faster and more accurate at the same time. That is the detail worth tattooing somewhere: less context made it better, not just cheaper. Dumping a whole API response into the prompt is the default, and the default is wrong.
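
A small sketch of that kind of trimming, assuming a made-up order payload. The field names in `RELEVANT_FIELDS` are hypothetical; the writeup does not say which fields the team kept.

```python
import json

# Fields the agent needs to answer "where is my order?" (hypothetical names).
RELEVANT_FIELDS = ("order_id", "status", "eta", "items")


def trim_order(raw: dict) -> dict:
    """Keep only the fields the task needs and drop everything else."""
    slim = {key: raw[key] for key in RELEVANT_FIELDS if key in raw}
    # Line items often carry the most bloat; keep name and quantity only.
    if "items" in slim:
        slim["items"] = [
            {"name": item.get("name"), "qty": item.get("qty")}
            for item in slim["items"]
        ]
    return slim


def to_prompt_context(raw: dict) -> str:
    """Serialize the trimmed order compactly for the prompt."""
    return json.dumps(trim_order(raw), separators=(",", ":"))
```

An allowlist is the safer default here: when the API grows a new field, it stays out of the prompt until you decide the agent needs it.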

How do you know an agent is actually working?

Salesforce measures Agentic Work Units, meaning actual task completion. For support agents they track containment rate: cases resolved without human follow-up. Outcomes, not activity.

I learned a version of this the hard way. A scheduled agent can exit zero every night and produce nothing. Green checks lie. The fix is to check the declared output, not the exit code. Did the file appear? Did the post go live? Did the ticket close? Whatever your equivalent of containment rate is, measure that.
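
One way to check the declared output instead of the exit code, sketched for the simplest case where the job promises a file. `run_and_verify` is a hypothetical helper, and it assumes the job writes a single artifact to a known path.

```python
import subprocess
from pathlib import Path


def run_and_verify(cmd: list[str], expected_output: Path, min_bytes: int = 1) -> bool:
    """Run a job, then confirm the artifact it promised actually exists."""
    # Remove any stale artifact so yesterday's file cannot pass today's check.
    expected_output.unlink(missing_ok=True)

    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        return False

    # Exit zero is not enough: the file must exist and be non-empty.
    return expected_output.is_file() and expected_output.stat().st_size >= min_bytes
```

A job that exits zero without writing anything returns `False` here. For a post or a ticket, swap the file check for whatever proves the outcome, such as fetching the URL or reading the ticket status.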

Their post-launch triage is also worth stealing. Issues get split four ways: tone or brand drift means fix the prompts, logic errors mean fix the tools or convert that step to a script, data quality problems get routed to whoever owns the source, and coverage gaps mean expand scope or escalate cleanly. Four buckets, four different fixes. Most solo builders treat every failure as a prompt problem. Most failures are not.
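
The four buckets fit in a lookup table. This sketch only restates the triage above; the enum and function names are mine, not Salesforce's.

```python
from enum import Enum


class Failure(Enum):
    TONE_DRIFT = "tone or brand drift"
    LOGIC_ERROR = "logic error"
    DATA_QUALITY = "data quality"
    COVERAGE_GAP = "coverage gap"


# One fix per bucket, taken from the triage described above.
NEXT_ACTION = {
    Failure.TONE_DRIFT: "fix the prompts",
    Failure.LOGIC_ERROR: "fix the tools or convert the step to a script",
    Failure.DATA_QUALITY: "route to the owner of the data source",
    Failure.COVERAGE_GAP: "expand scope or escalate cleanly",
}


def triage(failure: Failure) -> str:
    """Return the fix that matches the failure bucket."""
    return NEXT_ACTION[failure]
```

Tagging every incident with one of these before touching anything makes it obvious how few of them end in a prompt edit.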

What does this mean if you're not Salesforce?

Salesforce has platform teams to absorb the 90% of the work that happens after launch. You have you. That changes the build order, not the lessons.

Move deterministic logic out of the loop first. It is the cheapest win: fewer tokens, fewer surprises, faster runs. Then encode your real rules as code-level checks the model cannot talk its way past. Then cut your context down to what the task needs. Each of these makes the after-launch grind smaller, which at solo scale is the difference between a fleet you maintain and a fleet that quietly rots.

And put hard runtime limits on every agent before it touches production. The deployments in the writeup degrade in ways nobody predicted in the demo, and at 20,000 deployments Salesforce can eat the bad days. One runaway retry loop on your side is your whole margin. That is the exact surface I built AgentGuard for: per-agent budget caps, token limits, and rate limits enforced at runtime, not in the prompt. It is a pip install, agentguard, and it takes minutes to wire in. Start there: https://bmdpat.com/tools/agentguard
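
To make the idea concrete, here is a generic sketch of a runtime limit wrapped around a retry loop. It is not AgentGuard's API. `RunLimits`, `BudgetExceeded`, and `run_with_retries` are hypothetical names, and the flat per-call cost is an assumption to keep the example short.

```python
from typing import Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a run would go past its spend or call limit."""


class RunLimits:
    def __init__(self, max_usd: float, max_calls: int) -> None:
        self.max_usd = max_usd
        self.max_calls = max_calls
        self.spent_usd = 0.0
        self.calls = 0

    def charge(self, cost_usd: float) -> None:
        """Record one model call; raise before either limit is exceeded."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded("spend limit reached")
        self.calls += 1
        self.spent_usd += cost_usd


def run_with_retries(step: Callable[[], object], limits: RunLimits,
                     cost_per_call_usd: float, max_attempts: int = 3) -> object:
    """Retry a failing step, but never past the attempt cap or the budget."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        limits.charge(cost_per_call_usd)  # raised outside the try, so it is never swallowed
        try:
            return step()
        except Exception as exc:
            last_error = exc
    raise RuntimeError("step failed after retries") from last_error
```

The charge happens before each attempt and outside the `try`, so a retry loop cannot catch its own budget error and keep spending.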

Originally published on bmdpat.com. I run a one-person AI agent company and write about what actually works.

Top comments (2)

malloryhaigh

The three anti-patterns you've described are all pointing at the same core issue: these are platform problems, not prompt problems. Deterministic logic in code, policies enforced outside the model, context shaped before it hits the agent...this is infrastructure design, not fiddling with agents or tuning the model. The reason 90% of the work ends up happening post-launch is that most teams ship an agent with no stable platform underneath it, then spend months hand-building what the platform should have provided from day one. What Salesforce figured out at 20,000 deployments, solo builders and enterprise teams are both learning the same way: production pain teaches (painful) lessons. I'd say the methodology that makes this systematic rather than reactive is platform engineering, adapted for the agentic world.

joel_horvath_0c470c6260a9

We’re no longer just "building software" — we’re designing systems that decide what to trust before they produce output.
That’s why:
context is the real cost
demos fail in production
agents break after launch
code vs AI boundaries matter more than models
The winner isn’t who builds faster — it’s who controls uncertainty better.

Author: Patrick Hughes
