What are the three anti-patterns that degrade agents?
First: over-reasoning deterministic workflows. If you can flowchart the logic, it belongs in code. Salesforce built Agent Script, a TypeScript framework that mixes deterministic control flow with LLM reasoning, because asking a model to re-derive an if-else chain on every run is slow, expensive, and occasionally wrong. You do not need their framework. You need the rule: flowchart it, then script it. Save the model for the parts that are genuinely ambiguous.
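Here is a minimal Python sketch of "flowchart it, then script it," using a hypothetical support router. The ticket kinds, the 30-day refund window, and ask_model are placeholders. They are not taken from Salesforce or Agent Script. Everything you could draw as boxes and arrows runs as plain code, and only the ambiguous leftovers reach the model.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    kind: str                 # hypothetical categories: "refund", "shipping", anything else
    days_since_purchase: int
    text: str

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in your real client."""
    return "escalate"

def route(ticket: Ticket) -> str:
    # Flowchartable branches run as plain code: no tokens, same answer every run.
    if ticket.kind == "refund":
        if ticket.days_since_purchase > 30:  # hypothetical 30-day refund window
            return "deny_refund_window_expired"
        return "start_refund"
    if ticket.kind == "shipping":
        return "send_tracking_link"
    # Only the genuinely ambiguous cases reach the model.
    return ask_model(f"Decide how to handle this support message: {ticket.text}")
```

The model never re-derives the refund rule. It only sees the messages the flowchart could not classify.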
Second: prompting harder instead of encoding policies. Writing NEVER and ALWAYS in caps does not reliably constrain a model. Salesforce found business rules have to execute independently of model reasoning. This one matters most for small shops, because prompting harder is free and feels like progress. If a rule actually matters, enforce it in code that runs whether or not the model cooperates. A refund cap belongs in the payment function, not in paragraph four of the system prompt.
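This sketch shows roughly what "the cap lives in the payment function" could look like. REFUND_CAP, PolicyViolation, and issue_refund are hypothetical names, and the dollar figure is made up. The point is that the check runs inside the tool, so no prompt wording can get past it.

```python
REFUND_CAP = 100.00  # hypothetical business rule, in dollars

class PolicyViolation(Exception):
    """Raised when a tool call breaks a hard business rule."""

def issue_refund(order_id: str, amount: float) -> dict:
    # These checks run inside the tool, so they hold no matter what the model
    # decided or what the system prompt said.
    if amount <= 0:
        raise PolicyViolation(f"refund for {order_id} must be positive, got {amount}")
    if amount > REFUND_CAP:
        raise PolicyViolation(
            f"refund of {amount:.2f} for {order_id} exceeds cap of {REFUND_CAP:.2f}"
        )
    # Placeholder: call your real payment provider here.
    return {"order_id": order_id, "refunded": amount}
```

If the model requests a 500-dollar refund, the tool raises instead of paying. The agent loop can then turn that exception into an escalation.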
Third: poor context engineering. One e-commerce team in the writeup cut an order API response from 100K tokens to 2K by returning only the relevant fields. The agent got faster and more accurate at the same time. That is the detail worth tattooing somewhere: less context made it better, not just cheaper. Dumping a whole API response into the prompt is the default, and the default is wrong.
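This is a sketch of returning only the relevant fields. It assumes a hypothetical order payload and an allow-list for a "where is my order?" task. The field names and sample data are invented for illustration and are not the e-commerce team's actual schema.

```python
import json

# Hypothetical allow-list for a "where is my order?" task.
RELEVANT_FIELDS = ("order_id", "status", "carrier", "tracking_number")

def trim_order(order: dict, fields: tuple = RELEVANT_FIELDS) -> dict:
    """Keep only allow-listed top-level fields and drop everything else."""
    return {key: order[key] for key in fields if key in order}

# Invented sample payload: a few useful fields buried in bulky extras.
full_response = {
    "order_id": "A1",
    "status": "shipped",
    "carrier": "ExampleCarrier",
    "tracking_number": "1Z999",
    "line_items": [{"sku": f"SKU{i}", "description": "x" * 200} for i in range(50)],
    "audit_log": ["status_changed"] * 500,
}

trimmed = trim_order(full_response)
prompt_context = json.dumps(trimmed)  # this, not json.dumps(full_response), goes in the prompt
```

An allow-list is safer than a deny-list here. A new bulky field added to the API later stays out of the prompt by default.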
How do you know an agent is actually working?
Salesforce measures Agentic Work Units, meaning actual task completion. For support agents they track containment rate: cases resolved without human follow-up. Outcomes, not activity.
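Containment rate reduces to a ratio you can compute from your own case log. This sketch assumes each case records two hypothetical booleans: whether the agent resolved it, and whether a human had to follow up. How you populate those fields depends on your ticketing system.

```python
from dataclasses import dataclass

@dataclass
class Case:
    resolved: bool        # did the agent close the case?
    human_followup: bool  # did a human have to touch it afterward?

def containment_rate(cases: list[Case]) -> float:
    """Share of cases resolved with no human follow-up (0.0 for an empty log)."""
    if not cases:
        return 0.0
    contained = sum(1 for c in cases if c.resolved and not c.human_followup)
    return contained / len(cases)
```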
I learned a version of this the hard way. A scheduled agent can exit zero every night and produce nothing. Green checks lie. The fix is to check the declared output, not the exit code. Did the file appear? Did the post go live? Did the ticket close? Whatever your equivalent of containment rate is, measure that.
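Here is a sketch of checking the declared output instead of trusting the exit code. It assumes the job's output is a single file at a known path, and run_and_verify is a hypothetical helper. For a post or a ticket, swap the file check for a call that confirms the post is live or the ticket is closed.

```python
import subprocess
from pathlib import Path

def run_and_verify(cmd: list[str], expected_output: Path, min_bytes: int = 1) -> bool:
    """Run a job, then judge success by its declared output, not just its exit code."""
    # Remove last night's file first so a stale artifact cannot pass the check.
    expected_output.unlink(missing_ok=True)
    result = subprocess.run(cmd)
    if result.returncode != 0:
        return False
    # Exit code 0 proves nothing on its own: the artifact must exist and be non-trivial.
    return expected_output.exists() and expected_output.stat().st_size >= min_bytes
```

Deleting the old output before the run matters. Without that step, yesterday's file would make a silent failure look like a success.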
Their post-launch triage is also worth stealing. Issues get split four ways: tone or brand drift means fix the prompts, logic errors mean fix the tools or convert that step to a script, data quality problems get routed to whoever owns the source, and coverage gaps mean expand scope or escalate cleanly. Four buckets, four different fixes. Most solo builders treat every failure as a prompt problem. Most failures are not.
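The four buckets map cleanly onto a lookup table. This is a hypothetical sketch, and the enum names and fix strings are illustrative paraphrases of the triage above. Tallying failures by bucket is a quick way to check whether your failures really are prompt problems.

```python
from collections import Counter
from enum import Enum

class FailureKind(Enum):
    TONE_DRIFT = "tone or brand drift"
    LOGIC_ERROR = "logic error"
    DATA_QUALITY = "data quality problem"
    COVERAGE_GAP = "coverage gap"

# One fix per bucket, paraphrasing the four-way triage.
FIXES = {
    FailureKind.TONE_DRIFT: "fix the prompts",
    FailureKind.LOGIC_ERROR: "fix the tools or convert the step to a script",
    FailureKind.DATA_QUALITY: "route to whoever owns the source data",
    FailureKind.COVERAGE_GAP: "expand scope or escalate cleanly",
}

def triage_report(failures: list[FailureKind]) -> dict[str, int]:
    """Count failures per bucket, keyed by the fix each bucket calls for."""
    counts = Counter(failures)
    return {FIXES[kind]: counts[kind] for kind in FailureKind}
```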
What does this mean if you're not Salesforce?
Salesforce has platform teams to absorb the post-launch 90%. You have you. That changes the build order, not the lessons.
Move deterministic logic out of the loop first. It is the cheapest win: fewer tokens, fewer surprises, faster runs. Then encode your real rules as code-level checks the model cannot talk its way past. Then cut your context down to what the task needs. Each of these makes the after-launch grind smaller, which at solo scale is the difference between a fleet you maintain and a fleet that quietly rots.
And put hard runtime limits on every agent before it touches production. The deployments in the writeup degrade in ways nobody predicted in the demo, and at 20,000 deployments Salesforce can eat the bad days. One runaway retry loop on your side is your whole margin. That is the exact surface I built AgentGuard for: per-agent budget caps, token limits, and rate limits enforced at runtime, not in the prompt. It installs with pip install agentguard and takes minutes to wire in. Start there: https://bmdpat.com/tools/agentguard
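To make "enforced at runtime, not in the prompt" concrete, here is a generic Python sketch of a per-run guard with spend, token, and call caps. It is not AgentGuard's API. RunBudget, charge, and BudgetExceeded are hypothetical names. A true rate limit would also need a time window, and this sketch leaves that out.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run crosses a hard limit."""

class RunBudget:
    """Hypothetical per-run guard: caps spend, tokens, and model calls (not AgentGuard's API)."""

    def __init__(self, max_usd: float, max_tokens: int, max_calls: int):
        self.max_usd = max_usd
        self.max_tokens = max_tokens
        self.max_calls = max_calls
        self.usd = 0.0
        self.tokens = 0
        self.calls = 0

    def charge(self, tokens: int, usd: float) -> None:
        # Record the usage of the call that just happened, then stop the run
        # as soon as any cap is crossed.
        self.calls += 1
        self.tokens += tokens
        self.usd += usd
        if self.calls > self.max_calls:
            raise BudgetExceeded(f"call limit {self.max_calls} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")
        if self.usd > self.max_usd:
            raise BudgetExceeded(f"budget ${self.max_usd:.2f} exceeded")
```

If you charge after every model call, a runaway retry loop stops at the cap instead of running until morning.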
Originally published on bmdpat.com. I run a one-person AI agent company and write about what actually works.