"My Product Assistant Kept Borrowing the Wrong Model. So I Gave It Its Own Routing Chain"

DEV Community

…object' && entry.type && entry.id) {
 chain.push(entry);
 }
 }
}

That is simple on purpose.

The first tier is the assistant's intended home. The later tiers are not discovered automatically. They are explicit, ordered fallbacks that the user can inspect in the UI.

That matters because fallback behavior should be explainable.

If an assistant switches models because a tier is failing, I want to know exactly why.
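
The fragment above is the tail of the chain builder. Here is a minimal sketch of the whole idea, assuming descriptors are plain `{ type, id, model }` objects. `buildRoutingChain`, `isValidDescriptor`, and `MAX_FALLBACKS` are hypothetical names. The cap of three comes from the settings page described later in this post, not from the project's code.

```javascript
// Hypothetical sketch: assemble the assistant's ordered routing chain.
// Tier 0 is the explicit primary binding; later tiers are user-configured
// fallbacks in the order the user saved them. Nothing is discovered.
const MAX_FALLBACKS = 3; // assumption: mirrors the "up to three fallbacks" UI limit

function isValidDescriptor(entry) {
  // Shape check similar to the fragment above: an object with a type and an id.
  return Boolean(entry) && typeof entry === 'object' && Boolean(entry.type) && Boolean(entry.id);
}

function buildRoutingChain(primary, fallbacks = []) {
  const chain = [];
  if (isValidDescriptor(primary)) {
    chain.push(primary);
  }
  const list = Array.isArray(fallbacks) ? fallbacks : [];
  // Filter first, then cap, so a malformed entry does not use up a slot.
  for (const entry of list.filter(isValidDescriptor).slice(0, MAX_FALLBACKS)) {
    chain.push(entry);
  }
  return chain;
}

// Usage: a malformed fallback is dropped instead of silently rerouting.
const chain = buildRoutingChain(
  { type: 'api-key', id: 'key-primary', model: 'gpt-5.4' },
  [
    { type: 'account', id: 'acct-backup', model: 'fallback-model' },
    { type: 'api-key' }, // missing id: ignored
  ],
);
console.log(chain.map((d) => d.id)); // [ 'key-primary', 'acct-backup' ]
```

The order in `chain` is the whole routing policy, which is what makes it easy to show verbatim in a settings page.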

A circuit breaker made the assistant feel much less random

Fallback chains are not enough if you keep retrying a dead tier over and over.

So the assistant LLM client keeps breaker state per tier and skips sources that are currently in cooldown:

for (const descriptor of chain) {
 const tierKey = tierKeyFor(descriptor);
 if (this._breaker.shouldSkip(tierKey)) continue;
 const candidate = await resolveCredential(descriptor, {
 defaultChatGptModel: this.defaultChatGptModel,
 defaultClaudeModel: this.defaultClaudeModel
 });
 if (!candidate) continue;
 candidates.push({ ...candidate, tierKey });
}

And when a call fails, the client records a failure against that tier instead of pretending the error was just bad luck:

const breakerState = this._breaker.recordFailure(source.tierKey);
logger.warn(`[Supervisor] tier failed | tier=${source.tierKey} | breaker=${breakerState}`);
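
The post does not show the breaker itself. Below is a minimal sketch of a consecutive-failure breaker keyed by tier. Only `shouldSkip` and `recordFailure` come from the snippets above. `TierBreaker`, `stateOf`, `recordSuccess`, the `'closed'`/`'open'`/`'half-open'` state strings, the injectable `now` clock, and the `tierKeyFor` key format are all assumptions.

```javascript
// Hypothetical per-tier circuit breaker. shouldSkip/recordFailure match the
// calls in the post; everything else here is an illustrative assumption.
function tierKeyFor(descriptor) {
  // Assumed key format: one breaker per concrete source + model.
  return `${descriptor.type}:${descriptor.id}:${descriptor.model ?? 'default'}`;
}

class TierBreaker {
  constructor({ threshold = 3, cooldownMs = 60_000, now = () => Date.now() } = {}) {
    this.threshold = threshold;   // consecutive failures before a tier trips
    this.cooldownMs = cooldownMs; // how long a tripped tier is skipped
    this.now = now;               // injectable clock, handy in tests
    this.tiers = new Map();       // tierKey -> { failures, openedAt }
  }

  _state(tierKey) {
    if (!this.tiers.has(tierKey)) {
      this.tiers.set(tierKey, { failures: 0, openedAt: null });
    }
    return this.tiers.get(tierKey);
  }

  // 'closed' = healthy, 'open' = in cooldown,
  // 'half-open' = cooldown elapsed, so the next call is a trial.
  stateOf(tierKey) {
    const s = this._state(tierKey);
    if (s.openedAt === null) return 'closed';
    return this.now() - s.openedAt < this.cooldownMs ? 'open' : 'half-open';
  }

  shouldSkip(tierKey) {
    return this.stateOf(tierKey) === 'open';
  }

  recordFailure(tierKey) {
    const s = this._state(tierKey);
    s.failures += 1;
    // Trip once the threshold is reached; a failed half-open trial re-trips.
    if (s.failures >= this.threshold) {
      s.openedAt = this.now();
    }
    return this.stateOf(tierKey);
  }

  recordSuccess(tierKey) {
    // Any success fully resets the tier.
    this.tiers.set(tierKey, { failures: 0, openedAt: null });
    return 'closed';
  }
}

// Usage with a fake clock.
let clock = 0;
const breaker = new TierBreaker({ threshold: 2, cooldownMs: 1000, now: () => clock });
const key = tierKeyFor({ type: 'api-key', id: 'key-primary', model: 'gpt-5.4' });
console.log(breaker.recordFailure(key)); // closed
console.log(breaker.recordFailure(key)); // open
console.log(breaker.shouldSkip(key));    // true
clock = 1500;
console.log(breaker.stateOf(key));       // half-open
```

Keying by source plus model means one misbehaving model does not take down a different model on the same key. The post does not say how `tierKeyFor` actually builds its key.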

That changed the experience more than I expected.

Before, the assistant could feel inconsistent in ways users interpreted as "the prompt changed" or "the model got weird."

After this change, the behavior became a predictable operational sequence:

try the primary source
skip tripped tiers
fall through to explicit backups
expose the health state in the dashboard
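
The four steps above can be sketched as a single loop. This is a hypothetical illustration, not the project's client. `runWithFallback`, `callModel`, `recordSuccess`, and the `status.lastUsedTier` field are assumed names. Any object exposing `shouldSkip`, `recordFailure`, and `recordSuccess` works as the breaker, and the stub here exists only so the example runs on its own.

```javascript
// Hypothetical fallback loop: try tiers in order, skip tripped ones,
// record every outcome, and remember which tier actually answered.
async function runWithFallback(candidates, breaker, callModel, status = {}) {
  const errors = [];
  for (const source of candidates) {
    if (breaker.shouldSkip(source.tierKey)) {
      errors.push({ tier: source.tierKey, error: 'skipped: breaker open' });
      continue;
    }
    try {
      const reply = await callModel(source);
      breaker.recordSuccess(source.tierKey);
      status.lastUsedTier = source.tierKey; // what the dashboard can show
      return reply;
    } catch (err) {
      const breakerState = breaker.recordFailure(source.tierKey);
      errors.push({ tier: source.tierKey, error: String(err?.message ?? err), breakerState });
    }
  }
  // Every tier failed or was skipped: fail loudly with the full trail.
  const summary = errors.map((e) => `${e.tier}: ${e.error}`).join('; ');
  throw new Error(`all assistant tiers failed (${summary})`);
}

// Stand-in breaker: 'acct-backup' is currently tripped.
const stubBreaker = {
  open: new Set(['acct-backup']),
  shouldSkip(tierKey) { return this.open.has(tierKey); },
  recordFailure() { return 'closed'; },
  recordSuccess() {},
};

const status = {};
const candidates = [{ tierKey: 'key-primary' }, { tierKey: 'acct-backup' }, { tierKey: 'key-spare' }];
const fakeCall = async (source) => {
  if (source.tierKey === 'key-primary') throw new Error('429 rate limited');
  return `answer from ${source.tierKey}`;
};

runWithFallback(candidates, stubBreaker, fakeCall, status).then((reply) => {
  console.log(reply, status.lastUsedTier); // answer from key-spare key-spare
});
```

Throwing with the whole trail when every tier is exhausted keeps the final failure as explainable as the fallbacks were.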

That is a better failure story for a product surface.

The UI finally has something honest to show

This was another reason I wanted the routing chain to be explicit.

Once the backend exposes:

the current primary
ordered fallbacks
resolved source
breaker state
last used tier

the settings page can stop being a dead form and start being an inspection tool.
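
A read-only snapshot covering those five items could look like the sketch below. The function `describeRoutingStatus` and the field names `primary`, `fallbacks`, `resolvedSource`, `tiers[].breaker`, and `lastUsedTier` are hypothetical, not the project's API. The breaker is any object with a `stateOf(tierKey)` method.

```javascript
// Hypothetical status snapshot for the settings page. It only reads state;
// it never resolves credentials or calls a model.
function describeRoutingStatus({ chain, resolvedTierKey, lastUsedTier }, breaker, tierKeyFor) {
  const tiers = chain.map((descriptor, index) => {
    const tierKey = tierKeyFor(descriptor);
    return {
      tier: index === 0 ? 'primary' : `fallback-${index}`,
      type: descriptor.type,
      id: descriptor.id,
      model: descriptor.model ?? null,
      tierKey,
      breaker: breaker.stateOf(tierKey),
    };
  });
  return {
    primary: tiers[0] ?? null,
    fallbacks: tiers.slice(1),
    resolvedSource: resolvedTierKey ?? null,
    lastUsedTier: lastUsedTier ?? null,
  };
}

// Usage with stand-in helpers.
const keyOf = (d) => `${d.type}:${d.id}`;
const fakeBreaker = { stateOf: (tierKey) => (tierKey === 'account:acct-backup' ? 'open' : 'closed') };
const snapshot = describeRoutingStatus(
  {
    chain: [
      { type: 'api-key', id: 'key-primary', model: 'gpt-5.4' },
      { type: 'account', id: 'acct-backup' },
    ],
    resolvedTierKey: 'api-key:key-primary',
    lastUsedTier: 'api-key:key-primary',
  },
  fakeBreaker,
  keyOf,
);
console.log(JSON.stringify(snapshot, null, 2));
```

Returning everything in one object keeps the UI from inferring routing decisions on its own. It renders what the backend reports.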

The assistant page now has controls for:

primary model source
per-tier model selection
up to three fallbacks
breaker threshold and cooldown
test-binding checks
tier health and last-used status

That is exactly the kind of visibility I wanted when debugging "why did the assistant answer from this provider instead of that one?"
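
Because these controls write routing state, it helps to normalize the payload before saving it. The sketch below uses assumed field names (`primary`, `fallbacks`, `breaker.threshold`, `breaker.cooldownMs`) and assumed defaults. Only the three-fallback limit comes from the list above.

```javascript
// Hypothetical normalizer for the assistant settings form. Field names,
// defaults, and bounds are illustrative assumptions.
const MAX_FALLBACKS = 3;
const DEFAULT_BREAKER = { threshold: 3, cooldownMs: 60_000 };

function normalizeBinding(entry) {
  if (!entry || typeof entry !== 'object' || !entry.type || !entry.id) return null;
  // Keep the per-tier model choice; an empty value means "use the source default".
  return { type: String(entry.type), id: String(entry.id), model: entry.model ? String(entry.model) : null };
}

function positiveInt(value, fallback) {
  const n = Number(value);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function normalizeAssistantSettings(body = {}) {
  const fallbacks = (Array.isArray(body.fallbacks) ? body.fallbacks : [])
    .map(normalizeBinding)
    .filter(Boolean);
  if (fallbacks.length > MAX_FALLBACKS) {
    // Reject instead of truncating, so the saved chain matches what the user entered.
    throw new Error(`at most ${MAX_FALLBACKS} fallbacks are allowed`);
  }
  return {
    primary: normalizeBinding(body.primary),
    fallbacks,
    breaker: {
      threshold: positiveInt(body.breaker?.threshold, DEFAULT_BREAKER.threshold),
      cooldownMs: positiveInt(body.breaker?.cooldownMs, DEFAULT_BREAKER.cooldownMs),
    },
  };
}

const saved = normalizeAssistantSettings({
  primary: { type: 'api-key', id: 'key-primary', model: 'gpt-5.4' },
  fallbacks: [{ type: 'account', id: 'acct-backup', model: '' }],
  breaker: { threshold: '5', cooldownMs: -1 },
});
console.log(saved.fallbacks[0].model, saved.breaker); // null { threshold: 5, cooldownMs: 60000 }
```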

I did not want the assistant to silently test with live requests

There is a small route detail here that I like because it keeps the UI honest.

The binding test endpoint validates whether a descriptor resolves, but it does not fire an actual LLM request:

const result = await describeBinding({ type: body.type, id: body.id });
return res.json({ success: result.ok, ...result });

That means the user gets a fast answer to:

"is this binding even real?"

without turning the settings screen into an accidental prompt runner.

It is a small boundary, but product assistants need that kind of boundary.
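
The post shows only the route that calls `describeBinding`. Here is a sketch of what an offline check might look like, assuming credentials live in a local lookup. `credentialRegistry`, its entry shape, and the returned fields are hypothetical. The property that matters is that nothing here touches the network.

```javascript
// Hypothetical offline binding check. It answers "does this descriptor point
// at a real, enabled source?" from local state only; no LLM request is sent.
// The registry stands in for wherever credentials are actually stored.
const credentialRegistry = new Map([
  ['api-key:key-primary', { enabled: true, models: ['gpt-5.4'] }],
  ['account:acct-old', { enabled: false, models: [] }],
]);

// Async only to match the `await describeBinding(...)` call in the route.
async function describeBinding({ type, id } = {}) {
  if (!type || !id) {
    return { ok: false, reason: 'type and id are required' };
  }
  const entry = credentialRegistry.get(`${type}:${id}`);
  if (!entry) {
    return { ok: false, reason: `no ${type} with id ${id}` };
  }
  if (!entry.enabled) {
    return { ok: false, reason: `${type} ${id} is disabled` };
  }
  return { ok: true, type, id, models: entry.models };
}

describeBinding({ type: 'account', id: 'acct-old' }).then((result) => {
  console.log(result); // { ok: false, reason: 'account acct-old is disabled' }
});
```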

The part I trust most is the migration and route coverage

I can write all the assistant architecture docs I want, but the thing that makes me trust this change is the route-level test coverage.

For example, there are tests that pin the new primary field:

assert.deepEqual(res._body.assistantAgent.boundModelSource, {
 type: 'api-key',
 id: 'key-primary',
 model: 'gpt-5.4'
});

And tests that make sure clearing bindings is respected:

assert.equal(res._body.assistantAgent.boundModelSource, null);
assert.equal(res._body.assistantAgent.boundCredential, null);

Those are the kinds of tests that prevent a future "helpful migration" from quietly breaking the operator's intent again.
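
A guard of the kind these tests protect might look like the sketch below. It assumes legacy configs stored only `boundCredential` (a source without a model) and that saving the settings form sets a hypothetical `bindingConfiguredExplicitly` flag. The post confirms neither detail. The important case is an explicit `null`, which must survive migration.

```javascript
// Hypothetical migration guard. Assumes legacy configs only had
// `boundCredential`, and that the settings form sets
// `bindingConfiguredExplicitly: true` on save. Both are assumptions.
function migrateAssistantAgent(agent, defaultModel) {
  const next = { ...agent };
  // Explicit user configuration wins, including an explicit "cleared" (null).
  if (next.bindingConfiguredExplicitly) return next;
  // Only fill a field that has never been set.
  if (next.boundModelSource === undefined && next.boundCredential) {
    next.boundModelSource = {
      type: next.boundCredential.type,
      id: next.boundCredential.id,
      model: defaultModel,
    };
  }
  return next;
}

const legacy = migrateAssistantAgent(
  { boundCredential: { type: 'api-key', id: 'key-primary' } },
  'gpt-5.4',
);
const cleared = migrateAssistantAgent(
  { bindingConfiguredExplicitly: true, boundModelSource: null, boundCredential: null },
  'gpt-5.4',
);
console.log(legacy.boundModelSource.model, cleared.boundModelSource); // gpt-5.4 null
```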

What changed in how I think about product assistants

I used to think the important part was the prompt and the docs grounding.

Those matter.

But once the assistant becomes part of the product, routing discipline matters just as much.

If the assistant is meant to be:

predictable
inspectable
recoverable
configurable without guesswork

then it cannot just borrow whatever account or API key happened to win a broader routing race.

It needs its own routing chain.

The pattern I would reuse

If you are adding a product assistant to an existing app with multiple model sources, I think this is the safer progression:

give the assistant its own explicit primary binding
bind to a concrete source plus model, not just a source type
mark explicit user configuration so legacy migration cannot override it
add ordered fallbacks
add breaker state so failures do not loop forever
expose the whole chain in the UI
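
Taken together, the first five steps map roughly onto a config shape like the one below. The field names are illustrative, not the project's schema. The last step, exposing the chain, belongs to the UI rather than to config.

```javascript
// Hypothetical assistant routing config following the checklist above.
// Field names are illustrative, not CliGate's actual schema.
const assistantRouting = {
  // Steps 1-2: own primary binding, naming a concrete source and a model.
  boundModelSource: { type: 'api-key', id: 'key-primary', model: 'gpt-5.4' },
  // Step 3: marks the binding as user-chosen so legacy migration leaves it alone.
  bindingConfiguredExplicitly: true,
  // Step 4: ordered fallbacks, tried top to bottom (at most three).
  fallbacks: [{ type: 'account', id: 'acct-backup', model: 'fallback-model' }],
  // Step 5: breaker settings so a dead tier is skipped instead of retried forever.
  breaker: { threshold: 3, cooldownMs: 60_000 },
};

// A binding is concrete only when it names a source type, a source id, and a model.
const isConcrete = (b) => Boolean(b && b.type && b.id && b.model);

console.log(isConcrete(assistantRouting.boundModelSource)); // true
console.log(isConcrete({ type: 'api-key' }));               // false
```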

That is a lot less glamorous than "ship an assistant."

But it is the difference between a demo assistant and one that operators can actually live with.

If you want to inspect the implementation, the project is here:

CliGate on GitHub

I am curious how other people are handling this. Does your product assistant have its own routing identity, or is it still borrowing the same model path as ordinary chat?