That is simple on purpose.
The first tier is the assistant's intended home. The later tiers are not discovered by magic. They are explicit, ordered fallbacks that the user can inspect in the UI.
That matters because fallback behavior should be explainable.
If an assistant changes models under pressure, I want to know exactly why.
A circuit breaker made the assistant feel much less random
Fallback chains are not enough if you keep retrying a dead tier over and over.
So the assistant LLM client keeps breaker state per tier and skips sources that are currently in cooldown:
for (const descriptor of chain) {
  const tierKey = tierKeyFor(descriptor);
  // Skip tiers whose breaker is currently in cooldown.
  if (this._breaker.shouldSkip(tierKey)) continue;
  const candidate = await resolveCredential(descriptor, {
    defaultChatGptModel: this.defaultChatGptModel,
    defaultClaudeModel: this.defaultClaudeModel
  });
  // Skip descriptors that do not resolve to a usable credential.
  if (!candidate) continue;
  candidates.push({ ...candidate, tierKey });
}
And when a call fails, the tier records failure instead of pretending the error was just bad luck:
const breakerState = this._breaker.recordFailure(source.tierKey);
logger.warn(`[Supervisor] tier failed | tier=${source.tierKey} | breaker=${breakerState}`);
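For readers who want to see what could sit behind `shouldSkip` and `recordFailure`, here is a minimal sketch of a per-tier breaker. It is not the project's implementation: the class name `TierBreaker`, the `threshold` and `cooldownMs` options, the `'open'` / `'closed'` state strings, the `recordSuccess` method, and the injectable `now` clock are all assumptions made for illustration.

```javascript
// Hypothetical per-tier circuit breaker. Names and defaults are assumptions,
// not taken from the project's source.
class TierBreaker {
  constructor({ threshold = 3, cooldownMs = 60_000, now = () => Date.now() } = {}) {
    this.threshold = threshold;   // consecutive failures before the tier trips
    this.cooldownMs = cooldownMs; // how long a tripped tier is skipped
    this.now = now;               // injectable clock, handy for tests
    this.tiers = new Map();       // tierKey -> { failures, openedAt }
  }

  _entry(tierKey) {
    if (!this.tiers.has(tierKey)) {
      this.tiers.set(tierKey, { failures: 0, openedAt: null });
    }
    return this.tiers.get(tierKey);
  }

  // True while the tier is tripped and still inside its cooldown window.
  shouldSkip(tierKey) {
    const entry = this._entry(tierKey);
    if (entry.openedAt === null) return false;
    if (this.now() - entry.openedAt < this.cooldownMs) return true;
    // Cooldown elapsed: allow a trial call; one more failure re-trips the tier.
    entry.openedAt = null;
    entry.failures = this.threshold - 1;
    return false;
  }

  // Count a failure and return the resulting state for logging.
  recordFailure(tierKey) {
    const entry = this._entry(tierKey);
    entry.failures += 1;
    if (entry.failures >= this.threshold) {
      entry.openedAt = this.now();
      return 'open';
    }
    return 'closed';
  }

  // A successful call resets the tier.
  recordSuccess(tierKey) {
    this.tiers.set(tierKey, { failures: 0, openedAt: null });
  }
}
```

In this sketch a tier that has cooled down gets a single trial call rather than a clean slate, so a source that is still dead goes straight back into cooldown instead of absorbing a full round of retries.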
That changed the experience more than I expected.
Before, the assistant could feel inconsistent in a way users interpret as "the prompt changed" or "the model got weird."
After this change, the behavior became much more operational:
- try the primary source
- skip tripped tiers
- fall through to explicit backups
- expose the health state in the dashboard
That is a better failure story for a product surface.
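The steps in the list above can be sketched as one loop. This is an illustration under assumptions rather than the project's code: `callWithFallback`, the `call` function on each candidate, the `attempts` log, and the `recordSuccess` method are hypothetical. Only `shouldSkip`, `recordFailure`, and `tierKey` come from the snippets earlier in this post.

```javascript
// Hypothetical fall-through loop. `candidates` is an ordered array of
// { tierKey, call }, where `call(prompt)` returns a promise (assumed shape).
async function callWithFallback(candidates, breaker, prompt) {
  const attempts = []; // per-tier outcome, useful for logs or a dashboard

  for (const source of candidates) {
    if (breaker.shouldSkip(source.tierKey)) {
      attempts.push({ tier: source.tierKey, outcome: 'skipped' });
      continue;
    }
    try {
      const reply = await source.call(prompt);
      breaker.recordSuccess(source.tierKey);
      attempts.push({ tier: source.tierKey, outcome: 'ok' });
      return { reply, usedTier: source.tierKey, attempts };
    } catch (err) {
      // Record the failure and fall through to the next tier.
      const state = breaker.recordFailure(source.tierKey);
      attempts.push({ tier: source.tierKey, outcome: 'failed', breaker: state });
    }
  }

  // Every tier was skipped or failed: surface that explicitly.
  const error = new Error('All assistant tiers are unavailable');
  error.attempts = attempts;
  throw error;
}
```

Returning `usedTier` and `attempts` alongside the reply is what lets a dashboard answer "which tier answered, and which ones were skipped or failed on the way there?"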
The UI finally has something honest to show
This was another reason I wanted the routing chain to be explicit.
Once the backend exposes:
- the current primary
- ordered fallbacks
- resolved source
- breaker state
- last used tier
the settings page can stop being a dead form and start being an inspection tool.
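To make that list concrete, here is a rough sketch of a status payload a backend could hand to the settings page. Everything in it is an assumption: the function name `describeRoutingStatus`, the field names, the key format inside `tierKeyFor` (the real helper is not shown in this post), and the reading of "resolved source" as "the first tier that is not currently tripped".

```javascript
// Hypothetical status payload for a settings page. Field names are assumptions.
function tierKeyFor(descriptor) {
  // Assumed key format; the project's own tierKeyFor is not shown in the post.
  return `${descriptor.type}:${descriptor.id}:${descriptor.model}`;
}

function describeRoutingStatus({ primary, fallbacks = [], breaker, lastUsedTier = null }) {
  const chain = [primary, ...fallbacks].filter(Boolean);
  const tiers = chain.map((descriptor, index) => {
    const tierKey = tierKeyFor(descriptor);
    return {
      position: index, // 0 is the primary, 1..n are the ordered fallbacks
      tierKey,
      descriptor,
      breaker: breaker.shouldSkip(tierKey) ? 'open' : 'closed',
      lastUsed: tierKey === lastUsedTier
    };
  });
  // The tier a request would try first right now.
  const next = tiers.find((tier) => tier.breaker === 'closed') || null;
  return {
    primary: primary || null,
    fallbacks,
    resolvedSource: next ? next.descriptor : null,
    lastUsedTier,
    tiers
  };
}
```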
The assistant page now has controls for:
- primary model source
- per-tier model selection
- up to three fallbacks
- breaker threshold and cooldown
- test-binding checks
- tier health and last-used status
That is exactly the kind of visibility I wanted when debugging "why did the assistant answer from this provider instead of that one?"
I did not want the assistant to silently test with live requests
There is a small route detail here that I like because it keeps the UI honest.
The binding test endpoint validates whether a descriptor resolves, but it does not fire an actual LLM request:
const result = await describeBinding({ type: body.type, id: body.id });
return res.json({ success: result.ok, ...result });
That means the user gets a fast answer to:
"is this binding even real?"
without turning the settings screen into an accidental prompt runner.
It is a small boundary, but product assistants need that kind of boundary.
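As an illustration of that boundary, a lookup-only check might look like the sketch below. It is not the project's `describeBinding`: the name `describeBindingSketch`, the `registry` shape (a Map of source type to a Map of configured entries), and the `reason`, `label`, and `disabled` fields are all hypothetical. The point is only that nothing in it calls a model provider.

```javascript
// Hypothetical lookup-only binding check. It never calls a model provider.
// `registry` is a Map: source type -> Map of id -> configured entry (assumed).
function describeBindingSketch(registry, { type, id }) {
  const entries = registry.get(type);
  if (!entries) {
    return { ok: false, reason: `unknown source type: ${type}` };
  }
  const entry = entries.get(id);
  if (!entry) {
    return { ok: false, reason: `no ${type} entry with id: ${id}` };
  }
  if (entry.disabled) {
    return { ok: false, reason: 'entry exists but is disabled' };
  }
  // Report only non-secret details back to the UI.
  return { ok: true, type, id, label: entry.label };
}
```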
The part I trust most is the migration and route coverage
I can write all the assistant architecture docs I want, but the thing that makes me trust this change is the route-level test coverage.
For example, there are tests that pin the new primary field:
assert.deepEqual(res._body.assistantAgent.boundModelSource, {
  type: 'api-key',
  id: 'key-primary',
  model: 'gpt-5.4'
});
And tests that make sure clearing bindings is respected:
assert.equal(res._body.assistantAgent.boundModelSource, null);
assert.equal(res._body.assistantAgent.boundCredential, null);
Those are the kinds of tests that prevent a future "helpful migration" from quietly breaking the operator's intent again.
What changed in how I think about product assistants
I used to think the important part was the prompt and the docs grounding.
Those matter.
But once the assistant becomes part of the product, routing discipline matters just as much.
If the assistant is meant to be:
- predictable
- inspectable
- recoverable
- configurable without guesswork
then it cannot just borrow whatever account or API key happened to win a broader routing race.
It needs its own routing chain.
The pattern I would reuse
If you are adding a product assistant to an existing app with multiple model sources, I think this is the safer progression:
- give the assistant its own explicit primary binding
- bind to a concrete source plus model, not just a source type
- mark explicit user configuration so legacy migration cannot override it
- add ordered fallbacks
- add breaker state so failures do not loop forever
- expose the whole chain in the UI
That is a lot less glamorous than "ship an assistant."
But it is the difference between a demo assistant and one that operators can actually live with.
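The third step in that list, protecting explicit configuration from legacy migration, is the easiest one to get subtly wrong, so here is a small sketch of the idea. The `boundModelSource` field appears in the tests earlier in this post. The `userConfigured` flag, the `legacyCredential` argument, and both function names are assumptions, not the project's actual code.

```javascript
// Hypothetical migration guard. `userConfigured` and `legacyCredential`
// are assumed names used only for this sketch.
function migrateAssistantAgent(agent, legacyCredential) {
  // The operator already made a choice, including "no binding": leave it alone.
  if (agent.userConfigured) return agent;
  // Nothing to migrate from, or a binding already exists.
  if (!legacyCredential || agent.boundModelSource) return agent;
  return {
    ...agent,
    boundModelSource: {
      type: legacyCredential.type,
      id: legacyCredential.id,
      model: legacyCredential.model
    }
  };
}

// Saving from the settings page marks the configuration as explicit,
// so a cleared binding stays cleared on the next migration pass.
function saveAssistantBinding(agent, boundModelSource) {
  return { ...agent, boundModelSource, userConfigured: true };
}
```

The design choice here is that `null` is treated as a valid answer from the operator. Without a flag like this, a migration cannot tell "never configured" apart from "deliberately cleared".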
If you want to inspect the implementation, the project is here:
CliGate on GitHub
I am curious how other people are handling this. Does your product assistant have its own routing identity, or is it still borrowing the same model path as ordinary chat?