DingTalk gives you a sessionWebhook for replying inside the inbound interaction window.
So the obvious implementation is:
if sessionWebhook exists and has not expired:
  send through sessionWebhook
else:
  send through App API
That is what I started with.
The problem is that the timestamp is not the whole truth. A session webhook can still look fresh locally while DingTalk rejects it server-side because the session was consumed or closed.
So this code was too optimistic:
if (sessionWebhook && (!expiredAt || expiredAt > now + 15_000)) {
  for (const chunk of textChunks) {
    result = await this.sendViaSessionWebhook(sessionWebhook, chunk);
  }
  return result;
}
If that send failed, the whole delivery failed.
The fix was to treat the session webhook as the cheap first attempt, not the only attempt:
if (sessionWebhook && (!expiredAt || expiredAt > now + 15_000)) {
  try {
    for (const chunk of textChunks) {
      result = await this.sendViaSessionWebhook(sessionWebhook, chunk);
    }
    return result;
  } catch (err) {
    // fall through to App API
  }
}
Then the provider falls back to the DingTalk App API:
for (const chunk of textChunks) {
  result = await this.sendViaAppApi({
    conversationId: conversation?.externalConversationId,
    text: chunk,
    robotCode: channelContext.robotCode || '',
    conversationType: channelContext.conversationType || '',
    senderStaffId: channelContext.senderStaffId || ''
  });
}
That made delivery much more reliable.
The important lesson: a webhook expiry timestamp is not a delivery guarantee.
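Here is a minimal, self-contained sketch of that two-path send, with both transports passed in as plain functions so it runs on its own. The names `sendWithFallback`, `sendViaWebhook`, and `sendViaApi` are hypothetical, not CliGate's actual methods. One deliberate difference from the snippets above: they loop over every chunk again in the fallback, so a webhook failure partway through would repeat the chunks that already went out. This sketch remembers how many chunks were delivered and resumes from the first unsent one. That is an assumption about the desired behavior, not a description of the real code.

```javascript
// Hypothetical helper: try the session webhook first, then fall back to the App API.
// sendViaWebhook and sendViaApi are injected async functions (assumptions for this sketch).
async function sendWithFallback({ chunks, sessionWebhook, expiredAt, now, sendViaWebhook, sendViaApi }) {
  const SAFETY_MARGIN_MS = 15_000; // same margin as the freshness check above
  let sent = 0; // number of chunks already delivered
  let result;

  const looksFresh = Boolean(sessionWebhook) && (!expiredAt || expiredAt > now + SAFETY_MARGIN_MS);
  if (looksFresh) {
    try {
      for (; sent < chunks.length; sent++) {
        result = await sendViaWebhook(sessionWebhook, chunks[sent]);
      }
      return { via: 'webhook', result };
    } catch (err) {
      // The timestamp looked fine but the send was refused; continue with the App API.
    }
  }

  // Resume from the first chunk that was not delivered.
  for (; sent < chunks.length; sent++) {
    result = await sendViaApi(chunks[sent]);
  }
  return { via: 'app-api', result };
}
```

Because `sent` only advances after a successful `await`, a failure on the second chunk leaves `sent` at 1 and the App API loop picks up from there.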
The third bug was hidden in the registry
This one was more subtle.
CliGate supports channel provider instances. The raw provider template in the registry is not the same thing as a started provider instance.
The started instance has settings:
- clientId
- clientSecret
- robotCode
- mode
- runtime defaults
The raw template does not.
That matters because DingTalk App API fallback needs credentials:
const clientId = chooseSetting(this.settings, 'clientId', 'appKey');
const clientSecret = chooseSetting(this.settings, 'clientSecret', 'appSecret');
If the outbound delivery sender asks the raw registry for dingtalk, it may get a provider object with no settings. Then the session webhook fails, the App API fallback starts, and the fallback has no credentials.
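The body of `chooseSetting` is not shown here, so the following is only a guess at its shape: return the first non-empty value among several candidate keys. The `requireCredentials` guard is a hypothetical addition that makes a settings-less template fail with a clear message instead of an opaque authentication error further down.

```javascript
// Assumed behavior: return the first non-empty string setting among the given keys.
function chooseSetting(settings, ...keys) {
  for (const key of keys) {
    const value = settings?.[key];
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return '';
}

// Hypothetical guard: fail early when the provider object carries no credentials.
function requireCredentials(settings) {
  const clientId = chooseSetting(settings, 'clientId', 'appKey');
  const clientSecret = chooseSetting(settings, 'clientSecret', 'appSecret');
  if (!clientId || !clientSecret) {
    throw new Error('App API fallback needs clientId and clientSecret; is this a started provider instance?');
  }
  return { clientId, clientSecret };
}
```

With a guard like this, a raw template passed in by mistake would surface as a configuration error at the point where the fallback starts.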
So the channel manager now injects an instance-aware registry shim into both the dispatcher and the delivery sender:
const instanceAwareRegistry = {
  get: (providerId, instanceId) => this.getInstance(providerId, instanceId)
};
this.outboundDispatcher.registry = instanceAwareRegistry;
this.outboundDispatcher.deliverySender?.setRegistry?.(instanceAwareRegistry);
The second assignment, the one on the delivery sender, is the one that matters.
It is easy to update the dispatcher and forget that the actual send path lives one object deeper.
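To make that failure mode concrete, here is a small sketch with stand-in objects. `DeliverySender`, `rawRegistry`, and `startedInstances` are hypothetical and much simpler than the real classes; the credentials are dummy values. It shows that swapping the registry on the dispatcher alone leaves the nested sender resolving the settings-less template.

```javascript
// Hypothetical stand-in for the nested delivery sender, which keeps its own registry reference.
class DeliverySender {
  constructor(registry) { this.registry = registry; }
  setRegistry(registry) { this.registry = registry; }
  resolve(providerId, instanceId) { return this.registry.get(providerId, instanceId); }
}

// Raw template: identifies the provider but carries no settings.
const rawRegistry = {
  get: () => ({ id: 'dingtalk', settings: undefined })
};

// Started instances, keyed by provider and instance id (dummy credentials).
const startedInstances = new Map([
  ['dingtalk:default', { id: 'dingtalk', settings: { clientId: 'demo-id', clientSecret: 'demo-secret' } }]
]);

const instanceAwareRegistry = {
  get: (providerId, instanceId = 'default') => startedInstances.get(`${providerId}:${instanceId}`)
};

const dispatcher = { registry: rawRegistry, deliverySender: new DeliverySender(rawRegistry) };

// Updating only the dispatcher leaves the sender on the raw template.
dispatcher.registry = instanceAwareRegistry;
const before = dispatcher.deliverySender.resolve('dingtalk', 'default').settings;

// Updating the sender as well fixes the actual send path.
dispatcher.deliverySender.setRegistry(instanceAwareRegistry);
const after = dispatcher.deliverySender.resolve('dingtalk', 'default').settings;
```

Here `before` is `undefined` and `after` holds the instance settings, which is the difference between a fallback with no credentials and one that can authenticate.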
Runtime events now drive outbound delivery
The architecture I trust more is event-based:
runtime event
-> find channel conversations tracking that runtime session
-> format event for the channel
-> arbitrate whether to send now or suppress
-> send through provider instance
-> record delivery
The dispatcher listens to runtime session events:
this.unsubscribe = this.runtimeSessionManager.eventBus.subscribeAll((event) => {
  this.handleRuntimeEvent(event).catch(() => {});
});
Then it finds conversations tracking that runtime:
const conversations = this.conversationStore.listByTrackedRuntimeSessionId(event.sessionId);
And sends through the delivery sender:
await this.deliverySender.send({
  conversation: latestConversation,
  channel: latestConversation.channel,
  sessionId: event.sessionId,
  eventSeq: event.seq,
  message: {
    text: formatted.fullText || formatted.text || '',
    buttons: formatted.buttons || [],
    session,
    event
  }
});
That is the boundary I wanted.
The assistant may start the work, but the runtime event owns the runtime result.
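The pipeline above can be sketched end to end as one function with its collaborators injected. Everything here is hypothetical: the `type` field on the event, the rule that only `completed` and `failed` events are delivered, and the shape of the ledger entries are assumptions made for illustration, not CliGate's real arbitration logic.

```javascript
// Assumption for this sketch: only these event types produce a channel message.
const TERMINAL_EVENT_TYPES = new Set(['completed', 'failed']);

// Hypothetical handler mirroring the steps: find, format, arbitrate, send, record.
async function handleRuntimeEvent(event, { listConversations, format, send, ledger }) {
  for (const conversation of listConversations(event.sessionId)) {
    // Arbitrate: suppress anything that is not a terminal event, and say why.
    if (!TERMINAL_EVENT_TYPES.has(event.type)) {
      ledger.push({ conversationId: conversation.id, seq: event.seq, status: 'suppressed', reason: 'non-terminal event' });
      continue;
    }
    try {
      await send({
        conversation,
        sessionId: event.sessionId,
        eventSeq: event.seq,
        message: { text: format(event) }
      });
      ledger.push({ conversationId: conversation.id, seq: event.seq, status: 'sent' });
    } catch (err) {
      // Record the failure rather than dropping it silently.
      ledger.push({ conversationId: conversation.id, seq: event.seq, status: 'failed', reason: String(err.message) });
    }
  }
}
```

Writing a ledger entry on every branch, including the suppressed and failed ones, is what makes it possible to answer later whether the provider was ever called for a given event.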
I added tests for the boring parts
The boring parts are where channel bugs usually hide.
There is a test for DingTalk falling back to the App API when the session webhook is unavailable:
assert.match(String(calls[0].url), /oauth2\/accessToken/);
assert.match(String(calls[1].url), /robot\/oToMessages\/batchSend/);
assert.deepEqual(calls[1].body.userIds, ['staff_123']);
assert.equal(calls[1].body.robotCode, 'robot_123');
There is also coverage for group conversation fallback:
assert.match(String(calls[1].url), /robot\/groupMessages\/send/);
And the delivery sender records sent and suppressed deliveries into the assistant event ledger, so debugging does not depend on guessing whether the provider was called.
That is what I want for mobile agents: not just "send a message", but an auditable delivery path.
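The assertions above read from a `calls` array. One way to build such an array, sketched under the assumption that the provider accepts an injectable fetch function, is a stub that records every request before returning a canned response. `createRecordingFetch` is a hypothetical helper, not the project's actual test utility.

```javascript
// Hypothetical test helper: a fetch stand-in that records every request it receives.
// `responses` is a list of JSON payloads returned in call order (empty object by default).
function createRecordingFetch(responses = []) {
  const calls = [];
  const fetchStub = async (url, options = {}) => {
    // Parse JSON string bodies so tests can assert on fields directly.
    const body = typeof options.body === 'string' ? JSON.parse(options.body) : options.body;
    calls.push({ url: String(url), method: options.method || 'GET', body });
    const payload = responses[calls.length - 1] ?? {};
    return { ok: true, status: 200, json: async () => payload };
  };
  return { calls, fetchStub };
}
```

A test would pass `fetchStub` to the code under test, trigger a send with no usable session webhook, and then check `calls[0]` and `calls[1]` in order: the token request first, the message request second.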
The workflow after the fix
The flow I wanted now looks like this:
- DingTalk message comes in.
- CliGate routes it to the assistant or direct runtime path.
- Claude Code or Codex starts a runtime session.
- The DingTalk thread tracks that runtime session.
- Runtime terminal events trigger outbound delivery.
- The DingTalk session webhook is tried first when it is present and not close to expiry.
- If that fails, App API fallback sends the result.
The user sees the thing that matters:
Claude Code: fixed the failing test and updated the route handler.
not just:
Task accepted.
What I learned
Mobile coding agents need stronger delivery semantics than chat demos.
It is not enough to prove that the bot can receive a message. It has to survive the whole lifecycle:
- accepted
- started
- waiting for approval
- waiting for user input
- completed
- failed
- delivered
- suppressed with a reason
And if the channel has multiple send paths, the code has to treat the first path as an optimization, not the truth.
For DingTalk, that meant:
- do not trust sessionWebhook freshness too much
- fall back to App API when webhook send fails
- make sure the sender uses the started provider instance, not the raw provider template
- wait for runtime results even when there is only one runtime session
That is not the flashy part of building an AI coding agent.
But it is the part that decides whether you can actually trust it from your phone.
If you want to inspect the implementation, the project is here:
CliGate on GitHub
I am curious how other people are handling mobile agent delivery. Do you send one "task accepted" message, or do you wire final runtime events back into the original chat thread?