-- Handoff spans slower than 250 ms, worst offenders first
SELECT
    trace_id,
    call_id,
    duration_ms AS handoff_ms
FROM spans
WHERE name = 'voice.handoff.vad_to_asr'
  AND duration_ms > 250
ORDER BY handoff_ms DESC
LIMIT 50;
That query returns nothing on a normal day and lights up the instant the pool starts contending. I wired it to an alert on the handoff span's p95, not the end-to-end p95, because the end-to-end p95 is exactly the number that lied to me on June 3rd.
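As a rough sketch of that alert condition, the check below computes a nearest-rank p95 over a window of handoff durations and compares it to a threshold. The function names, the 250 ms threshold (borrowed from the query's cutoff), and the choice of nearest-rank over other percentile definitions are all assumptions for illustration, not the alerting backend's actual rule.

```python
# Hypothetical alert check: names and threshold are illustrative assumptions.
HANDOFF_P95_THRESHOLD_MS = 250  # assumption: reuse the query's 250 ms cutoff


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of durations (ms)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(handoff_ms, threshold_ms=HANDOFF_P95_THRESHOLD_MS):
    """True when the p95 of the handoff span alone exceeds the threshold."""
    return p95(handoff_ms) > threshold_ms
```

The input here is only the `voice.handoff.vad_to_asr` durations, never the end-to-end turn latency, so a slow handoff cannot be averaged away by the fast spans around it.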
The pool fix was unglamorous. Pre-warm the ASR streaming connections, size the pool for burst concurrency instead of average, and keep the connections alive between turns instead of opening them lazily. The handoff went from 1400ms on the bad call to a p95 of 70ms steady-state. The dead air was gone the same afternoon I shipped the span, because the span told me precisely where to put the fix.
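The shape of that fix can be sketched with a toy `asyncio` pool. This is a minimal illustration under stated assumptions, not the production code: `PrewarmedPool` and the `connect` factory are hypothetical names, and a real pool would also need health checks and reconnects, which are omitted here.

```python
import asyncio


class PrewarmedPool:
    """Toy pool: opens every connection up front and reuses them across turns."""

    def __init__(self, connect, size):
        self._connect = connect  # hypothetical async factory for one ASR stream
        self._size = size        # sized for burst concurrency, not the average
        self._idle = asyncio.Queue()

    async def warm(self):
        # Pay the connection cost at startup, not on a caller's first turn.
        conns = await asyncio.gather(
            *(self._connect() for _ in range(self._size))
        )
        for conn in conns:
            self._idle.put_nowait(conn)

    async def acquire(self):
        # Returns immediately if a connection is idle; waits only when
        # every connection is busy.
        return await self._idle.get()

    def release(self, conn):
        # Hand the connection back still open so the next turn reuses it.
        self._idle.put_nowait(conn)
```

Call `warm()` once at service start, then wrap each turn in `acquire()` and `release()`. If `acquire()` ever has to wait, that wait is exactly what the handoff span will show.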
What this does NOT solve
Instrumenting the handoff makes the gap visible. It does not make your infrastructure fast. A few honest limits.
It does not fix jitter under load on its own. The span tells you the handoff is slow, but if your pool, your event loop, or your GC is the bottleneck, you still have to go fix that. The span is a flashlight, not a wrench.
It does nothing about provider-side queueing you cannot see. When ElevenLabs or your ASR vendor queues your request on their side, your client-side span measures the wait but cannot attribute it past the boundary. You will know that you waited, not why the provider made you wait. For that you need their status, their rate-limit headers, sometimes a support ticket.
And it will not catch every gap automatically. I added the VAD-to-ASR span because that is where this fire was. There are other handoffs (ASR-to-LLM, LLM-to-TTS, barge-in cancellation) and each one needs its own span if you want to see its gap. Instrument the ones that hurt first.
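One way to make each extra handoff cheap to instrument is a small reusable wrapper. The sketch below uses only the standard library and appends to an in-memory list as a stand-in for a real tracing exporter; `handoff_span`, `RECORDED`, and the span name `voice.handoff.asr_to_llm` are hypothetical, modeled on the `voice.handoff.vad_to_asr` name used above.

```python
import time
from contextlib import contextmanager

RECORDED = []  # stand-in for a real tracing exporter (assumption)


@contextmanager
def handoff_span(name, trace_id, sink=RECORDED):
    """Time the gap between two stages and record it as its own span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the handoff raises, so failures still show a gap.
        duration_ms = (time.perf_counter() - start) * 1000.0
        sink.append(
            {"name": name, "trace_id": trace_id, "duration_ms": duration_ms}
        )


# Example: wrap the wait between ASR finishing and the LLM call starting.
with handoff_span("voice.handoff.asr_to_llm", trace_id="t-1"):
    time.sleep(0.01)  # placeholder for the real handoff work
```

Because every handoff shares the `voice.handoff.` prefix in this sketch, the same query and alert pattern can be pointed at each one by changing only the span name.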
Lesson: Instrument the handoffs, not just the calls. A green waterfall of short, correct spans can still add up to a customer saying "hello?" into silence, because the damage is the time between the bars, and a trace only shows you the bars you drew. The day I stopped trusting the summed p95 and started putting spans on the gaps is the day the dead air stopped paging me. If you run voice agents, go find your turn-end-to-ASR-start handoff right now, wrap it in a span, and alert on that span alone. It is the cheapest 1.4 seconds you will ever buy back.