[Bugs] agency_mode per-request spawn times out with many MCP servers (~2.5s deadline too tight) #1012
Description
Mood: 😔
Category: Bugs
Summary
When agency_mode is enabled (server-side feature flag), the Copilot App spawns a fresh agency.exe process for every request (list_models, list_global_agents, get_session_state, telemetry, etc.). Each spawn must initialize the Copilot SDK and establish connections to all configured MCP servers before a hard ~2.5-second deadline. With 14 MCP servers configured in ~/.copilot/mcp-config.json, initialization consistently exceeds this deadline, causing every request to fail with "request cancelled". The app enters a retry loop and never recovers.
The Copilot CLI reads the same mcp-config.json and works perfectly — it initializes once as a long-lived process (takes ~10-15s on first launch) and stays ready.
Related Issues
- [Bugs] The ADO MCP keeps disconnecting. #894 — ADO MCP keeps disconnecting (likely same root cause: agency per-request spawn kills long-lived MCP connections)
- [Bugs] ADO MCP even though connected, frequently fails. #288 — ADO MCP connected but frequently fails (same pattern — HTTP MCP connections dropped between agency spawns)
- Provide more actionable details for MCP failures #587 — Provide more actionable details for MCP failures (the timeout error surface is opaque to users)
- MCP Auth failures aren't visible #104 — MCP Auth failures aren't visible (related: failure visibility)
- Model picker empty on launch: jsonrpc read loop dies on 'unexpected end of hex escape' (partial-chunk parse), cancelling list_models #836 - Model picker empty on launch due to request cancellation (closed; different root cause but same symptom: list_models cancelled)
- [General] Internally we use Agency, and we use it on the GH CLI, it would be good to so... #54 - Request to add Agency support to the app (this was implemented, but the per-request spawn architecture has scaling issues)
Environment
| Field | Value |
|---|---|
| App version | 0.2.32 |
| OS | Windows 10.0.26100 |
| Theme | GitHub |
| Path | /chat |
| Tenure | Week 4 |
MCP Configuration (14 servers total)
- type: "http" - pre-hosted MCP servers on localhost (already running, only need TCP handshake)
- type: "local" - spawned via npx/pnpm
- type: "http" - remote endpoint
Reproduction Steps
- Configure ~/.copilot/mcp-config.json with 14 MCP servers (mix of http and local types)
- Launch the Copilot App (v0.2.32)
- Observe the app UI shows errors / never loads models or agents
Expected Behavior
App initializes and connects to all configured MCPs (as the CLI does successfully with the same config).
Actual Behavior
Every agency.exe spawn times out at ~2.5s. The app enters a retry loop:
01:28:19.816 INFO agency_mode feature flag enabled; spawning copilot via agency wrapper
01:28:22.299 WARN Failed to initialize runtime telemetry client attempt=1 error=request cancelled
01:28:22.315 WARN resume_session failed (retry 1/3), retrying in 1s: request cancelled
01:28:22.325 ERROR failed to list models error=request cancelled
01:28:25.296 WARN resume_session failed (retry 2/3), retrying in 2s: request cancelled
01:28:29.720 WARN resume_session failed (retry 3/3), retrying in 4s: request cancelled
01:28:36.442 ERROR getSessionState: failed error=request cancelled
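Timing: the spawn is logged at 01:28:19.816 and the first cancellation at 01:28:22.299, a gap of 2.483 seconds, which points to a hard deadline of roughly 2.5 seconds. The pattern repeats for every request type, with each request spawning a new agency.exe that independently times out.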
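For anyone checking their own log, the gap between a spawn and the first cancellation can be measured with a small script. This is a minimal sketch: it assumes the log line format shown in the excerpt above (a leading `HH:MM:SS.mmm` timestamp and the same message substrings), and the function names are made up for illustration.

```python
from datetime import datetime
from typing import Optional

# First three lines of the excerpt from this report.
LOG_EXCERPT = """\
01:28:19.816 INFO agency_mode feature flag enabled; spawning copilot via agency wrapper
01:28:22.299 WARN Failed to initialize runtime telemetry client attempt=1 error=request cancelled
01:28:22.315 WARN resume_session failed (retry 1/3), retrying in 1s: request cancelled
"""


def parse_time(line: str) -> datetime:
    """Parse the leading HH:MM:SS.mmm timestamp of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S.%f")


def spawn_to_first_cancel(log_text: str) -> Optional[float]:
    """Seconds between the latest spawn line and the first cancellation after it."""
    spawn_at = None
    for line in log_text.splitlines():
        if "spawning copilot via agency wrapper" in line:
            spawn_at = parse_time(line)
        elif "request cancelled" in line and spawn_at is not None:
            return (parse_time(line) - spawn_at).total_seconds()
    return None


if __name__ == "__main__":
    print(spawn_to_first_cancel(LOG_EXCERPT))  # 2.483
```

Timestamps in this format carry no date, so the sketch does not handle a log that crosses midnight.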
Workaround
Reducing mcp-config.json to ≤4 servers allows the app to start successfully — agency.exe can initialize within the deadline with fewer MCPs.
Suggested Fixes
- Keep agency alive between requests — spawn once, reuse for subsequent requests (like the CLI architecture). This is the root architectural difference.
- Increase the deadline — 2.5s is too tight for power users with many MCPs. Even pre-hosted HTTP MCPs need connection handshakes.
- Lazy-init MCP connections — don't block the initial response on all MCP connections being established. Initialize MCPs in the background.
- Separate MCP init from request readiness — allow basic requests (list_models, telemetry) to succeed before all MCPs are connected.
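The lazy-init and readiness-separation ideas above could look roughly like the following. This is a sketch under stated assumptions, not how agency.exe or the Copilot SDK is actually implemented: `LazyMcpPool`, `list_models`, the model names and the simulated connection delays are all hypothetical, and `asyncio.sleep` stands in for a real MCP handshake.

```python
import asyncio
from typing import Dict, List

DEADLINE_S = 2.5  # approximate deadline inferred from the log timing in this report


class LazyMcpPool:
    """Hypothetical pool that connects to MCP servers in the background."""

    def __init__(self, connect_delays: Dict[str, float]) -> None:
        # connect_delays maps a server name to a simulated connect time in seconds.
        self._delays = connect_delays
        self._tasks: Dict[str, asyncio.Task] = {}

    def start(self) -> None:
        # Start every connection concurrently without awaiting any of them.
        # Must be called from inside a running event loop.
        for name, delay in self._delays.items():
            self._tasks[name] = asyncio.create_task(self._connect(name, delay))

    async def _connect(self, name: str, delay: float) -> str:
        await asyncio.sleep(delay)  # stand-in for a real handshake
        return name

    def ready(self) -> List[str]:
        # Names of servers whose connection finished without error.
        return sorted(
            name
            for name, task in self._tasks.items()
            if task.done() and not task.cancelled() and task.exception() is None
        )

    async def wait_all(self) -> None:
        await asyncio.gather(*self._tasks.values())


async def list_models() -> List[str]:
    # A basic request that does not depend on any MCP server being connected.
    return ["model-a", "model-b"]


async def demo():
    # 14 simulated servers, matching the count in this report.
    pool = LazyMcpPool({f"mcp-{i}": 0.01 * i for i in range(1, 15)})
    pool.start()
    # The request is answered within the deadline while MCPs are still connecting.
    models = await asyncio.wait_for(list_models(), timeout=DEADLINE_S)
    ready_early = pool.ready()
    await pool.wait_all()
    return models, ready_early, pool.ready()


if __name__ == "__main__":
    models, ready_early, ready_late = asyncio.run(demo())
    print(models, len(ready_early), len(ready_late))
```

The point of the design is that request readiness no longer depends on the slowest MCP server: requests such as list_models return immediately, and tools from each server become available as its connection completes.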
Logs
Full log available at ~/.copilot/logs/github-app.{pid}.log. The pattern shows 10+ agency.exe spawns in 40 seconds, all failing identically with the same ~2.5s timeout.