[Bugs] agency_mode per-request spawn times out with many MCP servers (~2.5s deadline too tight) #1012

Open

dantebarbieri opened on Jun 14, 2026

Mood: 😔
Category: Bugs

Summary

When agency_mode is enabled (server-side feature flag), the Copilot App spawns a fresh agency.exe process for every request (list_models, list_global_agents, get_session_state, telemetry, etc.). Each spawn must initialize the Copilot SDK and establish connections to all configured MCP servers before a hard ~2.5-second deadline. With 14 MCP servers configured in ~/.copilot/mcp-config.json, initialization consistently exceeds this deadline, causing every request to fail with "request cancelled". The app enters a retry loop and never recovers.

The Copilot CLI reads the same mcp-config.json and works perfectly — it initializes once as a long-lived process (takes ~10-15s on first launch) and stays ready.

Related Issues

[Bugs] The ADO MCP keeps disconnecting. #894: likely the same root cause (the per-request agency spawn kills long-lived MCP connections)
[Bugs] ADO MCP even though connected, frequently fails. #288: same pattern (HTTP MCP connections are dropped between agency spawns)
Provide more actionable details for MCP failures #587: the timeout error surface is opaque to users
MCP Auth failures aren't visible #104: related, also about failure visibility
Model picker empty on launch: jsonrpc read loop dies on 'unexpected end of hex escape' (partial-chunk parse), cancelling list_models #836: closed; different root cause but same symptom (list_models cancelled)
[General] Internally we use Agency, and we use it on the GH CLI, it would be good to so... #54: request to add Agency support to the app (implemented, but the per-request spawn architecture has scaling issues)

Environment

Field	Value
App version	0.2.32
OS	Windows 10.0.26100
Theme	GitHub
Path	/chat
Tenure	Week 4

MCP Configuration (14 servers total)

type: "http" (pre-hosted MCP servers on localhost; already running, only need a TCP handshake)
type: "local" (spawned via npx/pnpm)
type: "http" (remote endpoint)
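
To make the per-type breakdown easy to reproduce, here is a minimal sketch that counts servers by their type field. It assumes the config keeps its servers under a top-level "mcpServers" mapping with a "type" key per entry; the helper name count_servers_by_type is hypothetical and not part of any Copilot tooling.

```python
from collections import Counter


def count_servers_by_type(config: dict) -> Counter:
    """Count configured MCP servers grouped by their "type" field."""
    # Assumption: servers live under a top-level "mcpServers" mapping.
    servers = config.get("mcpServers", {})
    # Entries without a "type" key are grouped under "unknown".
    return Counter(entry.get("type", "unknown") for entry in servers.values())
```

Loading ~/.copilot/mcp-config.json with json.load and passing the result to this function should report the http/local split for a given setup.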

Reproduction Steps

Configure ~/.copilot/mcp-config.json with 14 MCP servers (mix of http and local types)
Launch the Copilot App (v0.2.32)
Observe the app UI shows errors / never loads models or agents

Expected Behavior

App initializes and connects to all configured MCPs (as the CLI does successfully with the same config).

Actual Behavior

Every agency.exe spawn times out at ~2.5s. The app enters a retry loop:

01:28:19.816 INFO agency_mode feature flag enabled; spawning copilot via agency wrapper
01:28:22.299 WARN Failed to initialize runtime telemetry client attempt=1 error=request cancelled
01:28:22.315 WARN resume_session failed (retry 1/3), retrying in 1s: request cancelled
01:28:22.325 ERROR failed to list models error=request cancelled
01:28:25.296 WARN resume_session failed (retry 2/3), retrying in 2s: request cancelled
01:28:29.720 WARN resume_session failed (retry 3/3), retrying in 4s: request cancelled
01:28:36.442 ERROR getSessionState: failed error=request cancelled

Timing: the spawn is logged at 01:28:19.816 and the first cancellation at 01:28:22.299, a gap of 2.483 seconds, which points to a hard deadline of roughly 2.5 seconds. The pattern repeats for every request type, each spawning a new agency.exe that independently times out.
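
The gap can be checked directly from the log excerpt. This is a small sketch that parses the leading timestamps of the two relevant lines quoted above; the function names are mine, and the substring matches are simply taken from those log lines.

```python
from datetime import datetime

# The two relevant lines, copied from the log excerpt above.
LOG_LINES = [
    "01:28:19.816 INFO agency_mode feature flag enabled; spawning copilot via agency wrapper",
    "01:28:22.299 WARN Failed to initialize runtime telemetry client attempt=1 error=request cancelled",
]


def parse_ts(line: str) -> datetime:
    """Parse the leading HH:MM:SS.mmm timestamp of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S.%f")


def spawn_to_first_cancel(lines: list[str]) -> float:
    """Seconds between the first spawn line and the first later cancellation."""
    spawn = next(parse_ts(l) for l in lines if "spawning copilot via agency wrapper" in l)
    cancel = next(
        parse_ts(l) for l in lines
        if "request cancelled" in l and parse_ts(l) >= spawn
    )
    return (cancel - spawn).total_seconds()


print(f"{spawn_to_first_cancel(LOG_LINES):.3f}")  # prints 2.483
```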

Workaround

Reducing mcp-config.json to ≤4 servers allows the app to start successfully — agency.exe can initialize within the deadline with fewer MCPs.
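
For anyone who wants to try the workaround without hand-editing, a rough sketch of trimming a loaded config to its first few servers follows. It assumes the same top-level "mcpServers" mapping as above; keep_first_servers is a hypothetical helper, and which servers to keep is left to the user (this version just keeps the first ones in file order).

```python
import copy


def keep_first_servers(config: dict, limit: int = 4) -> dict:
    """Return a copy of the config that keeps only the first `limit` servers."""
    reduced = copy.deepcopy(config)  # leave the caller's config untouched
    # Assumption: servers live under a top-level "mcpServers" mapping.
    servers = reduced.get("mcpServers", {})
    reduced["mcpServers"] = dict(list(servers.items())[:limit])
    return reduced
```

Back up the original mcp-config.json before writing the reduced version, since the CLI reads the same file.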

Suggested Fixes

Keep agency alive between requests — spawn once, reuse for subsequent requests (like the CLI architecture). This is the root architectural difference.
Increase the deadline — 2.5s is too tight for power users with many MCPs. Even pre-hosted HTTP MCPs need connection handshakes.
Lazy-init MCP connections — don't block the initial response on all MCP connections being established. Initialize MCPs in the background.
Separate MCP init from request readiness — allow basic requests (list_models, telemetry) to succeed before all MCPs are connected.
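
To illustrate how the first, third, and fourth suggestions fit together, here is a toy asyncio sketch. It is not how agency.exe or the Copilot SDK is implemented: LongLivedAgency, its methods, the model names, and the simulated connect delays are all hypothetical. The point is only that a reused process can start MCP connections in the background and still answer basic requests right away.

```python
import asyncio
import time


class LongLivedAgency:
    """Hypothetical stand-in for an agency process that is reused across requests."""

    def __init__(self, servers: dict[str, float]):
        # Maps a server name to its simulated connect time in seconds.
        self._servers = servers
        self._tasks: dict[str, asyncio.Task] = {}
        self.connected: set[str] = set()

    async def _connect(self, name: str, delay: float) -> None:
        await asyncio.sleep(delay)  # stands in for the real MCP handshake
        self.connected.add(name)

    def start(self) -> None:
        """Kick off every MCP connection in the background and return immediately."""
        for name, delay in self._servers.items():
            self._tasks[name] = asyncio.create_task(self._connect(name, delay))

    async def list_models(self) -> list[str]:
        # A basic request that needs no MCP server, so it never waits on them.
        return ["model-a", "model-b"]  # placeholder data

    async def wait_for_mcps(self, timeout: float) -> list[str]:
        """Wait up to `timeout` seconds; return the servers still not connected."""
        if self._tasks:
            await asyncio.wait(self._tasks.values(), timeout=timeout)
        return sorted(set(self._servers) - self.connected)


async def demo():
    # 14 simulated servers with connect times from 10 ms to 140 ms.
    agency = LongLivedAgency({f"mcp-{i}": 0.01 * i for i in range(1, 15)})
    agency.start()
    t0 = time.monotonic()
    models = await agency.list_models()  # answered before any MCP is ready
    elapsed = time.monotonic() - t0
    pending = await agency.wait_for_mcps(timeout=1.0)
    return models, elapsed, pending


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the deadline only has to cover the request itself, and slow or failing MCP servers show up as still pending instead of cancelling unrelated requests such as list_models.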

Logs

Full log available at ~/.copilot/logs/github-app.{pid}.log. The pattern shows 10+ agency.exe spawns in 40 seconds, all failing identically with the same ~2.5s timeout.
