Single-Flight LLM Calls: Coalesce 50 Concurrent Identical Requests Into One API Call

DEV Community

# Only one LLM call happens for each unique messages/model combo response = await coalesce.call( fn=anthropic_client.messages.create, model="claude-sonnet-4-6", messages=messages, max_tokens=512, ) return response.content[0].text # Test: 10 concurrent requests for the same date async def test_coalescing(): results = await asyncio.gather(*[ generate_daily_summary(f"user-{i}", "2026年05月24日") for i in range(10) ]) # All results are identical (same LLM response) assert all(r == results[0] for r in results) print(f"10 requests coalesced into 1 API call")

Sibling Libraries

Library	What it solves
`llm-cache-mem`	Cross-time deduplication (LRU+TTL for non-concurrent identical calls)
`agentidemp-py`	Request-level idempotency for non-concurrent duplicate agent runs
`llm-rate-limit-bucket`	Token-bucket rate limiter for outbound LLM calls
`llm-retry`	Exponential backoff retry when calls fail
`token-budget-pool`	Thread-safe concurrent token/USD budget tracking

The concurrency stack: llm-batch-coalesce for in-flight dedup, llm-cache-mem for cross-time dedup, llm-rate-limit-bucket for rate limiting, llm-retry for failure recovery.

What's Next

Waiter count metrics: coalesce.stats() returning how many times a request was coalesced (waiter count per key), how much the coalescing saved in estimated API cost, and the current in-flight count. Makes the value of the coalescer visible in dashboards.

Configurable key function: LLMCoalesce(key_fn=...) for cases where you want to coalesce on a subset of the kwargs (e.g., coalesce on messages content only, ignoring model differences). Some use cases allow sharing responses across model variants.

Sync version: SyncLLMCoalesce using threading.Event instead of asyncio.Future for sync contexts. The async version requires an event loop; the sync version would work in traditional multi-threaded WSGI apps.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.