
Cloudflare Achieves 99.99% Warm Start Rate for Workers with 'Shard and Conquer' Consistent Hashing

Oct 10, 2025 3 min read


Cloudflare recently introduced a technique called "Shard and Conquer" for reducing cold starts on its serverless platform, Cloudflare Workers. It leverages a consistent hash ring to intentionally coalesce traffic for individual Workers onto a single "shard server" within a data center. With this technique, the company has cut its cold start rate by a factor of 10, so that 99.99% of requests are now served by an already-warm instance.

The technique marks the second significant evolution in cold start mitigation at the company: the initial approach, pre-warming Workers during the TLS handshake, had begun to fail for increasingly complex applications. In response to user demand for larger, more complex applications, Cloudflare had relaxed several platform limits, raising the maximum script size from 1MB to 10MB (for paying users) and the startup CPU time limit from 200ms to 400ms.

While these increases accommodated richer applications, they also lengthened Workers' cold start durations, which frequently came to exceed the time of a modern TLS 1.3 handshake. Cold starts could therefore no longer be hidden entirely from the end user, requiring a new approach to minimize how often they occur.

The core motivation for this optimization lies in the serverless value proposition, as one Hacker News commenter observed:

Because attractiveness of Workers/Lambdas/Functions is whole 'write simple amount of code and pay pennies to run it.' Downside is cold starts, twisting yourself into knots you will do at scale to make them work, and vendor lock-in.

To solve the cold start frequency problem, Cloudflare borrowed a key technique from its own CDN HTTP cache: consistent hashing.

Previously, a request arriving on any server could trigger a redundant cold start, even if a warm Worker instance already existed on a nearby machine. This resulted in high cold start rates for low-volume Workers, whose instances were frequently evicted across multiple servers due to low, scattered traffic.

The new architecture works as follows:

  • A Worker's script ID is mapped onto a consistent hash ring shared by all servers in a data center.
  • Subsequently, this map dictates a single, primary "shard server" that is responsible for running a specific Worker instance.
  • As a result, all requests for that Worker are routed to the shard server, keeping the Worker instance warm indefinitely and reducing memory usage across the cluster by avoiding redundant instances.
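The mapping above can be sketched with a minimal consistent hash ring in Python. The server names, hash function, and virtual-node count here are illustrative assumptions, not Cloudflare's actual implementation; the key property is that every server computes the same script-ID-to-shard mapping independently.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=64):
        # Place several virtual nodes per server on the ring so load
        # spreads evenly as servers join or leave.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, script_id):
        # Walk clockwise from the script's hash to the next server point;
        # wrap around at the end of the ring.
        idx = bisect.bisect(self.keys, self._hash(script_id)) % len(self.ring)
        return self.ring[idx][1]

# Every server in the data center derives the same shard server for a
# given Worker, so its requests converge and the instance stays warm.
ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
shard = ring.shard_for("worker-script-123")
```

Because the hash is deterministic, no coordination is needed: any server receiving a request can compute the shard server locally and forward the request there.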

(Diagram of the sharding architecture; source: Cloudflare blog post)

A crucial engineering challenge for this sharding model is load shedding. An individual Worker instance could still be overwhelmed by a sudden traffic spike, requiring the system to scale horizontally and instantiate new Workers on other servers immediately. This must be done without incurring the latency of a pre-flight "may I send the request" check (like Expect: 100-continue).

Cloudflare achieved graceful, low-latency load shedding by integrating its cross-instance communication tool, Cap'n Proto RPC:

  • Optimistic Sending: The shard client (the server that initially received the request) optimistically sends the complete request to the shard server.
  • Capability Passing: Critically, the client includes a Cap'n Proto capability (a handle to a lazily-loaded local Worker instance) within the request payload.
  • Refusal and Redirect: If the shard server is overloaded, instead of simply returning a "go away" error, it returns the client's own lazy capability.
  • Short-Circuiting the Trombone: The client's RPC system recognizes that the returned capability is local. It immediately stops proxying request bytes to the server, short-circuiting the request path and serving the Worker locally via a rapid cold start (since it now knows to skip the shard server).

The mechanism elegantly offloads traffic and achieves horizontal scaling for burst loads without introducing additional round-trip latency. The technique also extends to complex invocation stacks, where Workers invoke other Workers via Service Bindings, by serializing and passing the entire invocation context stack between shard servers.
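The refusal-and-redirect flow can be sketched as follows. The class names (`ShardServer`, `LazyWorker`), the capacity counter, and the string payloads are illustrative assumptions; the real system exchanges Cap'n Proto capabilities over RPC rather than passing Python objects.

```python
class LazyWorker:
    """A handle to a Worker instance that cold-starts only on first use."""
    def __init__(self, script_id):
        self.script_id = script_id
        self._started = False

    def handle(self, request):
        if not self._started:
            self._started = True  # the (rapid) cold start happens here, lazily
        return f"{self.script_id} handled {request}"

class ShardServer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0
        self.warm_worker = LazyWorker("worker-script-123")

    def call(self, request, client_capability):
        # Overloaded: instead of a "go away" error, hand the client's own
        # lazy capability back so it can short-circuit and serve locally.
        if self.in_flight >= self.capacity:
            return ("redirect", client_capability)
        self.in_flight += 1
        try:
            return ("ok", self.warm_worker.handle(request))
        finally:
            self.in_flight -= 1

def send_request(shard, request):
    # Optimistic sending: ship the full request plus a capability to a
    # local, not-yet-started Worker in a single round trip.
    local = LazyWorker("worker-script-123")
    status, payload = shard.call(request, local)
    if status == "redirect":
        # The returned capability is our own local one: stop proxying
        # and serve the request here via a rapid local cold start.
        return payload.handle(request)
    return payload
```

The key design point this sketch illustrates is that the overload signal carries the fallback itself: the client never pays an extra round trip to ask permission, and the shard server never has to know where the fallback instance lives.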

About the Author

Steef-Jan Wiggers


