
Cloudflare Achieves 99.99% Warm Start Rate for Workers with 'Shard and Conquer' Consistent Hashing

Oct 10, 2025 3 min read


Cloudflare recently introduced a technique called "Shard and Conquer" for reducing cold starts on its serverless platform, Cloudflare Workers. It leverages a consistent hash ring to intentionally coalesce traffic for individual Workers onto a single "shard server" within a data center. With this technique, the company has cut its cold start rate by a factor of 10, so that 99.99% of requests are now served by an already-warm instance.

The technique marks the second significant evolution in cold start mitigation at the company: the initial approach, pre-warming Workers during the TLS handshake, had begun to fail for increasingly complex applications. In response to user demand for larger, more complex applications, Cloudflare had relaxed several platform limits, raising the maximum script size from 1MB to 10MB (for paying users) and the startup CPU time limit from 200ms to 400ms.

While these increases accommodated richer applications, they also lengthened Workers' cold start durations, which frequently came to exceed the time of a modern TLS 1.3 handshake. Cold starts could therefore no longer be hidden entirely from the end user, requiring a new approach to minimize how often they occur.

The core motivation for this optimization lies in the serverless value proposition, as one Hacker News commenter observed:

Because attractiveness of Workers/Lambdas/Functions is whole 'write simple amount of code and pay pennies to run it.' Downside is cold starts, twisting yourself into knots you will do at scale to make them work, and vendor lock-in.

To solve the cold start frequency problem, Cloudflare borrowed a key technique from its own CDN HTTP cache: consistent hashing.

Previously, a request arriving on any server could trigger a redundant cold start, even if a warm Worker instance already existed on a nearby machine. This resulted in high cold start rates for low-volume Workers, whose instances were frequently evicted across multiple servers due to low, scattered traffic.

The new architecture works as follows:

  • A Worker's script ID is mapped onto a consistent hash ring shared by all servers in a data center.
  • Subsequently, this map dictates a single, primary "shard server" that is responsible for running a specific Worker instance.
  • As a result, all requests for that Worker are routed to the shard server, keeping the Worker instance warm indefinitely and reducing memory usage across the cluster by avoiding redundant instances.
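The mapping above can be sketched with a minimal consistent hash ring in Python. The server names, hash function, and virtual-node count here are illustrative assumptions, not Cloudflare's actual implementation; the key property is that every server computes the same script-ID-to-shard mapping independently.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=64):
        # Place several virtual nodes per server on the ring so load
        # spreads evenly as servers join or leave.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, script_id):
        # Walk clockwise from the script's hash to the next server point;
        # wrap around at the end of the ring.
        idx = bisect.bisect(self.keys, self._hash(script_id)) % len(self.ring)
        return self.ring[idx][1]

# Every server in the data center derives the same shard server for a
# given Worker, so its requests converge and the instance stays warm.
ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
shard = ring.shard_for("worker-script-123")
```

Because the hash is deterministic, no coordination is needed: any server receiving a request can compute the shard server locally and forward the request there.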

(Diagram of the sharding architecture; source: Cloudflare blog post)

A crucial engineering challenge for this sharding model is load shedding. An individual Worker instance could still be overwhelmed by a sudden traffic spike, requiring the system to scale horizontally and instantiate new Workers on other servers immediately. This must be done without incurring the latency of a pre-flight "may I send the request" check (like Expect: 100-continue).

Cloudflare achieved graceful, low-latency load shedding by integrating its cross-instance communication tool, Cap'n Proto RPC:

  • Optimistic Sending: The shard client (the server that initially received the request) optimistically sends the complete request to the shard server.
  • Capability Passing: Critically, the client includes a Cap'n Proto capability (a handle to a lazily-loaded local Worker instance) within the request payload.
  • Refusal and Redirect: If the shard server is overloaded, instead of simply returning a "go away" error, it returns the client's own lazy capability.
  • Short-Circuiting the Trombone: The client's RPC system recognizes that the returned capability is local. It immediately stops proxying request bytes to the server, short-circuiting the request path and serving the Worker locally via a rapid cold start (since it now knows to skip the shard server).

The mechanism elegantly offloads traffic and achieves horizontal scaling for burst loads without introducing additional round-trip latency. The technique also extends to complex invocation stacks, where Workers invoke other Workers via Service Bindings, by serializing and passing the entire invocation context stack between shard servers.
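The refusal-and-redirect flow can be sketched as follows. The class names (`ShardServer`, `LazyWorker`), the capacity counter, and the string payloads are illustrative assumptions; the real system exchanges Cap'n Proto capabilities over RPC rather than passing Python objects.

```python
class LazyWorker:
    """A handle to a Worker instance that cold-starts only on first use."""
    def __init__(self, script_id):
        self.script_id = script_id
        self._started = False

    def handle(self, request):
        if not self._started:
            self._started = True  # the (rapid) cold start happens here, lazily
        return f"{self.script_id} handled {request}"

class ShardServer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0
        self.warm_worker = LazyWorker("worker-script-123")

    def call(self, request, client_capability):
        # Overloaded: instead of a "go away" error, hand the client's own
        # lazy capability back so it can short-circuit and serve locally.
        if self.in_flight >= self.capacity:
            return ("redirect", client_capability)
        self.in_flight += 1
        try:
            return ("ok", self.warm_worker.handle(request))
        finally:
            self.in_flight -= 1

def send_request(shard, request):
    # Optimistic sending: ship the full request plus a capability to a
    # local, not-yet-started Worker in a single round trip.
    local = LazyWorker("worker-script-123")
    status, payload = shard.call(request, local)
    if status == "redirect":
        # The returned capability is our own local one: stop proxying
        # and serve the request here via a rapid local cold start.
        return payload.handle(request)
    return payload
```

The key design point this sketch illustrates is that the overload signal carries the fallback itself: the client never pays an extra round trip to ask permission, and the shard server never has to know where the fallback instance lives.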

About the Author

Steef-Jan Wiggers


