
Uber Achieves 150M Reads per Second with CacheFront Improvements

Oct 06, 2025 2 min read


Uber engineers have updated the CacheFront architecture to serve over 150 million reads per second while ensuring stronger consistency. The update addresses stale reads in latency-sensitive services and supports growing demand by introducing a new write-through consistency protocol, closer coordination with Docstore, and improvements to Uber’s Flux streaming system.

The earlier CacheFront design achieved a throughput of 40 million reads per second by deduplicating requests and caching frequently accessed keys close to application services. While effective for scalability, this model lacked robust end-to-end consistency, making it insufficient for workloads requiring the latest data. Cache invalidations relied on time-to-live (TTL) and change data capture (CDC), which introduced eventual consistency and delayed visibility of updates. This created two specific issues. With read-own-writes inconsistency, a row that is read, cached, and then updated might continue serving stale values until it is invalidated or expires. With read-own-inserts inconsistency, negative caching (storing a "not-found" result) could return incorrect misses even after the row was inserted, potentially breaking service logic.

Previous CacheFront read and write paths for invalidation (Source: Uber Engineering Blog Post)
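To make the stale-read window concrete, here is a minimal, self-contained Go sketch (not Uber's code) of a TTL-based cache-aside setup: the write path updates the database but leaves the cached copy to be corrected asynchronously by a CDC consumer or by TTL expiry, so a read immediately after the write still returns the old value. All names in the snippet are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached value with an expiry driven purely by TTL.
type entry struct {
	value     string
	expiresAt time.Time
}

type cache struct {
	mu   sync.Mutex
	data map[string]entry
}

func (c *cache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.data[key]
	if !ok || time.Now().After(e.expiresAt) {
		return "", false
	}
	return e.value, true
}

func (c *cache) put(key, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = entry{value: value, expiresAt: time.Now().Add(ttl)}
}

func main() {
	db := map[string]string{"user:1": "v1"}
	c := &cache{data: map[string]entry{}}

	// Read path: miss, fill from the database with a TTL.
	if _, ok := c.get("user:1"); !ok {
		c.put("user:1", db["user:1"], 5*time.Minute)
	}

	// Write path: the database is updated, but nothing invalidates the
	// cache here; a CDC consumer would do that later, asynchronously.
	db["user:1"] = "v2"

	// Read-own-writes violation: the writer immediately reads back "v1".
	v, _ := c.get("user:1")
	fmt.Println("cached:", v, "database:", db["user:1"]) // cached: v1 database: v2
}
```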

The new implementation introduces a write-through consistency protocol along with a deduplication layer positioned between the query engine and Flux, Uber’s streaming update system. Each CacheFront node now validates data freshness with Docstore before serving responses. The storage engine layer includes tombstone markers for deleted rows and strictly monotonic timestamps for MySQL session allocation. These mechanisms allow the system to efficiently identify and read back all modified keys, including deletes, just before commit, ensuring that no stale data is served even under high load.

Improved CacheFront write paths & invalidation (Source: Uber Engineering Blog Post)
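The freshness check in the new read path can be sketched roughly as follows. This is a hedged illustration, not CacheFront's actual API: the cached entry carries the commit timestamp it was written at plus a tombstone flag for deleted rows, and the node serves it only if it is at least as new as the storage engine's latest commit for that key, otherwise reading through to the store. The names latestCommitTS, cachedEntry, and readWithFreshnessCheck are invented for the example.

```go
package main

import "fmt"

// cachedEntry is an illustrative cache record: the value, the monotonic
// commit timestamp it reflects, and a tombstone flag for deleted rows.
type cachedEntry struct {
	value     string
	commitTS  uint64
	tombstone bool
}

// latestCommitTS stands in for asking the storage engine which commit last
// touched the key (strictly monotonic, as the article describes).
func latestCommitTS(storeTS map[string]uint64, key string) uint64 {
	return storeTS[key]
}

// readWithFreshnessCheck serves from cache only when the cached copy is as
// new as the latest commit and the row is not deleted; otherwise it reads
// through to the store and refreshes the cache.
func readWithFreshnessCheck(cache map[string]cachedEntry, storeTS map[string]uint64,
	store map[string]string, key string) (string, bool) {

	if e, ok := cache[key]; ok {
		if !e.tombstone && e.commitTS >= latestCommitTS(storeTS, key) {
			return e.value, true
		}
	}
	v, ok := store[key]
	if !ok {
		return "", false
	}
	cache[key] = cachedEntry{value: v, commitTS: latestCommitTS(storeTS, key)}
	return v, true
}

func main() {
	cache := map[string]cachedEntry{"row:1": {value: "old", commitTS: 5}}
	storeTS := map[string]uint64{"row:1": 7} // a newer commit exists for the row
	store := map[string]string{"row:1": "new"}

	v, _ := readWithFreshnessCheck(cache, storeTS, store, "row:1")
	fmt.Println(v) // "new": the stale cached copy is bypassed and refreshed
}
```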

When a transaction completes, the storage engine returns both the commit timestamp and the set of affected row keys. A callback registered on these responses immediately invalidates any previously cached entries in Redis by writing invalidation markers. Flux continues tailing MySQL binlogs and performing asynchronous cache fills. Together, the three cache population mechanisms (direct query engine updates, invalidation markers, and TTL expirations), combined with Flux tailing, maintain strong consistency while supporting extremely high read throughput.
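A minimal Go sketch of that commit-time invalidation flow, under the assumption that the commit result carries the commit timestamp and the affected keys; the commitResult, marker, and invalidate names are illustrative, not the Docstore or CacheFront API:

```go
package main

import "fmt"

// commitResult mimics what the article says the storage engine returns at
// commit: the commit timestamp and the keys the transaction touched.
type commitResult struct {
	commitTS     uint64
	affectedKeys []string
}

// marker is a stand-in for the invalidation markers written to Redis.
type marker struct {
	invalidatedAt uint64
}

// invalidate is the callback registered on commit responses: it writes a
// per-key marker so cached copies older than commitTS are not served.
func invalidate(markers map[string]marker, res commitResult) {
	for _, k := range res.affectedKeys {
		markers[k] = marker{invalidatedAt: res.commitTS}
	}
}

func main() {
	markers := map[string]marker{}

	// A transaction commits, touching two rows.
	res := commitResult{commitTS: 42, affectedKeys: []string{"row:1", "row:9"}}
	invalidate(markers, res)

	// A read holding a cache entry written before commit 42 sees the marker
	// and falls back to the store instead of serving the stale value.
	cachedTS := uint64(40)
	if m, ok := markers["row:1"]; ok && cachedTS < m.invalidatedAt {
		fmt.Println("stale cache entry for row:1; re-read from the store")
	}
}
```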

Uber engineers Preetham Narayanareddy and Eli Pozniansky explained the motivation behind the improvement:

There was increasing demand for higher cache hit rates and stronger consistency guarantees. The eventual consistency of using TTL and CDC for cache invalidations became a limiting factor in some use cases.

Through this integration, Uber engineers were also able to deprecate and remove the dedicated API introduced earlier, reducing operational complexity and streamlining the system.

Uber engineers enhanced telemetry and observability dashboards to monitor cache health and real-time binlog tailing. Cache shards were reorganized to distribute load evenly. The Cache Inspector tool, built on the same CDC pipeline as Flux, compares binlog events to entries stored in the cache. These updates allowed TTLs for tables to be extended up to 24 hours, increasing the cache hit rate above 99.9 percent while maintaining low latency.
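The comparison the Cache Inspector performs can be approximated with a short sketch: walk the change-data-capture events from the binlog and flag any cached key whose entry is older than the event yet holds a different value. This is an assumption-based illustration of the idea, not the tool itself; binlogEvent, cacheEntry, and inspect are invented names.

```go
package main

import "fmt"

// binlogEvent is an illustrative CDC record: the key, the committed value,
// and the commit timestamp taken from the MySQL binlog.
type binlogEvent struct {
	key      string
	value    string
	commitTS uint64
}

// cacheEntry mirrors what the cache holds for the same key.
type cacheEntry struct {
	value    string
	commitTS uint64
}

// inspect returns the keys whose cached copy lags behind the binlog: an older
// commit timestamp combined with a different value signals an inconsistency.
func inspect(events []binlogEvent, cache map[string]cacheEntry) []string {
	var mismatches []string
	for _, ev := range events {
		e, ok := cache[ev.key]
		if !ok {
			continue // key not cached; nothing to verify
		}
		if e.commitTS < ev.commitTS && e.value != ev.value {
			mismatches = append(mismatches, ev.key)
		}
	}
	return mismatches
}

func main() {
	events := []binlogEvent{{key: "row:1", value: "v2", commitTS: 10}}
	cache := map[string]cacheEntry{"row:1": {value: "v1", commitTS: 8}}
	fmt.Println("inconsistent keys:", inspect(events, cache)) // [row:1]
}
```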

About the Author

Leela Kumili
