Architecture & Design

Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability

Nov 06, 2025 2 min read

Leela Kumili

Uber has redesigned its Apache Pinot query architecture to simplify execution, support richer SQL, and improve predictability for internal analytics workloads. The previous Neutrino system, which layered Presto over Pinot, is being replaced by a lightweight proxy called Cellar that relies on Pinot’s Multi-Stage Engine (MSE) Lite Mode. The redesign aims to reduce complexity, enforce execution limits, and provide stronger isolation between tenants.

Previously, Neutrino ran as a stateless microservice combining Presto coordinator and worker processes. User-submitted PrestoSQL queries were partially pushed down to Pinot as PinotSQL, while the remaining query logic executed within Neutrino. Each query included default or user-defined limits to reduce the risk of full-table scans. Despite these safeguards, the layered architecture created complex semantics, made query plans harder to interpret, and limited isolation for tenants sharing the same proxy.

Uber’s Neutrino query architecture (Source: Uber Blog Post)

Uber’s Apache Pinot tables can reach hundreds of terabytes with billions of records, handling query rates from single digits to thousands of QPS. Multi-stage queries at this scale can easily exhaust resources or miss latency expectations. Pinot 1.4 introduces the Multi-Stage Engine Lite Mode, which enforces configurable leaf-stage record limits and uses a scatter-gather pattern. Leaf stages run on Pinot servers while all other operators execute on brokers, which keeps performance predictable for complex queries.

The new architecture introduces Cellar, a lightweight proxy that forwards queries directly to Pinot brokers. For basic workloads, Pinot's single-stage query engine handles execution, and for advanced SQL features, Uber uses the Multi-Stage Engine in Lite Mode. MSE Lite Mode enforces configurable maximum record limits at the leaf stage to prevent excessive resource usage and surfaces these limits in the explain plan for transparency. Scatter-gather execution remains, with leaf stages on data nodes and aggregation on brokers, while supporting joins and window functions under controlled conditions. Uber also added monitoring and logging enhancements to MSE Lite Mode, enabling engineering teams to track query performance and troubleshoot high-latency requests more efficiently.

High-level Cellar query architecture (Source: Uber Blog Post)
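The leaf-stage limit and scatter-gather flow described above can be illustrated with a small simulation. This is a minimal sketch, not Pinot's or Uber's implementation: the function names, the `(city, fare)` row shape, and the warning text are all hypothetical, and real leaf stages scan segments on Pinot servers rather than Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (city, fare)


@dataclass
class LeafResult:
    rows: List[Row]
    truncated: bool  # True when the server held more rows than the limit allows


def run_leaf_stage(segment_rows: List[Row], max_rows: int) -> LeafResult:
    """Simulate a leaf stage on one server: return at most max_rows rows."""
    return LeafResult(
        rows=segment_rows[:max_rows],
        truncated=len(segment_rows) > max_rows,
    )


def broker_scatter_gather(servers: List[List[Row]], max_rows_per_leaf: int):
    """Scatter the leaf stage to each server, then aggregate on the broker."""
    totals: Dict[str, int] = {}
    warnings: List[str] = []
    for idx, segment_rows in enumerate(servers):
        leaf = run_leaf_stage(segment_rows, max_rows_per_leaf)
        if leaf.truncated:
            # Surface the cut-off instead of silently returning partial data.
            warnings.append(
                f"server {idx}: leaf stage hit limit of {max_rows_per_leaf} rows"
            )
        for city, fare in leaf.rows:
            totals[city] = totals.get(city, 0) + fare
    return totals, warnings


# Two simulated servers; the first holds more rows than the limit permits.
servers = [
    [("SF", 10), ("SF", 20), ("NYC", 5)],
    [("NYC", 7), ("SF", 1)],
]
totals, warnings = broker_scatter_gather(servers, max_rows_per_leaf=2)
print(totals)
print(warnings)
```

The point of the sketch is the trade-off: a per-leaf cap bounds the work each server can do for a single query, at the cost of results that may be incomplete, which is why the limit needs to be visible to the caller.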

Cellar also includes a direct-connection mode that allows tenants to bypass the proxy and connect directly to Pinot brokers. Uber has also integrated a time series plugin supporting M3QL through Cellar. The rebuilt architecture powers internal analytics workloads such as tracing, log search, and segmentation. As of publication, Cellar handles roughly 20% of Neutrino's prior query volume, with plans to expand adoption and phase out Neutrino.

Cellar direct mode connection for complete isolation (Source: Uber Blog Post)

Uber also provides official client libraries for Java and Go monorepos to simplify interaction with Cellar. The clients handle Pinot’s response format, support partial results with warnings, enforce timeouts and retries, and emit metrics for latency, query success, and warnings. A Grafana dashboard provides operational visibility for new users out of the box.
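The client responsibilities listed above (timeouts, retries, partial results with warnings, and metrics) can be sketched as a thin wrapper. Uber's clients are internal Java and Go libraries, so everything here is an assumption made for illustration: the `QueryClient` class, its parameters, the in-memory metrics dictionary, and the injected `transport` callable are hypothetical. The response shape (`resultTable.rows` plus an `exceptions` list with `message` fields) is assumed to resemble a Pinot broker response.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (sql, timeout_s) -> parsed JSON response.
Transport = Callable[[str, float], Dict[str, Any]]


class QueryClient:
    def __init__(self, transport: Transport, timeout_s: float = 2.0,
                 max_retries: int = 2, backoff_s: float = 0.0):
        self.transport = transport
        self.timeout_s = timeout_s
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        # Stand-in for a real metrics emitter.
        self.metrics = {"success": 0, "failure": 0, "warnings": 0, "latency_ms": []}

    def query(self, sql: str) -> Dict[str, Any]:
        start = time.monotonic()
        last_error: Optional[Exception] = None
        response = None
        for attempt in range(self.max_retries + 1):
            try:
                response = self.transport(sql, self.timeout_s)
                break
            except TimeoutError as exc:
                last_error = exc
                time.sleep(self.backoff_s * (attempt + 1))  # linear backoff
        self.metrics["latency_ms"].append((time.monotonic() - start) * 1000)
        if response is None:
            self.metrics["failure"] += 1
            raise last_error
        # Treat broker-reported exceptions as warnings on a partial result.
        warnings = [e.get("message", "") for e in response.get("exceptions", [])]
        self.metrics["success"] += 1
        self.metrics["warnings"] += len(warnings)
        return {
            "rows": response.get("resultTable", {}).get("rows", []),
            "warnings": warnings,
            "partial": bool(warnings),
        }


# Simulated broker: times out once, then returns a partial result.
calls = {"n": 0}


def flaky_transport(sql: str, timeout_s: float) -> Dict[str, Any]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated broker timeout")
    return {
        "resultTable": {"rows": [["SF", 42]]},
        "exceptions": [{"message": "leaf stage limit reached"}],
    }


client = QueryClient(flaky_transport)
result = client.query("SELECT city, COUNT(*) AS trips FROM rides GROUP BY city")
print(result)
```

Injecting the transport keeps the retry and warning logic testable without a running broker; a real client would replace it with an HTTP call and a proper metrics backend.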

According to Uber’s engineering team, the redesign reflects the evolution of OLAP systems to support high QPS and sub-second latencies while maintaining isolation and predictability. They plan to release MSE Lite Mode to users later this year and improve it further.
