


AI, ML & Data Engineering

Apache Hudi 1.0 Now Generally Available

Jan 18, 2025

Renato Losio


The Apache Software Foundation recently announced the general availability of Apache Hudi 1.0, a transactional data lake platform with support for near real-time analytics. Initially introduced in 2017, Apache Hudi provides an open table format optimized for efficient writes in incremental data pipelines and fast query performance.

Originally developed at Uber as an incremental processing framework on Apache Hadoop and submitted to the Apache Software Foundation in 2019, Hudi is designed to bridge the gap between database-like functionality and open data lakehouse architectures. Hudi’s main strength lies in its ability to support both near real-time and batch queries simultaneously.

The latest release introduces new features aimed at transforming data lakehouses into what the project community considers a fully-fledged "Data Lakehouse Management System" (DLMS). Vinoth Chandar, creator of the Hudi Project at Uber and CEO at Onehouse, writes:

"Hudi shines by providing a high-performance open table format as well as a comprehensive open-source software stack that can ingest, store, optimize and effectively self-manage a data lakehouse. This distinction between open formats and open software is often lost in translation inside the large vendor ecosystem in which Hudi operates. Still, it has been and remains a key consideration for Hudi’s users to avoid compute-lockin to any given data vendor."

Released under the Apache License 2.0, Hudi 1.0 introduces a new secondary indexing system designed to improve query performance and reduce data scanning costs. Users can now create indexes on secondary columns through SQL, which speeds up queries that filter on those columns. The release also includes expression-based indexing, similar to indexes on expressions in PostgreSQL, which replaces traditional partitioning strategies to enable more flexible and efficient data organization. When the preview was announced last year, Boris Litvak, principal software engineer at Snyk, wrote:

"Among the big 3 ACID storage formats on Object Storage, Apache Hudi 1.0 (beta) is the first one introducing 'functional indexes' over the data. We usually call it 'secondary indexes' in SQL DB jargon. When will Delta.io and Apache Iceberg follow?"

Source: Apache Hudi Blog
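To make the idea of secondary and expression-based indexes concrete, here is a minimal conceptual sketch in plain Python. It is not Hudi's implementation or API: the `rows` table, the `build_index` and `lookup` helpers, and the column names are all hypothetical, chosen only to show how an index over a non-key column, or over an expression derived from a column, lets a query skip a full scan.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical toy table: record key -> row. Real table storage is far richer.
rows = {
    "r1": {"city": "sf", "ts": 1700000000, "fare": 12.5},
    "r2": {"city": "nyc", "ts": 1700000600, "fare": 8.0},
    "r3": {"city": "sf", "ts": 1700090000, "fare": 30.0},
}

def build_index(table, expr):
    """Map expr(row) -> set of record keys.

    A plain column read gives a secondary index; any other function
    of the row gives an expression-based index.
    """
    index = defaultdict(set)
    for key, row in table.items():
        index[expr(row)].add(key)
    return index

def lookup(table, index, value):
    """Return matching rows by consulting the index instead of scanning."""
    return [table[key] for key in sorted(index.get(value, ()))]

def utc_day(row):
    """Derive a calendar day (UTC) from an epoch-seconds timestamp."""
    return datetime.fromtimestamp(row["ts"], tz=timezone.utc).strftime("%Y-%m-%d")

# Secondary index on a non-key column.
city_index = build_index(rows, lambda row: row["city"])

# Expression-based index on a value derived from the timestamp column.
day_index = build_index(rows, utc_day)

sf_rides = lookup(rows, city_index, "sf")
rides_on_day = lookup(rows, day_index, "2023-11-14")
```

In this sketch the day-level index plays the role a date partition would otherwise play, without requiring the data to be physically laid out by date.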

The release introduces support for partial updates, which improves storage and compute efficiency by allowing updates to specific fields instead of entire rows. Additionally, non-blocking concurrency control enables multiple streaming jobs to write to the same dataset without causing bottlenecks or failures. Discussing the database architecture, Chandar adds:

"Regarding full-fledged DLMS functionality, the closest experience Hudi 1.0 offers is through Apache Spark. Users can deploy a Spark server (or Spark Connect) with Hudi 1.0 installed, submit SQL/jobs, orchestrate table services via SQL commands, and enjoy new secondary index functionality to speed up queries like a DBMS."
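The partial updates mentioned earlier can be pictured with a small Python sketch. This is an illustration under stated assumptions, not Hudi's merge logic: `merge_partial`, `apply_log`, and the sample data are hypothetical. The point it shows is that each log entry carries only the changed fields, and the reader or compactor overlays them on the stored row.

```python
def merge_partial(base_row, changed_fields):
    """Overlay only the changed fields on a copy of the stored row."""
    merged = dict(base_row)
    merged.update(changed_fields)
    return merged

def apply_log(base, log):
    """Apply (record_key, changed_fields) entries in order.

    Keys missing from the base are treated as inserts. The base table
    itself is left untouched.
    """
    result = {key: dict(row) for key, row in base.items()}
    for key, changed_fields in log:
        result[key] = merge_partial(result.get(key, {}), changed_fields)
    return result

# Hypothetical base rows and a log holding only the fields that changed.
base = {
    "u1": {"name": "Ada", "email": "ada@example.com", "plan": "free"},
    "u2": {"name": "Lin", "email": "lin@example.com", "plan": "pro"},
}
log = [
    ("u1", {"plan": "pro"}),
    ("u2", {"email": "lin@example.org"}),
    ("u1", {"email": "ada@example.org"}),
]

current = apply_log(base, log)
```

Each log entry here holds one field rather than the whole three-field row, which is where the storage and compute savings come from when rows are wide.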

Hudi 1.0 introduces enhancements to the storage engine, including the adoption of a log-structured merge (LSM) tree for efficient timeline management. This supports long-term data retention and ensures high-performance query planning, even for datasets containing billions of records. Bhavani Sudha Saktheeswaran, software engineer at Onehouse and Apache Hudi PMC member, comments:

"Whether you're building an open data platform, streaming into the data lakehouse, moving away from data warehouses, or optimizing for high-performance queries, Hudi 1.0.0 makes it easier than ever to work with lakehouses."
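The LSM-tree approach to timeline management described above can be sketched in a few lines of Python. This is a generic, simplified LSM structure, not Hudi's actual timeline layout: the `LsmTimeline` class, its thresholds, and the integer instants are hypothetical. Recent instants stay in a small active list, older ones are flushed into immutable sorted runs, and runs are merged once too many accumulate, so lookups stay cheap as history grows.

```python
import bisect
import heapq

class LsmTimeline:
    """Toy LSM-style store for commit instants (assumed to be sortable)."""

    def __init__(self, max_active=4, max_runs=2):
        self.max_active = max_active
        self.max_runs = max_runs
        self.active = []  # recent instants, kept sorted
        self.runs = []    # archived sorted runs, oldest first

    def add(self, instant):
        bisect.insort(self.active, instant)
        if len(self.active) > self.max_active:
            self._archive()

    def _archive(self):
        # Flush the older half of the active instants into a new sorted run.
        cut = len(self.active) // 2
        self.runs.append(self.active[:cut])
        self.active = self.active[cut:]
        if len(self.runs) > self.max_runs:
            # Compaction: merge all sorted runs into a single sorted run.
            self.runs = [list(heapq.merge(*self.runs))]

    def contains(self, instant):
        # Binary search the active list and every archived run.
        for seq in [self.active] + self.runs:
            i = bisect.bisect_left(seq, instant)
            if i < len(seq) and seq[i] == instant:
                return True
        return False

timeline = LsmTimeline()
for instant in range(1, 11):
    timeline.add(instant)
```

With the default thresholds, the number of runs a lookup has to search stays bounded by `max_runs` no matter how many instants have been archived.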

Saktheeswaran and Saketh Chintapalli, software engineer at Uber, presented a session on incremental data processing with Apache Hudi at QCon San Francisco. The session recording is available on InfoQ.
