Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data

Oct 28, 2025

Meta recently open-sourced OpenZL, a new data compression framework for highly structured data that explicitly models schemas to achieve a better compression ratio and faster speeds than general-purpose tools like Zstandard (Zstd). The framework maintains operational simplicity via a universal decompressor that executes an embedded Plan, removing the need for external metadata and enabling fleet-wide updates from a single binary.

When Meta introduced Zstd nearly a decade ago, it became a cornerstone of large-scale data infrastructure thanks to its speed and efficiency. However, as workloads evolved, particularly those involving structured formats such as Protocol Buffers, database tables, and ML tensors, Meta engineers found that generic compression methods left potential gains untapped. Traditional compressors treat data as raw byte streams and fail to exploit the structure and patterns inherent in modern datasets.

OpenZL takes a different approach by explicitly modeling data structures such as columnar layouts, enumerations, and repetitive patterns, rather than treating everything as an undifferentiated "byte soup." This structured compression allows OpenZL to outperform general-purpose tools like standard Zstd on both compression ratio and speed for relevant datasets. Instead of guessing optimal techniques, OpenZL applies a configurable sequence of reversible transforms to expose latent order in the data before the final entropy-coding stage.
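To make the idea of a reversible transform concrete, here is a minimal sketch that is not taken from OpenZL. It applies delta encoding to a synthetic column of timestamps before handing the bytes to a general-purpose compressor. The data, the function names, and the use of Python's `zlib` as a stand-in for the final stage are all assumptions made for illustration.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack_u32(values):
    """Serialize integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)


# Hypothetical data: timestamps that grow by roughly 1000 per sample.
timestamps = [1_700_000_000 + 1000 * i + (i * 7) % 13 for i in range(10_000)]

raw = pack_u32(timestamps)
transformed = pack_u32(delta_encode(timestamps))

# zlib stands in for the final general-purpose stage (an assumption).
raw_size = len(zlib.compress(raw))
delta_size = len(zlib.compress(transformed))

print("compressed without transform:", raw_size)
print("compressed after delta:      ", delta_size)
```

The transform itself removes no information, since `delta_decode` restores the input exactly. It only rearranges the data so that the regularity (nearly constant steps) becomes visible to the byte-oriented compressor that follows.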

[Figure: diagram, image not reproduced]

(Source: Meta blog post)

A key operational advantage of OpenZL is its universal decompressor. Compression Plans are generated offline by a component called the trainer, which analyzes the provided data schema and produces an optimized Plan. During encoding, this Plan is converted into a concrete decode recipe and embedded directly within the compressed frame.

This model means that:

  • Every OpenZL file, regardless of its custom transform sequence, can be decompressed using the same binary.
  • The decoder requires no external metadata—it executes the embedded recipe.
  • Retraining or re-optimizing compression plans can improve performance without altering the universal decoder, ensuring backward compatibility.

Meta engineers emphasize that this operational simplicity is critical for data center deployments: one audited decompression surface, fleet-wide updates from a single binary, and clear version control across large infrastructures.

In internal benchmarks on structured datasets (e.g., the "sao" star catalog from the Silesia Compression Corpus), OpenZL showed substantial gains. By parsing structured records, splitting them into fields (Structure of Arrays), and applying domain-aware transforms such as delta encoding, it achieved:

| Compressor | Compressed Size | Compression Ratio | Compression Speed | Decompression Speed |
|---|---|---|---|---|
| zstd -3 | 5,531,935 B | $\times 1.31$ | 220 MB/s | 850 MB/s |
| xz -9 | 4,414,351 B | $\times 1.64$ | 3.5 MB/s | 45 MB/s |
| OpenZL | 3,516,649 B | $\times 2.06$ | 340 MB/s | 1200 MB/s |

Crucially, OpenZL demonstrated a better compression ratio while preserving or improving both compression and decompression speeds compared to zstd -3.
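The field-splitting step (Structure of Arrays) can be illustrated with a small experiment on synthetic records. The record layout, field names, and value ranges below are invented for illustration and are not the "sao" format. `zlib` again stands in for the final compression stage.

```python
import itertools
import random
import struct
import zlib

# Hypothetical fixed-width record: position (u32), magnitude (u16), kind (u8).
RECORD = struct.Struct("<IHB")

rng = random.Random(0)
position = 0
records = []
for _ in range(5000):
    position += rng.randint(1, 50)          # sorted, slowly growing field
    records.append((position, rng.randint(300, 400), rng.randrange(4)))

# Array of Structures: records stored back to back, fields interleaved.
aos = b"".join(RECORD.pack(*r) for r in records)

# Structure of Arrays: one homogeneous stream per field.
positions, magnitudes, kinds = zip(*records)
n = len(records)

# Domain-aware transform on the sorted field: store gaps instead of values.
deltas = [b - a for a, b in zip((0,) + positions, positions)]

columns = [
    struct.pack(f"<{n}I", *deltas),
    struct.pack(f"<{n}H", *magnitudes),
    bytes(kinds),
]

aos_size = len(zlib.compress(aos, 9))
soa_size = sum(len(zlib.compress(c, 9)) for c in columns)

# The transform is reversible: a running sum restores the positions.
restored = list(itertools.accumulate(deltas))

print("interleaved records:", aos_size, "bytes")
print("split columns:      ", soa_size, "bytes")
```

Both layouts hold exactly the same information. The split form groups similar values together, so each stream has simpler statistics for the downstream compressor to exploit.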

Users can describe their data structure using the Simple Data Description Language (SDDL) or a custom parser function. The offline trainer then runs a budgeted search over transform choices to generate an optimized compression Plan.
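As a rough picture of what a budgeted search over transform choices might look like, the sketch below tries a limited number of candidate transform sequences on a sample and keeps the one that compresses best. Everything here is an assumption made for illustration: the two toy transforms, the enumeration order, the `train` function, and the use of `zlib` for scoring. The real trainer explores a much richer space.

```python
import itertools
import struct
import zlib


def delta8(data: bytes) -> bytes:
    """Byte-wise delta, modulo 256."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def transpose4(data: bytes) -> bytes:
    """Group byte 0 of every 4-byte word together, then byte 1, and so on."""
    if len(data) % 4:
        raise ValueError("input length must be a multiple of 4")
    return b"".join(data[i::4] for i in range(4))


TRANSFORMS = {"delta8": delta8, "transpose4": transpose4}


def candidate_plans():
    """Yield transform sequences, shortest first, starting with the empty plan."""
    names = sorted(TRANSFORMS)
    for length in range(len(names) + 1):
        yield from itertools.permutations(names, length)


def train(sample: bytes, budget: int):
    """Evaluate at most `budget` candidate plans; return (best_plan, best_size)."""
    best_plan, best_size = (), len(zlib.compress(sample))
    for plan in itertools.islice(candidate_plans(), budget):
        data = sample
        for name in plan:
            data = TRANSFORMS[name](data)
        size = len(zlib.compress(data))
        if size < best_size:
            best_plan, best_size = plan, size
    return best_plan, best_size


# Hypothetical sample: little-endian u32 counters increasing by 3.
sample = struct.pack("<4096I", *(3 * i for i in range(4096)))

baseline = len(zlib.compress(sample))
plan, size = train(sample, budget=5)
print("baseline:", baseline, "best plan:", plan, "size:", size)
```

The budget caps how many candidates are evaluated, which trades training time against the quality of the resulting plan. A usable system would also need the inverse of each transform for decoding, which this sketch omits.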

Unlike some experimental formats that embed general-purpose code (such as WebAssembly) for decompression, OpenZL’s approach limits execution to a deterministic graph. This ensures reproducible decoding, a key property for long-term data archival. As one Hacker News correspondent noted, while sandboxing WebAssembly is easy:

The real problem is determinism—function calls made to those WebAssembly modules may still be nondeterministic!

By contrast, OpenZL’s fixed execution graph guarantees deterministic decompression behavior.

OpenZL performs best on structured data, such as time-series datasets, ML tensors, and database tables. Where structure is minimal (e.g., pure text), OpenZL falls back to Zstd. Abelardo Fukasawa, a researcher at Quantls Infinity, reinforced this point, stating:

Instead of treating every format the same (as gzip or standard compressors do), it adapts its compression to the specific structure of the data using SDDL—often yielding better ratios and throughput on structured workloads.

The framework is publicly available on GitHub for developers to experiment with and contribute to.

About the Author

Steef-Jan Wiggers