
InfoQ News

AI, ML & Data Engineering

How Netflix is Reimagining Data Engineering for Video, Audio, and Text

Aug 25, 2025

Matt Foster


Netflix has introduced a new engineering specialization, Media ML Data Engineering, alongside a Media Data Lake designed to handle video, audio, text, and image assets at scale. Early results include richer ML models trained on standardized media, faster evaluation cycles, and deeper insights into creative workflows.

In a recent blog post, the company described how this evolution moves its data engineering function beyond "facts and metrics" tables toward supporting machine learning directly on media content.

By formalizing the role and platform, Netflix aims to provide standardized, ML-ready datasets and enable faster experimentation in areas such as localization, media restoration, ratings, and multimodal search.

Netflix's data engineering team once focused on structured tables for metrics, dashboards, and models. As studio operations expanded, however, the team faced a flood of multimodal, unstructured media (video, audio, images, and text) at massive scale.

These assets, tied to creative workflows and lineage, introduced complexity that traditional pipelines couldn’t manage, prompting the need for a new approach.

To meet this challenge, Netflix created Media ML Data Engineering, a specialization at the intersection of data engineering, ML infrastructure, and media production. These engineers build and maintain pipelines for the Media Data Lake, standardize assets, enrich metadata, and expose ML-ready corpora for research and production.

Collaboration is central: they work with domain experts, researchers, and platform teams to ensure solutions meet both technical and creative needs.
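
The pipeline responsibilities described above (standardize assets, enrich metadata, expose ML-ready corpora) can be pictured as three composable stages. The following is only a minimal sketch under stated assumptions: the record fields, the `KNOWN_MODALITIES` set, and the function names `standardize`, `enrich`, and `expose` are all hypothetical and are not taken from Netflix's implementation.

```python
from typing import Iterable, Iterator

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"asset_id": "a1", "modality": "Audio", "sample_rate_hz": 48000},
    {"asset_id": "v1", "modality": "VIDEO", "fps": 24},
    {"asset_id": "x1", "modality": "unknown"},
]

# Assumed set of modalities, based on the asset types the article lists.
KNOWN_MODALITIES = {"video", "audio", "text", "image"}


def standardize(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize field values so downstream steps see one convention."""
    for rec in records:
        yield {**rec, "modality": rec["modality"].lower()}


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Attach derived metadata; here, just a flag for recognized modalities."""
    for rec in records:
        yield {**rec, "ml_ready": rec["modality"] in KNOWN_MODALITIES}


def expose(records: Iterable[dict]) -> list[dict]:
    """Publish only the records that are flagged as ready for ML use."""
    return [rec for rec in records if rec["ml_ready"]]


corpus = expose(enrich(standardize(RAW)))
print([rec["asset_id"] for rec in corpus])
```

Keeping each stage as a small generator function means stages can be tested in isolation and recombined, which is one plausible way to keep such a pipeline extensible.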

(The Media ML Data Engineer)

The Media Data Lake is designed specifically for storing and serving media assets and their metadata. The lake is powered by LanceDB and integrates into Netflix's big data ecosystem.

At its core is the Media Table, a structured dataset that captures metadata and references to all media assets, and can also store ML outputs like embeddings. Netflix notes that by combining metadata with outputs such as embeddings, the Media Table enables complex vector queries and experimentation with multimodal search.
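
To make the idea of combining metadata with embeddings concrete, here is a minimal in-memory sketch of a metadata-filtered vector query. It is an illustration only: the `MediaRow` fields, the example URIs, and the `search` helper are hypothetical, and the code uses plain Python rather than LanceDB's API or Netflix's actual Media Table schema.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MediaRow:
    """One hypothetical row: metadata, an asset reference, and an ML output."""
    asset_id: str
    modality: str           # e.g. "video", "audio", "text", "image"
    uri: str                # reference to the asset, not the media bytes
    embedding: list[float]  # ML output stored alongside the metadata


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(rows, query, modality=None, k=2):
    """Filter on metadata first, then rank the remaining rows by similarity."""
    candidates = [r for r in rows if modality is None or r.modality == modality]
    ranked = sorted(candidates, key=lambda r: cosine(r.embedding, query), reverse=True)
    return ranked[:k]


# Illustrative rows; the IDs, URIs, and vectors are made up.
rows = [
    MediaRow("shot-001", "video", "s3://example-bucket/shot-001.mp4", [0.9, 0.1, 0.0]),
    MediaRow("shot-002", "video", "s3://example-bucket/shot-002.mp4", [0.1, 0.9, 0.0]),
    MediaRow("line-001", "text", "s3://example-bucket/line-001.txt", [0.8, 0.2, 0.1]),
]

top = search(rows, query=[1.0, 0.0, 0.0], modality="video", k=1)
print([r.asset_id for r in top])
```

Because the embedding lives in the same row as the metadata, a single query can restrict by modality and rank by similarity. A production system would delegate both steps to the storage engine instead of scanning rows in Python.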

Supporting components include a standardized data model, a Pythonic Data API, UI tools for exploration, and systems for both real-time queries and large-scale batch processing. Together, these enable media assets to be searched, explored, and prepared for ML training at scale.

(Media Table)

These tables already power several applications, including translation and audio quality metrics using TTS models, HDR video restoration, compliance checks for smoking or gore, and multimodal search across frames, shots, and dialogue.

Netflix positions these examples as evidence that media tables are not just a storage layer, but a driver of new creative and operational workflows.

Before reaching these use cases, Netflix began with a scoped "data pond" focused on video and audio from its internal asset management system and annotation store. The company reports that this limited rollout allowed them to de-risk the introduction of new technology and ensure a solid, extensible foundation before scaling further.

Netflix highlights benefits that are already emerging: richer and more accurate ML models trained on standardized media, faster evaluation cycles, quicker productization of new AI features, and deeper insights into creative workflows.

The company plans to expand the Media Data Lake further and share future learnings with the wider data engineering community.
