Data Lake Content on InfoQ
Articles
Building Reproducible ML Systems with Apache Iceberg and SparkSQL: Open Source Foundations
Traditional data lakes are great for storing massive amounts of data, but they're terrible at providing the transactional guarantees and versioning that ML workloads desperately need. Apache Iceberg and SparkSQL bring database-like reliability to your data lake: time travel, schema evolution, and ACID transactions help support reproducible machine learning experiments.
on Jul 31, 2025
The End of the Bronze Age: Rethinking the Medallion Architecture
A shift-left approach to data processing relies on data products that form the basis of data communication across the business. This approach addresses many flaws in traditional data processing and makes data more relevant, complete, and trustworthy.
on Jan 29, 2025
Data Leadership Book Review and Interview
The book Data Leadership, by Anthony Algmin, covers how data leaders should manage and govern the data management programs in their organizations. Data leadership is how organizations choose to apply their energy and resources toward creating data capabilities that influence their business.
on Jul 25, 2020
Data Lake-as-a-Service: Big Data Processing and Analytics in the Cloud
Data Lake-as-a-Service solutions provide big data processing in the cloud, delivering faster business outcomes cost-effectively. InfoQ spoke with Lovan Chetty and Hannah Smalltree from the Cazena team about how Data Lake-as-a-Service works.
on Dec 10, 2015