Apache Spark Content on InfoQ
-
Posted by Jakub Hava on May 09, 2019
Productionizing H2O Models with Apache Spark
Jakub Hava demonstrates the creation of pipelines integrating H2O machine learning models and their deployments using Scala or Python.
Duration: 34:50
-
Posted by Yuval Degani on Nov 03, 2018
Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud
Yuval Degani shows how hardware acceleration in Azure can be used to speed up Spark jobs, with the aid of RDMA (Remote Direct Memory Access) support in the VM.
Duration: 38:06
-
Posted by Tyler Akidau on Feb 17, 2018
Streaming SQL Foundations: Why I ❤ Streams+Tables
Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.
Duration: 51:39
-
Posted by Holden Karau on Aug 05, 2017
Scaling with Apache Spark
Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.
Duration: 46:58
-
Posted by Elliot Chow on Mar 30, 2017
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.
Duration: 47:03
-
Posted by Sameer Farooqui on Aug 23, 2016
Exploring Wikipedia with Apache Spark: A Live Coding Demo
Sameer Farooqui demos connecting to the live stream of Wikipedia edits, building a dashboard showing what’s happening with Wikipedia datasets and how people are using them in real time.
Duration: 59:07
-
Posted by Andrew Psaltis on Jul 30, 2016
Apache Beam: The Case for Unifying Streaming APIs
Andrew Psaltis talks about Apache Beam, which aims to provide a unified stream processing model for defining and executing complex data processing, data ingestion and integration workflows.
Duration: 33:35
-
Posted by Mathieu Bastian on Apr 24, 2016
The Mechanics of Testing Large Data Pipelines
Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.
Duration: 36:19
-
Posted by Helena Edelson on Apr 03, 2016
Rethinking Streaming Analytics for Scale
Helena Edelson addresses new architectures emerging for large-scale streaming analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), Apache Flink, or GearPump.
Duration: 43:44
-
Posted by Leah McGuire on Jan 16, 2016
The Lego Model for Machine Learning Pipelines
Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.
Duration: 49:07
-
Posted by Piotr Kołaczkowski on Jun 17, 2015
Lightning Fast Cluster Computing with Spark and Cassandra
Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how the integration works in practice, and why it is better than using a Hadoop intermediate layer.
Duration: 49:53
-
Posted by Cosmin Radoi on Jun 10, 2015
Translating Imperative Code to MapReduce
The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.
Duration: 19:02