Apache Spark Content on InfoQ

Presentations

Posted by Jakub Hava on May 09, 2019

AI, ML & Data Engineering

Productionizing H2O Models with Apache Spark

Jakub Hava demonstrates the creation of pipelines integrating H2O machine learning models and their deployments using Scala or Python.

34:50
Posted by Yuval Degani on Nov 03, 2018

AI, ML & Data Engineering

Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud

Yuval Degani shows how hardware acceleration in Azure can be used to speed up Spark jobs, with the aid of RDMA (Remote Direct Memory Access) support in the VM.

38:06
Posted by Tyler Akidau on Feb 17, 2018

AI, ML & Data Engineering

Streaming SQL Foundations: Why I ❤ Streams+Tables

Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and covers stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.

51:39
Posted by Holden Karau on Aug 05, 2017

AI, ML & Data Engineering

Scaling with Apache Spark

Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.

46:58
Posted by Elliot Chow on Mar 30, 2017

AI, ML & Data Engineering

Real-Time Recommendations Using Spark Streaming

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

47:03
Posted by Sameer Farooqui on Aug 23, 2016

AI, ML & Data Engineering

Exploring Wikipedia with Apache Spark: A Live Coding Demo

Sameer Farooqui demos connecting to the live stream of Wikipedia edits and building a dashboard that shows what’s happening with Wikipedia datasets and how people are using them in real time.

59:07
Posted by Andrew Psaltis on Jul 30, 2016

AI, ML & Data Engineering

Apache Beam: The Case for Unifying Streaming APIs

Andrew Psaltis talks about Apache Beam, which aims to provide a unified stream processing model for defining and executing complex data processing, data ingestion and integration workflows.

33:35
Posted by Mathieu Bastian on Apr 24, 2016

AI, ML & Data Engineering

The Mechanics of Testing Large Data Pipelines

Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.

36:19
Posted by Helena Edelson on Apr 03, 2016

AI, ML & Data Engineering

Rethinking Streaming Analytics for Scale

Helena Edelson addresses new architectures emerging for large-scale streaming analytics based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), Apache Flink, or Gearpump.

43:44
Posted by Leah McGuire on Jan 16, 2016

AI, ML & Data Engineering

The Lego Model for Machine Learning Pipelines

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

49:07
Posted by Piotr Kołaczkowski on Jun 17, 2015

Lightning Fast Cluster Computing with Spark and Cassandra

Piotr Kołaczkowski discusses how Spark was integrated with Cassandra, how the integration works in practice, and why it is better than using a Hadoop intermediate layer.

49:53
Posted by Cosmin Radoi on Jun 10, 2015

Translating Imperative Code to MapReduce

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.

Cosmin Radoi, Stephen J Fink, Rodric Rabbah, Manu Sridharan
19:02
