The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
Updated Oct 17, 2025 - Python
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Simple and Distributed Machine Learning
lakeFS - Data version control for your data lake | Git for data
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Interactive and Reactive Data Science using Scala and Spark.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Apache Spark docker image
Feathr – A scalable, unified data and AI engineering platform for enterprise
A curated list of awesome Apache Spark packages and resources.
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
PySpark + Scikit-learn = Sparkit-learn
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
MapReduce, Spark, Java, and Scala for Data Algorithms Book
(Deprecated) Scikit-learn integration package for Apache Spark
Created by Matei Zaharia
Released May 26, 2014