Last Updated: February 25, 2016 · chris_betz

Using Apache Spark from Clojure

Here's a small sample of how to process big data (with a small sample) from Clojure using Apache Spark and the Sparkling library:

(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context ; this creates a Spark context from the given config
    sc
    (-> (conf/spark-conf)
        (conf/app-name "sparkling-test")
        (conf/master "local"))
    (let [lines-rdd
          ;; here we provide data from a Clojure collection.
          ;; You could also read from a text file or an Avro file,
          ;; or even pull from a JDBC data source.
          (spark/into-rdd sc ["This is a first line"
                              "Testing spark"
                              "and sparkling"
                              "Happy hacking!"])]
      (spark/collect ; get every element from the filtered RDD
       (spark/filter ; filter elements in the given RDD (lines-rdd)
        #(.contains % "spark") ; a pure Clojure function as filter predicate
        lines-rdd)))))
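The comments above mention reading from a text file instead of a Clojure collection; in Sparkling that is spark/text-file, which yields an RDD with one element per line. Here's a minimal sketch along the same lines (the file path and the line-counting logic are illustrative assumptions, not part of the original example):

(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context ; same setup as above, new app name
    sc
    (-> (conf/spark-conf)
        (conf/app-name "sparkling-textfile")
        (conf/master "local"))
    ;; "data/input.txt" is a hypothetical path for this sketch
    (let [lines-rdd (spark/text-file sc "data/input.txt")]
      ;; count the matching lines on the workers instead of
      ;; collecting them all back to the driver
      (spark/count
       (spark/filter #(.contains % "spark") lines-rdd)))))

Using spark/count rather than spark/collect keeps the result as a single number on the driver, which matters once the input is larger than this toy example.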
