
KeystoneML

The biggest, baddest pipelines around. KeystoneML simplifies robust end-to-end machine learning on Apache Spark.

Example pipeline
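
The rendered README shows a worked example pipeline at this point. As a rough sketch of the idea: KeystoneML pipelines are built by composing nodes, where a node is a Transformer[A, B] that maps one input record to one output, and nodes are chained together with andThen. The snippet below is only an illustrative sketch under that assumption; the import path, node names, and logic are hypothetical and not taken from this repository.

// Illustrative sketch only: assumes KeystoneML's Transformer[A, B] node
// abstraction (under the keystoneml.workflow package) and andThen chaining.
// The node names Doubler and Stringify are hypothetical.
import keystoneml.workflow.Transformer

// A Transformer maps one input record to one output record; the framework
// applies it element-wise to distributed datasets.
object Doubler extends Transformer[Int, Int] {
  def apply(in: Int): Int = in * 2
}

object Stringify extends Transformer[Int, String] {
  def apply(in: Int): String = in.toString
}

object ExamplePipeline {
  // andThen composes the two nodes into a single pipeline from Int to String.
  val doubleThenStringify = Doubler andThen Stringify
}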

Build the KeystoneML project

./sbt/sbt assembly
make # This builds the native libraries used in KeystoneML

Example: MNIST pipeline

# Get the data from S3
wget http://mnist-data.s3.amazonaws.com/train-mnist-dense-with-labels.data
wget http://mnist-data.s3.amazonaws.com/test-mnist-dense-with-labels.data
KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
 keystoneml.pipelines.images.mnist.MnistRandomFFT \
 --trainLocation ./train-mnist-dense-with-labels.data \
 --testLocation ./test-mnist-dense-with-labels.data \
 --numFFTs 4 \
 --blockSize 2048

Running with spark-submit

To run KeystoneML pipelines on large datasets, you will need a Spark cluster. KeystoneML pipelines run on the cluster using spark-submit.

You need to export SPARK_HOME to run KeystoneML using spark-submit. Having done that, you can use run-pipeline.sh to launch your pipeline in the same way as above.

export SPARK_HOME=~/spark-1.3.1-bin-cdh4 # should match the version keystone is built with
KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
 keystoneml.pipelines.images.mnist.MnistRandomFFT \
 --trainLocation ./train-mnist-dense-with-labels.data \
 --testLocation ./test-mnist-dense-with-labels.data \
 --numFFTs 4 \
 --blockSize 2048
