|
| 1 | +--- |
| 2 | +title: SparkPipelines |
| 3 | +--- |
| 4 | + |
| 5 | +# SparkPipelines — Spark Pipelines CLI |
| 6 | + |
| 7 | +`SparkPipelines` is a standalone application that can be executed using the [spark-pipelines](./index.md#spark-pipelines) shell script. |
| 8 | + |
| 9 | +`SparkPipelines` is a Scala "launchpad" that executes the [python/pyspark/pipelines/cli.py](#pysparkpipelinesclipy) Python script (through [SparkSubmit]({{ book.spark_core }}/tools/spark-submit/SparkSubmit/)). |
| 10 | + |
| 11 | +## PySpark Pipelines CLI |
| 12 | + |
| 13 | +=== "uv run" |
| 14 | + |
| 15 | + ```console |
| 16 | + $ pwd |
| 17 | + /Users/jacek/oss/spark/python |
| 18 | + |
| 19 | + $ PYTHONPATH=. uv run \ |
| 20 | + --with grpcio-status \ |
| 21 | + --with grpcio \ |
| 22 | + --with pyarrow \ |
| 23 | + --with pandas \ |
| 24 | + --with pyspark \ |
| 25 | + python pyspark/pipelines/cli.py |
| 26 | + ... |
| 27 | + usage: cli.py [-h] {run,dry-run,init} ... |
| 28 | + cli.py: error: the following arguments are required: command |
| 29 | + ``` |
| 30 | + |
| 31 | +### dry-run |
| 32 | + |
| 33 | +Launches a run that only validates the graph and checks for errors. |
| 34 | + |
| 35 | +Option | Description | Default |
| 36 | +-|-|- |
| 37 | + `--spec` | Path to the pipeline spec | (undefined) |
| 38 | + |
| 39 | +### init |
| 40 | + |
| 41 | +Generates a sample pipeline project, including a spec file and example definitions. |
| 42 | + |
| 43 | +Option | Description | Default | Required |
| 44 | +-|-|-|:-: |
| 45 | + `--name` | Name of the project. A directory with this name will be created underneath the current directory | (undefined) | ✅ |
| 46 | + |
| 47 | +```console |
| 48 | +$ ./bin/spark-pipelines init --name hello-pipelines |
| 49 | +Pipeline project 'hello-pipelines' created successfully. To run your pipeline: |
| 50 | +cd 'hello-pipelines' |
| 51 | +spark-pipelines run |
| 52 | +``` |
| 53 | + |
| 54 | +### run |
| 55 | + |
| 56 | +Run a pipeline. If no `--refresh` option is specified, a default incremental update is performed. |
| 57 | + |
| 58 | +Option | Description | Default |
| 59 | +-|-|- |
| 60 | + `--spec` | Path to the pipeline spec | (undefined) |
| 61 | + `--full-refresh` | List of datasets to reset and recompute (comma-separated) | (empty) |
| 62 | + `--full-refresh-all` | Perform a full graph reset and recompute | (undefined) |
| 63 | + `--refresh` | List of datasets to update (comma-separated) | (empty) |
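The subcommands and options listed above can be pictured as an `argparse` parser. The following is only a minimal sketch reconstructed from the tables on this page, not the actual code of `pyspark/pipelines/cli.py`. The `build_parser` and `split_csv` names are hypothetical, and treating `--full-refresh-all` as a boolean flag is an assumption.

```python
import argparse
from typing import List


def split_csv(value: str) -> List[str]:
    """Split a comma-separated option value into dataset names (hypothetical helper)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    # A required subcommand stored under `command`, as the usage message suggests.
    subparsers = parser.add_subparsers(dest="command", required=True)

    run = subparsers.add_parser("run", help="Run a pipeline.")
    run.add_argument("--spec", help="Path to the pipeline spec")
    run.add_argument("--full-refresh", type=split_csv, default=[],
                     help="List of datasets to reset and recompute (comma-separated)")
    # Assumption: modelled as an on/off flag; the table only says "(undefined)".
    run.add_argument("--full-refresh-all", action="store_true",
                     help="Perform a full graph reset and recompute")
    run.add_argument("--refresh", type=split_csv, default=[],
                     help="List of datasets to update (comma-separated)")

    dry_run = subparsers.add_parser("dry-run", help="Validate the graph only")
    dry_run.add_argument("--spec", help="Path to the pipeline spec")

    init = subparsers.add_parser("init", help="Generate a sample pipeline project")
    init.add_argument("--name", required=True, help="Name of the project")

    return parser


if __name__ == "__main__":
    # Example invocation with made-up dataset names.
    args = build_parser().parse_args(["run", "--refresh", "orders,customers"])
    print(args.command, args.refresh, args.full_refresh, args.full_refresh_all)
```

In this sketch, the `--refresh` and `--full-refresh` lists stay empty and `--full-refresh-all` stays off when they are not given, which corresponds to the case the page describes as a default incremental update.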