Commit 3840db9

[SDP] SparkPipelines — Spark Pipelines CLI

1 parent ea323d5 commit 3840db9

2 files changed: +85 -0 lines changed

Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@

---
title: SparkPipelines
---

# SparkPipelines — Spark Pipelines CLI

`SparkPipelines` is a standalone application that can be executed using the [spark-pipelines](./index.md#spark-pipelines) shell script.

`SparkPipelines` is a Scala "launchpad" that executes the [python/pyspark/pipelines/cli.py](#pyspark-pipelines-cli) Python script (through [SparkSubmit]({{ book.spark_core }}/tools/spark-submit/SparkSubmit/)).

## PySpark Pipelines CLI

=== "uv run"

    ```console
    $ pwd
    /Users/jacek/oss/spark/python

    $ PYTHONPATH=. uv run \
        --with grpcio-status \
        --with grpcio \
        --with pyarrow \
        --with pandas \
        --with pyspark \
        python pyspark/pipelines/cli.py
    ...
    usage: cli.py [-h] {run,dry-run,init} ...
    cli.py: error: the following arguments are required: command
    ```

### dry-run

Launch a run that just validates the graph and checks for errors

Option | Description | Default
-|-|-
`--spec` | Path to the pipeline spec | (undefined)
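
For example, assuming the pipeline spec file is `pipeline.yml` in the current directory (the file name here is only an illustration), a dry run could be launched with:

```console
$ ./bin/spark-pipelines dry-run --spec pipeline.yml
```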

### init

Generate a sample pipeline project, including a spec file and example definitions

Option | Description | Default | Required
-|-|-|:-:
`--name` | Name of the project. A directory with this name will be created underneath the current directory | (undefined) | ✅

```console
$ ./bin/spark-pipelines init --name hello-pipelines
Pipeline project 'hello-pipelines' created successfully. To run your pipeline:
  cd 'hello-pipelines'
  spark-pipelines run
```

### run

Run a pipeline. If no `--refresh` option is specified, a default incremental update is performed.

Option | Description | Default
-|-|-
`--spec` | Path to the pipeline spec | (undefined)
`--full-refresh` | List of datasets to reset and recompute (comma-separated) | (empty)
`--full-refresh-all` | Perform a full graph reset and recompute | (undefined)
`--refresh` | List of datasets to update (comma-separated) | (empty)
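
For example, the following invocations request a full refresh of two named datasets and of the whole graph, respectively; the dataset names are made up for illustration:

```console
$ ./bin/spark-pipelines run --full-refresh orders,customers

$ ./bin/spark-pipelines run --full-refresh-all
```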

docs/declarative-pipelines/index.md

Lines changed: 22 additions & 0 deletions

@@ -22,6 +22,28 @@ Declarative Pipelines uses the following [Python decorators](https://peps.python

Once described, a pipeline can be [started](PipelineExecution.md#runPipeline) (on a [PipelineExecution](PipelineExecution.md)).

## Spark Connect Only

Declarative Pipelines currently only supports Spark Connect.
```console
$ ./bin/spark-pipelines --conf spark.api.mode=xxx
...
25/08/03 12:33:57 INFO SparkPipelines: --spark.api.mode must be 'connect'. Declarative Pipelines currently only supports Spark Connect.
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.SparkPipelines$$anon1ドル.handle(SparkPipelines.scala:73)
    at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:169)
    at org.apache.spark.deploy.SparkPipelines$$anon1ドル.<init>(SparkPipelines.scala:58)
    at org.apache.spark.deploy.SparkPipelines$.splitArgs(SparkPipelines.scala:57)
    at org.apache.spark.deploy.SparkPipelines$.constructSparkSubmitArgs(SparkPipelines.scala:43)
    at org.apache.spark.deploy.SparkPipelines$.main(SparkPipelines.scala:37)
    at org.apache.spark.deploy.SparkPipelines.main(SparkPipelines.scala)
```
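
Based on the error message above, `connect` is the only accepted value of `spark.api.mode`, so an explicit setting would look as follows; the `dry-run` subcommand is used only as an illustration and the output is omitted:

```console
$ ./bin/spark-pipelines --conf spark.api.mode=connect dry-run
```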

## <span id="spark-pipelines"> spark-pipelines Shell Script

The `spark-pipelines` shell script is used to launch [org.apache.spark.deploy.SparkPipelines](SparkPipelines.md).

## Demo

### Step 1. Register Dataflow Graph
