Commit 3840db9

[SDP] SparkPipelines — Spark Pipelines CLI

1 parent ea323d5 commit 3840db9

2 files changed: +85 -0 lines changed

Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@

---
title: SparkPipelines
---

# SparkPipelines — Spark Pipelines CLI

`SparkPipelines` is a standalone application that can be executed using the [spark-pipelines](./index.md#spark-pipelines) shell script.

`SparkPipelines` is a Scala "launchpad" that executes the [python/pyspark/pipelines/cli.py](#pyspark-pipelines-cli) Python script (through [SparkSubmit]({{ book.spark_core }}/tools/spark-submit/SparkSubmit/)).

## PySpark Pipelines CLI

=== "uv run"

    ```console
    $ pwd
    /Users/jacek/oss/spark/python

    $ PYTHONPATH=. uv run \
        --with grpcio-status \
        --with grpcio \
        --with pyarrow \
        --with pandas \
        --with pyspark \
        python pyspark/pipelines/cli.py
    ...
    usage: cli.py [-h] {run,dry-run,init} ...
    cli.py: error: the following arguments are required: command
    ```

### dry-run

Launch a run that just validates the graph and checks for errors

Option | Description | Default
-|-|-
`--spec` | Path to the pipeline spec | (undefined)
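
For example, assuming the pipeline spec file is `pipeline.yml` in the current directory (the file name here is only an illustration), a dry run could be launched with:

```console
$ ./bin/spark-pipelines dry-run --spec pipeline.yml
```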

### init

Generate a sample pipeline project, including a spec file and example definitions

Option | Description | Default | Required
-|-|-|:-:
`--name` | Name of the project. A directory with this name will be created underneath the current directory | (undefined) | ✅

```console
$ ./bin/spark-pipelines init --name hello-pipelines
Pipeline project 'hello-pipelines' created successfully. To run your pipeline:
  cd 'hello-pipelines'
  spark-pipelines run
```

### run

Run a pipeline. If no `--refresh` option is specified, a default incremental update is performed.

Option | Description | Default
-|-|-
`--spec` | Path to the pipeline spec | (undefined)
`--full-refresh` | List of datasets to reset and recompute (comma-separated) | (empty)
`--full-refresh-all` | Perform a full graph reset and recompute | (undefined)
`--refresh` | List of datasets to update (comma-separated) | (empty)
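
For example, the following invocations request a full refresh of two named datasets and of the whole graph, respectively; the dataset names are made up for illustration:

```console
$ ./bin/spark-pipelines run --full-refresh orders,customers

$ ./bin/spark-pipelines run --full-refresh-all
```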

docs/declarative-pipelines/index.md

Lines changed: 22 additions & 0 deletions

@@ -22,6 +22,28 @@ Declarative Pipelines uses the following [Python decorators](https://peps.python

Once described, a pipeline can be [started](PipelineExecution.md#runPipeline) (on a [PipelineExecution](PipelineExecution.md)).

## Spark Connect Only

Declarative Pipelines currently only supports Spark Connect.
```console
$ ./bin/spark-pipelines --conf spark.api.mode=xxx
...
25/08/03 12:33:57 INFO SparkPipelines: --spark.api.mode must be 'connect'. Declarative Pipelines currently only supports Spark Connect.
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.SparkPipelines$$anon1ドル.handle(SparkPipelines.scala:73)
    at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:169)
    at org.apache.spark.deploy.SparkPipelines$$anon1ドル.<init>(SparkPipelines.scala:58)
    at org.apache.spark.deploy.SparkPipelines$.splitArgs(SparkPipelines.scala:57)
    at org.apache.spark.deploy.SparkPipelines$.constructSparkSubmitArgs(SparkPipelines.scala:43)
    at org.apache.spark.deploy.SparkPipelines$.main(SparkPipelines.scala:37)
    at org.apache.spark.deploy.SparkPipelines.main(SparkPipelines.scala)
```
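
Based on the error message above, `connect` is the only accepted value of `spark.api.mode`, so an explicit setting would look as follows; the `dry-run` subcommand is used only as an illustration and the output is omitted:

```console
$ ./bin/spark-pipelines --conf spark.api.mode=connect dry-run
```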

## <span id="spark-pipelines"> spark-pipelines Shell Script

The `spark-pipelines` shell script is used to launch [org.apache.spark.deploy.SparkPipelines](SparkPipelines.md).

## Demo

### Step 1. Register Dataflow Graph
