Once described, a pipeline can be [started](PipelineExecution.md#runPipeline) (on a [PipelineExecution](PipelineExecution.md)).

## Python Import Alias Convention

As of this [Commit 6ab0df9]({{ spark.commit }}/6ab0df9287c5a9ce49769612c2bb0a1daab83bee), the convention is to alias the import of Declarative Pipelines in Python as `dp` (previously `sdp`).

```python
from pyspark import pipelines as dp
```

## Python Decorators for Datasets and Flows { #python-decorators }

Declarative Pipelines uses the following [Python decorators](https://peps.python.org/pep-0318/) to describe tables and views:

* [@dp.materialized_view](#materialized_view) for materialized views
* [@dp.table](#table) for streaming and batch tables
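
For a feel of the API, here is a minimal sketch of both decorators in use (the dataset names, the `orders` table, and the `rate` source are made up for illustration; it assumes a decorated function's name becomes the dataset name and its returned `DataFrame` defines the flow):

```python
from pyspark import pipelines as dp
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.active()


# A materialized view is defined with a batch query.
@dp.materialized_view
def recent_orders() -> DataFrame:
    return spark.read.table("orders").where("order_date >= '2025-01-01'")


# A table defined with a streaming query is a streaming table;
# a batch query would make it a batch table.
@dp.table
def events() -> DataFrame:
    return spark.readStream.format("rate").load()
```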
Activate (_source_) the virtual environment (that `uv` helped us create).

```shell
source .venv/bin/activate
```

This activation brings in all the necessary PySpark modules that have not been released yet and are only available in source format (incl. Spark Declarative Pipelines).
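
As a quick sanity check (an illustrative snippet, not part of the demo), you can confirm that Python now resolves `pyspark` (incl. the `pipelines` module) from the Spark source tree rather than from a released distribution:

```python
# Illustrative only: verify that pyspark.pipelines is importable
# and that pyspark comes from the Spark source checkout.
import pyspark
from pyspark import pipelines as dp  # noqa: F401

print(pyspark.__file__)  # expected: a path under your Spark source tree
```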

## Demo: Python API

??? warning "Activate Virtual Environment"
    Follow [Demo: Create Virtual Environment for Python Client](#demo-create-virtual-environment-for-python-client) before getting started with this demo.

```console
❯ $SPARK_HOME/bin/spark-pipelines --help
usage: cli.py [-h] {run,dry-run,init} ...

Pipelines CLI

positional arguments:
  {run,dry-run,init}
    run       Run a pipeline. If no refresh options specified, a
              default incremental update is performed.
    dry-run   Launch a run that just validates the graph and checks
              for errors.
    init      Generate a sample pipeline project, including a spec
              file and example definitions.

options:
  -h, --help  show this help message and exit
```

Execute `spark-pipelines dry-run` to validate a graph and check for errors.

You haven't created a pipeline graph yet, so any exceptions are expected.

=== "Command Line"

    ```shell
    $SPARK_HOME/bin/spark-pipelines dry-run
    ```

!!! note ""

    ```console
    Traceback (most recent call last):
      File "/Users/jacek/oss/spark/python/pyspark/pipelines/cli.py", line 382, in <module>
        main()
      File "/Users/jacek/oss/spark/python/pyspark/pipelines/cli.py", line 358, in main
        spec_path = find_pipeline_spec(Path.cwd())
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/jacek/oss/spark/python/pyspark/pipelines/cli.py", line 101, in find_pipeline_spec
        raise PySparkException(
    pyspark.errors.exceptions.base.PySparkException: [PIPELINE_SPEC_FILE_NOT_FOUND] No pipeline.yaml or pipeline.yml file provided in arguments or found in directory `/` or readable ancestor directories.
    ```

Create a demo `hello-spark-pipelines` pipeline project with a sample `pipeline.yml` and sample transformations (in Python and in SQL).
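
To give a feel for what such transformations can look like, below is a hypothetical two-dataset sketch (dataset names and queries are made up; the actual generated samples may differ):

```python
from pyspark import pipelines as dp
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.active()


@dp.table
def raw_numbers() -> DataFrame:
    # A batch table with sample rows.
    return spark.range(10)


@dp.materialized_view
def even_numbers() -> DataFrame:
    # Reads the dataset defined above, which becomes a dependency
    # in the pipeline graph.
    return spark.read.table("raw_numbers").where("id % 2 = 0")
```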