Commit 3d5627b

[DP] GraphElementRegistry and the pipeline graph elements
1 parent 4741467 commit 3d5627b

4 files changed: +146 -1 lines changed

Lines changed: 61 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,61 @@
1+
# GraphElementRegistry
2+
3+
`GraphElementRegistry` is an [abstraction](#contract) of [graph element registries](#implementations).
4+
5+
## Contract
6+
7+
### register_dataset { #register_dataset }
8+
9+
```py
register_dataset(
    self,
    dataset: Dataset,
) -> None
```
15+
16+
See:
17+
18+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_dataset)
19+
20+
Used when:
21+
22+
* [@create_streaming_table](./index.md#create_streaming_table), [@table](./index.md#table), [@materialized_view](./index.md#materialized_view), [@temporary_view](./index.md#temporary_view) decorators are used
23+
24+
### register_flow { #register_flow }
25+
26+
```py
register_flow(
    self,
    flow: Flow,
) -> None
```
32+
33+
See:
34+
35+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_flow)
36+
37+
Used when:
38+
39+
* [@append_flow](./index.md#append_flow), [@table](./index.md#table), [@materialized_view](./index.md#materialized_view), [@temporary_view](./index.md#temporary_view) decorators are used
40+
41+
### register_sql { #register_sql }
42+
43+
```py
register_sql(
    self,
    sql_text: str,
    file_path: Path,
) -> None
```
49+
```
50+
51+
See:
52+
53+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_sql)
54+
55+
Used when:
56+
57+
* `pyspark.pipelines.cli` is requested to [register_definitions](#register_definitions)
58+
59+
## Implementations
60+
61+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md)
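
The contract above can be pictured as an abstract base class with three methods. The following is a minimal sketch only: `Dataset` and `Flow` here are hypothetical stand-ins (not the PySpark classes), and `InMemoryGraphElementRegistry` is an illustrative implementation that does not exist in PySpark.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class Dataset:
    """Hypothetical stand-in for a pipeline dataset."""
    name: str


@dataclass
class Flow:
    """Hypothetical stand-in for a pipeline flow."""
    name: str
    target: str


class GraphElementRegistry(ABC):
    """Sketch of the three-method contract described above."""

    @abstractmethod
    def register_dataset(self, dataset: Dataset) -> None: ...

    @abstractmethod
    def register_flow(self, flow: Flow) -> None: ...

    @abstractmethod
    def register_sql(self, sql_text: str, file_path: Path) -> None: ...


class InMemoryGraphElementRegistry(GraphElementRegistry):
    """Hypothetical implementation that only records what it is given."""

    def __init__(self) -> None:
        self.datasets: List[Dataset] = []
        self.flows: List[Flow] = []
        self.sql: List[Tuple[str, Path]] = []

    def register_dataset(self, dataset: Dataset) -> None:
        self.datasets.append(dataset)

    def register_flow(self, flow: Flow) -> None:
        self.flows.append(flow)

    def register_sql(self, sql_text: str, file_path: Path) -> None:
        self.sql.append((sql_text, file_path))


registry = InMemoryGraphElementRegistry()
registry.register_dataset(Dataset(name="orders"))
registry.register_flow(Flow(name="orders", target="orders"))
registry.register_sql("SELECT 1", Path("transformations/example.sql"))
```

An implementation such as `SparkConnectGraphElementRegistry` would send the elements to a Spark Connect server instead of keeping them in lists.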
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# MaterializedView
2+
3+
`MaterializedView` is a `Table` that represents a materialized view in a pipeline dataflow graph.
4+
5+
`MaterializedView` is created using the [@materialized_view](#materialized_view) decorator.
6+
7+
`MaterializedView` is a Python class.
8+
9+
## materialized_view
10+
11+
```py
materialized_view(
    query_function: Optional[QueryFunction] = None,
    *,
    name: Optional[str] = None,
    comment: Optional[str] = None,
    spark_conf: Optional[Dict[str, str]] = None,
    table_properties: Optional[Dict[str, str]] = None,
    partition_cols: Optional[List[str]] = None,
    schema: Optional[Union[StructType, str]] = None,
    format: Optional[str] = None,
) -> Union[Callable[[QueryFunction], None], None]
```
24+
25+
`materialized_view` uses `query_function` for the parameters unless they are specified explicitly.
26+
27+
`materialized_view` uses the name of the decorated function as the name of the materialized view unless specified explicitly.
28+
29+
`materialized_view` makes sure that [GraphElementRegistry](GraphElementRegistry.md) has been set (using the `graph_element_registration_context` context manager).
30+
31+
??? note "Demo"

    ```py
    from pyspark.pipelines.graph_element_registry import (
        graph_element_registration_context,
        get_active_graph_element_registry,
    )
    from pyspark.pipelines.spark_connect_graph_element_registry import (
        SparkConnectGraphElementRegistry,
    )

    # Assumes `spark` is an already-created Spark Connect session
    dataflow_graph_id = "demo_dataflow_graph_id"
    registry = SparkConnectGraphElementRegistry(spark, dataflow_graph_id)
    with graph_element_registration_context(registry):
        graph_registry = get_active_graph_element_registry()
        assert graph_registry == registry
    ```
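
One way such an "active registry" context manager could work is with a context variable that is set on entry and reset on exit. The sketch below is an assumption for illustration, not PySpark's implementation: `registration_context`, `get_active_registry` and `_active_registry` are hypothetical names, and the registry is just a plain object.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical slot that holds the registry active in the current context.
_active_registry: ContextVar[Optional[object]] = ContextVar(
    "active_registry", default=None
)


@contextmanager
def registration_context(registry: object) -> Iterator[None]:
    """Make `registry` the active one for the duration of the `with` block."""
    token = _active_registry.set(registry)
    try:
        yield
    finally:
        # Restore whatever was active before, even if the body raised.
        _active_registry.reset(token)


def get_active_registry() -> object:
    """Return the active registry or fail if none has been set."""
    registry = _active_registry.get()
    if registry is None:
        raise RuntimeError("No active registry; use registration_context first")
    return registry


demo_registry = object()
with registration_context(demo_registry):
    inside = get_active_registry() is demo_registry

try:
    get_active_registry()
    outside_error = None
except RuntimeError as e:
    outside_error = str(e)
```

With this design, a decorator that runs outside the `with` block fails fast instead of silently registering nothing.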
48+
49+
`materialized_view` creates a new `MaterializedView` and requests the `GraphElementRegistry` to [register_dataset](GraphElementRegistry.md#register_dataset) it.
50+
51+
`materialized_view` creates a new `Flow` and requests the `GraphElementRegistry` to [register_flow](GraphElementRegistry.md#register_flow) it.
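
The registration steps above can be sketched with a decorator that works both bare and with keyword arguments. This is a simplified illustration under stated assumptions: the `MaterializedView`, `Flow` and `RecordingRegistry` classes are hypothetical stand-ins, the registry is a module-level variable rather than a context-managed one, only `name` and `comment` are modelled, and naming the flow after the dataset is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MaterializedView:
    """Hypothetical stand-in for the dataset being registered."""
    name: str
    comment: Optional[str]


@dataclass
class Flow:
    """Hypothetical stand-in for the flow that populates the dataset."""
    name: str
    target: str
    func: Callable[[], object]


class RecordingRegistry:
    """Hypothetical registry that records registered elements."""

    def __init__(self) -> None:
        self.datasets: List[MaterializedView] = []
        self.flows: List[Flow] = []

    def register_dataset(self, dataset: MaterializedView) -> None:
        self.datasets.append(dataset)

    def register_flow(self, flow: Flow) -> None:
        self.flows.append(flow)


ACTIVE_REGISTRY = RecordingRegistry()


def materialized_view(query_function=None, *, name=None, comment=None):
    def outer(decorated: Callable[[], object]) -> None:
        # Fall back to the function name when no explicit name is given.
        resolved_name = name or decorated.__name__
        ACTIVE_REGISTRY.register_dataset(MaterializedView(resolved_name, comment))
        ACTIVE_REGISTRY.register_flow(Flow(resolved_name, resolved_name, decorated))

    if query_function is not None:
        # Bare usage: @materialized_view
        outer(query_function)
        return None
    # Parameterized usage: @materialized_view(name=...)
    return outer


@materialized_view
def daily_totals():
    return "SELECT 1"


@materialized_view(name="renamed_view", comment="explicit name")
def some_query():
    return "SELECT 2"
```

Because the decorator returns `None` in both usages (as in the `-> Union[Callable[[QueryFunction], None], None]` signature), the decorated names are bound to `None` afterwards; the functions live on only inside the registered flows.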
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
# SparkConnectGraphElementRegistry
2+
3+
`SparkConnectGraphElementRegistry` is a [GraphElementRegistry](GraphElementRegistry.md).
4+
5+
## Creating Instance
6+
7+
`SparkConnectGraphElementRegistry` takes the following to be created:
8+
9+
* <span id="spark"> `SparkSession` (`SparkConnectClient`)
10+
* <span id="dataflow_graph_id"> Dataflow Graph ID
11+
12+
`SparkConnectGraphElementRegistry` is created when:
13+
14+
* `pyspark.pipelines.cli` is requested to [run](#run)
15+
16+
## register_dataset { #register_dataset }
17+
18+
??? note "GraphElementRegistry"

    ```py
    register_dataset(
        self,
        dataset: Dataset,
    ) -> None
    ```

    `register_dataset` is part of the [GraphElementRegistry](GraphElementRegistry.md#register_dataset) abstraction.
28+
29+
`register_dataset` makes sure that the given `Dataset` is one of `MaterializedView`, `StreamingTable` or `TemporaryView`.
30+
31+
`register_dataset` requests this [SparkSession](#spark) to [execute](#execute_command) a `PipelineCommand.DefineDataset`.
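
The type check described above could look like the following sketch. The class hierarchy is a hypothetical stand-in for the PySpark dataset classes, `check_supported` is an illustrative helper name, and raising `TypeError` is an assumption about how an unsupported dataset is rejected.

```python
class Dataset:
    """Hypothetical base class for pipeline datasets."""


class MaterializedView(Dataset):
    pass


class StreamingTable(Dataset):
    pass


class TemporaryView(Dataset):
    pass


class UnsupportedDataset(Dataset):
    """A dataset kind the registry does not know how to define."""


SUPPORTED_DATASETS = (MaterializedView, StreamingTable, TemporaryView)


def check_supported(dataset: Dataset) -> str:
    """Return the dataset kind, or raise if it is not one of the supported kinds."""
    if not isinstance(dataset, SUPPORTED_DATASETS):
        raise TypeError(f"Unsupported dataset type: {type(dataset).__name__}")
    return type(dataset).__name__


accepted = check_supported(MaterializedView())

try:
    check_supported(UnsupportedDataset())
    rejected = False
except TypeError:
    rejected = True
```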

docs/declarative-pipelines/index.md

Lines changed: 3 additions & 1 deletion
@@ -44,7 +44,7 @@ As of this [Commit 6ab0df9]({{ spark.commit }}/6ab0df9287c5a9ce49769612c2bb0a1da
44 44
from pyspark import pipelines as dp
45 45
```
46 46

47-
## Python Decorators for Datasets and Flows { #python-decorators }
47+
## Python Decorators for Tables and Flows { #python-decorators }
48 48

49 49
Declarative Pipelines uses the following [Python decorators](https://peps.python.org/pep-0318/) to describe tables and views:
50 50

@@ -73,6 +73,8 @@ from pyspark import pipelines as dp
73 73

74 74
### @dp.materialized_view { #materialized_view }
75 75

76+
Creates a [MaterializedView](MaterializedView.md) (for a table whose contents are defined to be the result of a query).
77+
76 78
### @dp.table { #table }
77 79

78 80
### @dp.temporary_view { #temporary_view }
