Commit 9bb9cfb

[SDP] PipelinesHandler and Pipeline Commands
1 parent dbc9c67 commit 9bb9cfb

File tree

- docs/declarative-pipelines/GraphRegistrationContext.md
- docs/declarative-pipelines/PipelineExecution.md
- docs/declarative-pipelines/PipelinesHandler.md
- docs/declarative-pipelines/UnresolvedFlow.md
- docs/declarative-pipelines/index.md

5 files changed: +42 -25 lines changed

docs/declarative-pipelines/GraphRegistrationContext.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -31,7 +31,7 @@ toDataflowGraph: DataflowGraph
 
 `toDataflowGraph` is used when:
 
-* `PipelinesHandler` ([Spark Connect]({{ book.spark_connect }})) is requested to [startRun](PipelinesHandler.md#startRun)
+* `PipelinesHandler` ([Spark Connect]({{ book.spark_connect }})) is requested to [start a pipeline run](PipelinesHandler.md#startRun)
 
 ## Tables { #tables }
 
````
docs/declarative-pipelines/PipelineExecution.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -18,13 +18,13 @@
 runPipeline(): Unit
 ```
 
-`runPipeline` [starts the pipeline](#startPipeline) and requests the [PipelineExecution](PipelineUpdateContext.md#pipelineExecution) (of this [PipelineUpdateContext](#context)) to [wait for the execution to complete](#awaitCompletion).
+`runPipeline` [starts this pipeline](#startPipeline) and requests the [PipelineExecution](PipelineUpdateContext.md#pipelineExecution) (of this [PipelineUpdateContext](#context)) to [wait for the execution to complete](#awaitCompletion).
 
 ---
 
 `runPipeline` is used when:
 
-* `PipelinesHandler` is requested to [startRun](PipelinesHandler.md#startRun) (for [Spark Connect]({{ book.spark_connect }}))
+* `PipelinesHandler` is requested to [start a pipeline run](PipelinesHandler.md#startRun)
 
 ## Start Pipeline { #startPipeline }
 
````
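Read as code, the rewritten sentence describes a two-step control flow. A minimal sketch, assuming `runPipeline` is a method of `PipelineExecution` and that `context` exposes the `pipelineExecution` (names follow the linked pages, not the actual Apache Spark sources):

```scala
// Sketch only (not the Spark sources): start the pipeline, then block
// until the pipeline execution completes.
def runPipeline(): Unit = {
  startPipeline()                               // Start Pipeline { #startPipeline }
  context.pipelineExecution.awaitCompletion()   // wait for the execution to complete
}
```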
docs/declarative-pipelines/PipelinesHandler.md

Lines changed: 35 additions & 22 deletions

````diff
@@ -2,6 +2,8 @@
 
 `PipelinesHandler` is used to [handle pipeline commands](#handlePipelinesCommand) in [Spark Connect]({{ book.spark_connect }}) ([SparkConnectPlanner]({{ book.spark_connect }}/server/SparkConnectPlanner), precisely).
 
+`PipelinesHandler` acts as a bridge between the Python and SQL "frontends" and the Spark Connect Server (where pipeline execution happens).
+
 ## Handle Pipelines Command { #handlePipelinesCommand }
 
 ```scala
@@ -14,14 +16,14 @@ handlePipelinesCommand(
 
 `handlePipelinesCommand` handles the given pipeline `cmd` command.
 
-| PipelineCommand | Description |
-|-----------------|-------------|
-| `CREATE_DATAFLOW_GRAPH` | [Creates a new Dataflow Graph](#createDataflowGraph) |
-| `DROP_DATAFLOW_GRAPH` | [Drops a pipeline](#DROP_DATAFLOW_GRAPH) |
-| `DEFINE_DATASET` | [Defines a dataset](#DEFINE_DATASET) |
-| `DEFINE_FLOW` | [Defines a flow](#DEFINE_FLOW) |
-| `START_RUN` | [Starts a pipeline](#START_RUN) |
-| `DEFINE_SQL_GRAPH_ELEMENTS` | [DEFINE_SQL_GRAPH_ELEMENTS](#DEFINE_SQL_GRAPH_ELEMENTS) |
+| PipelineCommand | Description | Initiator |
+|-----------------|-------------|-----------|
+| `CREATE_DATAFLOW_GRAPH` | [Creates a new dataflow graph](#CREATE_DATAFLOW_GRAPH) | [pyspark.pipelines.spark_connect_pipeline](#create_dataflow_graph) |
+| `DROP_DATAFLOW_GRAPH` | [Drops a pipeline](#DROP_DATAFLOW_GRAPH) | |
+| `DEFINE_DATASET` | [Defines a dataset](#DEFINE_DATASET) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_dataset) |
+| `DEFINE_FLOW` | [Defines a flow](#DEFINE_FLOW) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_flow) |
+| `START_RUN` | [Starts a pipeline run](#START_RUN) | [pyspark.pipelines.spark_connect_pipeline](#start_run) |
+| `DEFINE_SQL_GRAPH_ELEMENTS` | [DEFINE_SQL_GRAPH_ELEMENTS](#DEFINE_SQL_GRAPH_ELEMENTS) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_sql) |
 
 `handlePipelinesCommand` reports an `UnsupportedOperationException` for incorrect commands:
 
@@ -33,9 +35,13 @@ handlePipelinesCommand(
 
 `handlePipelinesCommand` is used when:
 
-* `SparkConnectPlanner` is requested to `handlePipelineCommand` (for `PIPELINE_COMMAND` command)
+* `SparkConnectPlanner` ([Spark Connect]({{ book.spark_connect }}/server/SparkConnectPlanner)) is requested to `handlePipelineCommand` (for `PIPELINE_COMMAND` command)
+
+### CREATE_DATAFLOW_GRAPH { #CREATE_DATAFLOW_GRAPH }
 
-### Define Dataset Command { #DEFINE_DATASET }
+`handlePipelinesCommand` [creates a dataflow graph](#createDataflowGraph) and sends the graph ID back.
+
+### DEFINE_DATASET { #DEFINE_DATASET }
 
 `handlePipelinesCommand` prints out the following INFO message to the logs:
 
@@ -45,7 +51,7 @@ Define pipelines dataset cmd received: [cmd]
 
 `handlePipelinesCommand` [defines a dataset](#defineDataset).
 
-### Define Flow Command { #DEFINE_FLOW }
+### DEFINE_FLOW { #DEFINE_FLOW }
 
 `handlePipelinesCommand` prints out the following INFO message to the logs:
 
@@ -55,7 +61,17 @@ Define pipelines flow cmd received: [cmd]
 
 `handlePipelinesCommand` [defines a flow](#defineFlow).
 
-### Start Pipeline { #startRun }
+### START_RUN { #START_RUN }
+
+`handlePipelinesCommand` prints out the following INFO message to the logs:
+
+```text
+Start pipeline cmd received: [cmd]
+```
+
+`handlePipelinesCommand` [starts a pipeline run](#startRun).
+
+## Start Pipeline Run { #startRun }
 
 ```scala
 startRun(
````
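The command table and the `CREATE_DATAFLOW_GRAPH`/`DEFINE_DATASET`/`DEFINE_FLOW`/`START_RUN` sections above all describe a single dispatch point. A hedged sketch of that shape, with hypothetical case classes standing in for the protobuf messages (the real handler matches on `proto.PipelineCommand.CommandTypeCase`; every name below is made up for illustration):

```scala
// Hypothetical stand-ins for the protobuf command messages (not Spark's API).
trait PipelineCommand
case class CreateDataflowGraph(defaultCatalog: String, defaultDatabase: String) extends PipelineCommand
case class DefineDataset(dataflowGraphId: String, name: String) extends PipelineCommand
case class DefineFlow(dataflowGraphId: String, name: String) extends PipelineCommand
case class StartRun(dataflowGraphId: String, dryRun: Boolean) extends PipelineCommand

// Dispatch as described above: known commands are routed to their handlers,
// anything else is reported as unsupported.
def handlePipelinesCommand(cmd: PipelineCommand): Unit = cmd match {
  case c: CreateDataflowGraph => println(s"created dataflow graph for $c") // would send the graph ID back
  case c: DefineDataset       => println(s"Define pipelines dataset cmd received: $c")
  case c: DefineFlow          => println(s"Define pipelines flow cmd received: $c")
  case c: StartRun            => println(s"Start pipeline cmd received: $c")
  case other                  => throw new UnsupportedOperationException(s"$other not supported")
}
```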
````diff
@@ -64,21 +80,18 @@ startRun(
   sessionHolder: SessionHolder): Unit
 ```
 
-`startRun` prints out the following INFO message to the logs:
-
-```text
-Start pipeline cmd received: [cmd]
-```
+??? note "`START_RUN` Pipeline Command"
+    `startRun` is used when `PipelinesHandler` is requested to handle [proto.PipelineCommand.CommandTypeCase.START_RUN](#START_RUN) command.
 
 `startRun` finds the [GraphRegistrationContext](GraphRegistrationContext.md) by `dataflowGraphId` in the [DataflowGraphRegistry](DataflowGraphRegistry.md) (in the given `SessionHolder`).
 
 `startRun` creates a `PipelineEventSender` to send pipeline events back to the Spark Connect client (_Python pipeline runtime_).
 
 `startRun` creates a [PipelineUpdateContextImpl](PipelineUpdateContextImpl.md) (with the `PipelineEventSender`).
 
-In the end, `startRun` requests the `PipelineUpdateContextImpl` for the [PipelineExecution](PipelineExecution.md) to [runPipeline](PipelineExecution.md#runPipeline) or [dryRunPipeline](PipelineExecution.md#dryRunPipeline) for `dry-run` or `run` command, respectively.
+In the end, `startRun` requests the `PipelineUpdateContextImpl` for the [PipelineExecution](PipelineUpdateContext.md#pipelineExecution) to [run a pipeline](PipelineExecution.md#runPipeline) or [dry-run a pipeline](PipelineExecution.md#dryRunPipeline) for a `run` or `dry-run` command, respectively.
 
-### Create Dataflow Graph { #createDataflowGraph }
+## Create Dataflow Graph { #createDataflowGraph }
 
 ```scala
 createDataflowGraph(
@@ -90,7 +103,7 @@ createDataflowGraph(
 
 `createDataflowGraph` returns the ID of the created dataflow graph.
 
-### defineSqlGraphElements { #defineSqlGraphElements }
+## defineSqlGraphElements { #defineSqlGraphElements }
 
 ```scala
 defineSqlGraphElements(
@@ -100,7 +113,7 @@ defineSqlGraphElements(
 
 `defineSqlGraphElements`...FIXME
 
-### Define Dataset (Table or View) { #defineDataset }
+## Define Dataset (Table or View) { #defineDataset }
 
 ```scala
 defineDataset(
@@ -123,7 +136,7 @@ For unknown types, `defineDataset` reports an `IllegalArgumentException`:
 Unknown dataset type: [type]
 ```
 
-### Define Flow { #defineFlow }
+## Define Flow { #defineFlow }
 
 ```scala
 defineFlow(
````
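The four `startRun` steps documented above can be summarized in one short sketch. All types and signatures below are hypothetical stubs standing in for the linked pages (`DataflowGraphRegistry`, `PipelineEventSender`, `PipelineUpdateContextImpl`); only the step order comes from this commit:

```scala
// Hypothetical stubs (not Spark's API) so the sketch is self-contained.
trait GraphRegistrationContext
trait PipelineEventSender
trait PipelineExecution { def runPipeline(): Unit; def dryRunPipeline(): Unit }
trait PipelineUpdateContext { def pipelineExecution: PipelineExecution }

def startRun(
    dataflowGraphId: String,
    dryRun: Boolean,
    findGraph: String => GraphRegistrationContext,  // 1. DataflowGraphRegistry lookup
    newEventSender: () => PipelineEventSender,      // 2. events back to the Spark Connect client
    newUpdateContext: (GraphRegistrationContext, PipelineEventSender) => PipelineUpdateContext): Unit = {
  val graph = findGraph(dataflowGraphId)
  val eventSender = newEventSender()
  val updateContext = newUpdateContext(graph, eventSender)  // 3. PipelineUpdateContextImpl
  val execution = updateContext.pipelineExecution           // 4. run or dry-run the pipeline
  if (dryRun) execution.dryRunPipeline() else execution.runPipeline()
}
```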
docs/declarative-pipelines/UnresolvedFlow.md

Lines changed: 3 additions & 0 deletions

````diff
@@ -0,0 +1,3 @@
+# UnresolvedFlow
+
+`UnresolvedFlow` is...FIXME
````

docs/declarative-pipelines/index.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -152,6 +152,7 @@ Pipelines elements are defined in SQL files included as `definitions` in a [pipe
 Supported SQL statements:
 
 * [CREATE FLOW AS INSERT INTO BY NAME](../sql/SparkSqlAstBuilder.md#visitCreatePipelineInsertIntoFlow)
+* ...
 
 ## Demo: Create Virtual Environment for Python Client
 
````
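The one statement listed so far is named after the parser rule `visitCreatePipelineInsertIntoFlow`. A hedged illustration of what such a statement might look like in a pipeline's SQL `definitions` file (the flow, target, and source names are invented; verify the exact grammar against `SparkSqlAstBuilder`):

```sql
-- Hypothetical example: a named flow that inserts raw_orders into orders,
-- matching columns by name rather than by position.
CREATE FLOW append_orders AS
INSERT INTO orders BY NAME
SELECT * FROM raw_orders;
```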