Commit 00699d9

Update README.md

1 parent dc9fdd2 commit 00699d9

1 file changed

`‎README.md‎`

Lines changed: 6 additions & 5 deletions

Original file line number	Diff line number	Diff line change
`@@ -2,7 +2,12 @@`
`2`	`2`	`This project shows how to use SPARK as Cloud-based SQL Engine and expose your big-data as a JDBC/ODBC data source via the Spark thrift server.`
`3`	`3`
`4`	`4`	`### Central Idea`
`5`		`-Traditional relational Database engines like SQL had scalability problems and so evolved couple of SQL-on-Hadoop frameworks like Hive, Cloudier Impala, Presto etc. These frameworks are essentially cloud-based solutions and they all come with their own advantages and limitations. This project will demo how SparkSQL comes across as one more SQL-on-Hadoop framework which works as listed below:`
	`5`	`+Traditional relational Database engines like SQL had scalability problems and so evolved couple of SQL-on-Hadoop frameworks like Hive, Cloudier Impala, Presto etc. These frameworks are essentially cloud-based solutions and they all come with their own advantages and limitations. This project will demo how SparkSQL comes across as one more SQL-on-Hadoop framework.`
	`6`	`+`
	`7`	`+### Architecture`
	`8`	`+Following picture illustrates the idea we discussed above:`
	`9`	`+<img src="https://user-images.githubusercontent.com/22542670/27733176-54b684c2-5db2-11e7-946b-5b5ef5595e43.png" width="600" />`
	`10`	`+`
`6`	`11`	`- Data from multiple sources can be pushed into Spark and then exposed as SQLtable`
`7`	`12`	`- These tables are then made accessible as a JDBC/ODBC data source via the Spark thrift server.`
`8`	`13`	- Multiple clients like ```Beeline CLI```, ```JDBC```, ```ODBC``` or ```BI tools like Tableau``` connect to Spark thrift server.
`@@ -11,10 +16,6 @@ Traditional relational Database engines like SQL had scalability problems and so`
`11`	`16`
`12`	`17`	`#### To know more about this topic, please refer to my blog [here](https://spoddutur.github.io/spark-notes/spark-as-cloud-based-sql-engine-via-thrift-server) where I briefed the concept in detail.`
`13`	`18`
`14`		`-### Architecture`
`15`		`-Following picture illustrates the idea we discussed above:`
`16`		`-<img src="https://user-images.githubusercontent.com/22542670/27733176-54b684c2-5db2-11e7-946b-5b5ef5595e43.png" width="600" />`
`17`		`-`
`18`	`19`	`### Structure of the project:`
`19`	`20`	`- data: Contains input json used in MainApp to register sample data with SparkSql.`
`20`	`21`	`- src/main/java/MainApp.scala: Spark 2.1 implementation where it starts SparkSession and registers data from input.json with SparkSQL. (To keep the spark-session alive, there's a continuous while-loop in there).`

0 commit comments
