Commit 00699d9

Update README.md
1 parent dc9fdd2 commit 00699d9

‎README.md‎

Lines changed: 6 additions & 5 deletions
@@ -2,7 +2,12 @@
 This project shows how to use SPARK as Cloud-based SQL Engine and expose your big-data as a JDBC/ODBC data source via the Spark thrift server.
 
 ### Central Idea
-Traditional relational Database engines like SQL had scalability problems and so evolved couple of SQL-on-Hadoop frameworks like Hive, Cloudier Impala, Presto etc. These frameworks are essentially cloud-based solutions and they all come with their own advantages and limitations. This project will demo how SparkSQL comes across as one more SQL-on-Hadoop framework which works as listed below:
+Traditional relational Database engines like SQL had scalability problems and so evolved couple of SQL-on-Hadoop frameworks like Hive, Cloudier Impala, Presto etc. These frameworks are essentially cloud-based solutions and they all come with their own advantages and limitations. This project will demo how SparkSQL comes across as one more SQL-on-Hadoop framework.
+
+### Architecture
+Following picture illustrates the idea we discussed above:
+<img src="https://user-images.githubusercontent.com/22542670/27733176-54b684c2-5db2-11e7-946b-5b5ef5595e43.png" width="600" />
+
 - Data from multiple sources can be pushed into Spark and then exposed as SQLtable
 - These tables are then made accessible as a JDBC/ODBC data source via the Spark thrift server.
 - Multiple clients like ```Beeline CLI```, ```JDBC```, ```ODBC``` or ```BI tools like Tableau``` connect to Spark thrift server.
@@ -11,10 +16,6 @@ Traditional relational Database engines like SQL had scalability problems and so
 
 #### To know more about this topic, please refer to my blog [here](https://spoddutur.github.io/spark-notes/spark-as-cloud-based-sql-engine-via-thrift-server) where I briefed the concept in detail.
 
-### Architecture
-Following picture illustrates the idea we discussed above:
-<img src="https://user-images.githubusercontent.com/22542670/27733176-54b684c2-5db2-11e7-946b-5b5ef5595e43.png" width="600" />
-
 ### Structure of the project:
 - **data:** Contains input json used in MainApp to register sample data with SparkSql.
 - **src/main/java/MainApp.scala:** Spark 2.1 implementation where it starts SparkSession and registers data from input.json with SparkSQL. (To keep the spark-session alive, there's a continuous while-loop in there).
