README.md
This project shows how to use Spark as a cloud-based SQL engine and expose your big data as a JDBC/ODBC data source via the Spark thrift server.
### Central Idea

Traditional relational SQL database engines had scalability problems, and so a number of SQL-on-Hadoop frameworks evolved, such as Hive, Cloudera Impala, and Presto. These frameworks are essentially cloud-based solutions, and they all come with their own advantages and limitations. This project demonstrates how SparkSQL comes across as one more SQL-on-Hadoop framework.

### Architecture
The following picture illustrates the idea discussed above:
- Data from multiple sources can be pushed into Spark and then exposed as a SQL table.
- These tables are then made accessible as a JDBC/ODBC data source via the Spark thrift server.
- Multiple clients like the `Beeline CLI`, `JDBC`, `ODBC`, or BI tools like Tableau connect to the Spark thrift server.
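To make the client side concrete, here is a minimal sketch of a JDBC client talking to the thrift server. The Spark thrift server speaks the HiveServer2 protocol, so the standard `jdbc:hive2://` URL scheme applies. The host, port, credentials, and table name below are illustrative assumptions, not values from this project, and the Hive JDBC driver (`org.apache.hive:hive-jdbc`) must be on the classpath at runtime:

```scala
import java.sql.{Connection, DriverManager, ResultSet}

object ThriftJdbcClient {
  // Build a HiveServer2-style JDBC URL for the Spark thrift server.
  def buildThriftUrl(host: String, port: Int, db: String): String =
    s"jdbc:hive2://$host:$port/$db"

  def main(args: Array[String]): Unit = {
    // Assumed defaults: thrift server on localhost:10000, database "default".
    val url = buildThriftUrl("localhost", 10000, "default")

    // Requires the Hive JDBC driver on the classpath at runtime.
    val conn: Connection = DriverManager.getConnection(url, "user", "")
    try {
      // "my_table" is a placeholder for whatever table was registered in Spark.
      val rs: ResultSet = conn.createStatement()
        .executeQuery("SELECT * FROM my_table LIMIT 10")
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
```

The same URL works from Beeline (`!connect jdbc:hive2://localhost:10000`) or any ODBC/BI tool that understands the HiveServer2 protocol.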
#### To know more about this topic, please refer to my blog [here](https://spoddutur.github.io/spark-notes/spark-as-cloud-based-sql-engine-via-thrift-server), where I explain the concept in detail.
- **data:** Contains the input JSON used in MainApp to register sample data with SparkSQL.
- **src/main/java/MainApp.scala:** Spark 2.1 implementation that starts a SparkSession and registers data from input.json with SparkSQL. (To keep the Spark session alive, it runs a continuous while-loop.)
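The flow described above — start a session, register the JSON data as a table, expose it over thrift, and keep the session alive — can be sketched roughly as follows. This is a simplified illustration, not the project's exact code; the file path, view name, and port are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object MainApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-as-cloud-based-sql-engine")
      // Assumed thrift port; 10000 is the HiveServer2 default.
      .config("hive.server2.thrift.port", "10000")
      .enableHiveSupport()
      .getOrCreate()

    // Register the sample JSON data as a SQL-queryable view.
    val df = spark.read.json("data/input.json")
    df.createOrReplaceTempView("my_table")

    // Expose this session's tables as a JDBC/ODBC data source.
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Keep the SparkSession (and thus the registered view) alive,
    // since temp views disappear when the session ends.
    while (true) Thread.sleep(10000)
  }
}
```

The while-loop matters because a temporary view only exists for the lifetime of the SparkSession that created it; once the driver exits, clients can no longer query the table.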