This project shows how to use Spark as a cloud-based SQL engine and expose your big data as a JDBC/ODBC data source via the Spark Thrift Server.
### Central Idea:
Traditional relational database engines had scalability problems, and so a number of SQL-on-Hadoop frameworks evolved, like Hive, Cloudera Impala, Presto, etc. These frameworks are essentially cloud-based solutions, and they all come with their own advantages and limitations. This project demos how SparkSQL comes across as one more SQL-on-Hadoop framework.
### Complete Guide
For more details, please refer to [this](https://spoddutur.github.io/spark-notes/spark-as-cloud-based-sql-engine-via-thrift-server) blog.
### What is the role of Spark Thrift Server in this?
SparkSQL enables fast, in-memory integration of external data sources with Hadoop for BI access over JDBC/ODBC, and Spark Thrift Server makes this data queryable as a JDBC/ODBC source. Spark Thrift Server is similar to HiveServer2 Thrift, but instead of submitting SQL queries as Hive MapReduce jobs, it runs them on the Spark SQL engine, which in turn uses full Spark capabilities.
The following picture depicts this:
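
In code, this setup can be sketched roughly as follows. This is a minimal example, not code taken from this repo; the input path and table name are made-up placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object SparkSqlEngine extends App {

  // Hive support is needed so that registered tables are visible
  // to JDBC/ODBC clients of the Thrift Server.
  val spark = SparkSession.builder()
    .appName("spark-as-cloud-based-sql-engine")
    .config("hive.server2.thrift.port", "10000")
    .enableHiveSupport()
    .getOrCreate()

  // Load some data and register it as a temp table
  // (the path and view name here are placeholders).
  spark.read.json("/tmp/records.json")
    .createOrReplaceTempView("records")

  // Start the Thrift Server within this session, making `records`
  // queryable at jdbc:hive2://localhost:10000.
  HiveThriftServer2.startWithContext(spark.sqlContext)
}
```

Alternatively, if you don't need to share a running application's session, Spark ships a standalone launcher, `sbin/start-thriftserver.sh`.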
### How to connect to Spark Thrift Server?
To connect to Spark Thrift Server, use a JDBC/ODBC driver just as you would for HiveServer2, and access Hive or Spark temp tables to run SQL queries on the Apache Spark framework. There are a couple of ways to connect to it:
1. **Beeline:** Perhaps the simplest way is to use the `beeline` command-line tool provided in Spark's `bin` folder.
```
$> beeline
Beeline version 2.1.1-amzn-0 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:
// run your sql queries and access data..
jdbc:hive2://localhost:10000> show tables;
```
2. **Java JDBC:** Please refer to this project's test folder, where I've shared a Java example, the `TestThriftClient` class, to demo the same; a minimal sketch of such a client follows below.
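
For reference, here is a minimal sketch of the equivalent client logic, shown in Scala for brevity (the repo's actual example is the Java `TestThriftClient`, which may differ). The connection settings and query are assumptions for an unsecured local setup:

```scala
import java.sql.DriverManager

object ThriftClientSketch extends App {

  // Register the Hive JDBC driver (provided by the hive-jdbc dependency).
  Class.forName("org.apache.hive.jdbc.HiveDriver")

  // Connect to the Thrift Server on its default port; empty
  // username/password work for an unsecured local setup.
  val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
  try {
    val stmt = conn.createStatement()

    // Run a query against the tables exposed by the Thrift Server.
    val rs = stmt.executeQuery("show tables")
    while (rs.next()) {
      println(rs.getString(1))
    }
  } finally {
    conn.close()
  }
}
```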
### Requirements
- Spark 2.1.0, Java 1.8 and Scala 2.11
References:
[MapR Docs on SparkThriftServer](http://maprdocs.mapr.com/home/Spark/SparkSQLThriftServer.html)