an ODBC driver?
The Apache Spark ODBC driver lets you work with Apache Spark data from Perl and PHP scripts, Microsoft Excel, and Oracle.
Before you can use the Apache Spark ODBC driver to connect an application to Apache Spark, you need to configure an ODBC data source. An ODBC data source stores the connection details for the target database (in this case, Apache Spark) and identifies the ODBC driver required to connect to it (in this case, the Apache Spark ODBC driver).
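Because the data source holds the connection details, an application normally refers to it only by name. That is what the Perl and PHP examples in this article do with MyApacheSparkDataSource. At the ODBC level, that name travels as the DSN attribute of a connection string. The following is a minimal Python sketch of how such a string is assembled. The helper names are hypothetical. The brace-quoting rule for values containing special characters follows the usual ODBC convention for SQLDriverConnect, but check your driver's documentation before relying on it. UID and PWD are shown only as generic ODBC keywords.

```python
from typing import Mapping


def _escape(value: str) -> str:
    # Values containing ; { } or leading/trailing spaces are wrapped in braces,
    # with any closing brace doubled (assumed ODBC quoting convention).
    if any(ch in value for ch in ";{}") or value != value.strip():
        return "{" + value.replace("}", "}}") + "}"
    return value


def odbc_connection_string(attrs: Mapping[str, str]) -> str:
    """Join keyword=value pairs into an ODBC connection string (hypothetical helper)."""
    return ";".join(f"{key}={_escape(val)}" for key, val in attrs.items())


# Connecting by data source name: the driver manager looks up everything else.
print(odbc_connection_string({"DSN": "MyApacheSparkDataSource"}))
```

When only DSN is supplied, the driver manager reads the remaining settings from the data source you configure in ODBC Data Source Administrator. This is why the application code stays so short.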
ODBC data sources are configured in ODBC Data Source Administrator, which is included with Windows.
In ODBC Data Source Administrator:
SELECT * FROM MyTable
Strawberry Perl is a Perl distribution for Windows that includes the necessary middleware layers (Perl DBI and Perl DBD::ODBC) to enable the Apache Spark ODBC driver to connect your Perl applications to Apache Spark.
#!/usr/bin/perl -w
use strict;
use DBI;
# Connect to Apache Spark through the ODBC data source.
my $dbh = DBI->connect('dbi:ODBC:MyApacheSparkDataSource')
    or die "Can't connect to data source: $DBI::errstr";
my $sql = "SELECT MyCol FROM MyTable LIMIT 10";
# Prepare the statement.
my $sth = $dbh->prepare($sql)
    or die "Can't prepare statement: $DBI::errstr";
# Execute the statement.
$sth->execute()
    or die "Can't execute statement: $DBI::errstr";
# Fetch and display each value in the result set.
while (my ($SparkCol) = $sth->fetchrow_array()) {
    print "$SparkCol\n";
}
$sth->finish();
$dbh->disconnect();
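<?php
// Connect to Apache Spark through the ODBC data source.
$con = odbc_connect("MyApacheSparkDataSource", "", "");
if ($con === false) {
    echo odbc_errormsg();
} else {
    $rs = odbc_exec($con, "SELECT MyCol FROM MyTable");
    if ($rs === false) {
        echo odbc_errormsg($con);
    } else {
        // Print the result set as an HTML table.
        odbc_result_all($rs);
    }
    odbc_close($con);
}
?>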
Follow these steps to return data from Apache Spark to Microsoft Excel by using Microsoft Query:
Note that for large result sets, you may need to filter the data in Excel before it can be returned to the worksheet.
Change to the %ORACLE_HOME%\hs\admin directory. Create a copy of the file initdg4odbc.ora and name the new file initspark.ora.
Note: In these instructions, replace %ORACLE_HOME% with the location of your Oracle home directory. For example, C:\oraclexe\app\oracle\product\11.2.0\server.
HS_FDS_CONNECT_INFO = MyApacheSparkDataSource
#HS_FDS_TRACE_LEVEL = <trace_level>
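DG4ODBC finds its settings through the file name. The init file is named init<SID>.ora, where SID matches the SID_NAME you give the gateway in listener.ora, so initspark.ora belongs to the SID spark. The following minimal Python sketch derives both the path and the two lines shown above from those values. The function name is hypothetical. The sketch writes only these two lines, whereas a real copy of initdg4odbc.ora may contain other settings that you should keep.

```python
from pathlib import PureWindowsPath
from typing import Optional, Tuple


def hs_init_file(oracle_home: str, sid: str, dsn: str,
                 trace_level: Optional[str] = None) -> Tuple[PureWindowsPath, str]:
    """Return the path and minimal contents of a DG4ODBC init file (hypothetical helper)."""
    # The file name must be init<SID>.ora for the gateway to pick it up.
    path = PureWindowsPath(oracle_home, "hs", "admin", f"init{sid}.ora")
    lines = [f"HS_FDS_CONNECT_INFO = {dsn}"]  # the ODBC data source name
    if trace_level is None:
        lines.append("#HS_FDS_TRACE_LEVEL = <trace_level>")  # tracing left off
    else:
        lines.append(f"HS_FDS_TRACE_LEVEL = {trace_level}")
    return path, "\n".join(lines) + "\n"


path, text = hs_init_file(r"C:\oraclexe\app\oracle\product\11.2.0\server",
                          "spark", "MyApacheSparkDataSource")
print(path)
print(text, end="")
```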
Add an entry to %ORACLE_HOME%\network\admin\listener.ora that creates a SID_NAME for DG4ODBC. For example:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = spark)
      (ORACLE_HOME = %ORACLE_HOME%)
      (PROGRAM = dg4odbc)
    )
  )
Add an entry to %ORACLE_HOME%\network\admin\tnsnames.ora that specifies the SID_NAME created in the previous step. For example:
SPARK =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle_host)(PORT = 1521))
    (CONNECT_DATA = (SID = spark))
    (HS = OK)
  )
Replace oracle_host with the host name of your Oracle machine.
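The listener.ora and tnsnames.ora entries only work together if they use the same SID (spark here), which must also match the initspark.ora file name. The (HS = OK) clause marks the connection as one to a Heterogeneous Services agent such as DG4ODBC. The following minimal Python sketch generates both entries from a single SID value so that they cannot drift apart. The function names are hypothetical, and the default port of 1521 is taken from the example above.

```python
def listener_sid_list(sid: str, oracle_home: str) -> str:
    """SID_LIST_LISTENER entry that starts the dg4odbc program for `sid` (hypothetical helper)."""
    return (
        "SID_LIST_LISTENER =\n"
        "  (SID_LIST =\n"
        "    (SID_DESC =\n"
        f"      (SID_NAME = {sid})\n"
        f"      (ORACLE_HOME = {oracle_home})\n"
        "      (PROGRAM = dg4odbc)\n"
        "    )\n"
        "  )\n"
    )


def tnsnames_entry(alias: str, host: str, sid: str, port: int = 1521) -> str:
    """tnsnames.ora entry pointing at the gateway SID (hypothetical helper)."""
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        f"    (CONNECT_DATA = (SID = {sid}))\n"
        "    (HS = OK)\n"  # tells Oracle Net this is a Heterogeneous Services connection
        "  )\n"
    )


SID = "spark"  # one value shared by both files
print(listener_sid_list(SID, "%ORACLE_HOME%"))
print(tnsnames_entry("SPARK", "oracle_host", SID))
```

If listener.ora already contains a SID_LIST_LISTENER, add only the SID_DESC block to the existing list rather than a second SID_LIST_LISTENER.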
cd %ORACLE_HOME%\bin
lsnrctl stop
lsnrctl start
CREATE PUBLIC DATABASE LINK SPARKLINK CONNECT TO "mydummyuser" IDENTIFIED BY "mydummypassword" USING 'spark';
SELECT * FROM "MyTable"@SPARKLINK;
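The double quotes around "MyTable" matter because Oracle folds unquoted identifiers to upper case. Without them, the gateway would be asked for MYTABLE. Oracle quoted identifiers cannot themselves contain a double quotation mark. The following minimal Python sketch builds such a query over the database link, assuming you want the table and column names passed through with their exact case. The function names are hypothetical, and the link name is left unquoted, as in the example.

```python
from typing import Optional, Sequence


def quote_oracle_identifier(name: str) -> str:
    """Double-quote an identifier so Oracle keeps its exact case (hypothetical helper)."""
    if not name or '"' in name:
        raise ValueError(f"cannot be an Oracle quoted identifier: {name!r}")
    return f'"{name}"'


def select_over_link(table: str, link: str,
                     columns: Optional[Sequence[str]] = None) -> str:
    """Build a SELECT against a remote table reached through a database link."""
    cols = "*" if not columns else ", ".join(quote_oracle_identifier(c) for c in columns)
    return f"SELECT {cols} FROM {quote_oracle_identifier(table)}@{link}"


print(select_over_link("MyTable", "SPARKLINK"))
print(select_over_link("MyTable", "SPARKLINK", ["MyCol"]))
```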
DG4ODBC trace files are written to the %ORACLE_HOME%\hs\trace directory. To enable DG4ODBC tracing, add the line HS_FDS_TRACE_LEVEL = DEBUG to initspark.ora and then start or restart the Oracle listener. If the trace directory does not exist, create it.
C:\SQL.log), change the trace file location to the Windows TEMP directory. For example, C:\Windows\Temp\SQL.log.
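The DG4ODBC tracing steps amount to two file-system changes: make sure the hs\trace directory exists, and set HS_FDS_TRACE_LEVEL = DEBUG in the init file. The following minimal Python sketch makes both changes. It assumes the init file follows the init<SID>.ora naming used in this article, and the function name is hypothetical. The sketch does not restart the listener, so you still need to run lsnrctl stop and lsnrctl start afterwards.

```python
from pathlib import Path

TRACE_LINE = "HS_FDS_TRACE_LEVEL = DEBUG"


def enable_dg4odbc_trace(oracle_home: str, sid: str = "spark") -> Path:
    """Turn on DG4ODBC tracing for one SID (hypothetical helper); restart the listener afterwards."""
    home = Path(oracle_home)
    # The gateway writes its trace files here, so the directory must exist.
    (home / "hs" / "trace").mkdir(parents=True, exist_ok=True)
    init_file = home / "hs" / "admin" / f"init{sid}.ora"
    lines = init_file.read_text().splitlines()
    # Drop any existing trace level setting, whether active or commented out.
    kept = [line for line in lines
            if not line.lstrip("#").strip().startswith("HS_FDS_TRACE_LEVEL")]
    kept.append(TRACE_LINE)
    init_file.write_text("\n".join(kept) + "\n")
    return init_file
```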