NexR RHive 2.0

RHive is an R extension that facilitates distributed computing via Hive queries. It makes it easy to use HQL (Hive SQL) from R, and to use R objects and R functions in Hive.

Before installing RHive, you must have Hadoop and Hive installed.

Install Hadoop

  1. Single Node
  2. Cluster Node
  3. Set HADOOP_HOME on the local machine on which R runs.

Install Hive

  1. Install Hive on the local machine and on the remote machine on which the NameNode or the Hive server runs.
  2. Installation Guide
  3. Set HIVE_HOME on the local machine on which R runs.
  4. Launch the Hive server with the following command on the remote machine. It should run as a background process.
    • $HIVE_HOME/bin/hive --service hiveserver
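
One way to keep the Hive server running as a background process is to start it under nohup and record its PID. The sketch below assumes nothing beyond the command shown above; the function name start_hiveserver and the default log and PID file names are illustrative and not part of RHive or Hive.

```shell
# Start the Hive server in the background and record its PID.
# Assumptions (not from this README): the function name and the
# default log/PID file locations.
start_hiveserver() {
    log_file="${1:-hiveserver.log}"
    pid_file="${2:-hiveserver.pid}"

    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HIVE_HOME/bin/hive" ]; then
        echo "no executable found at $HIVE_HOME/bin/hive" >&2
        return 1
    fi

    # nohup keeps the server alive after the shell exits; & backgrounds it.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$log_file" 2>&1 &
    echo $! >"$pid_file"
}

# Usage on the remote machine:
#   start_hiveserver /tmp/hiveserver.log /tmp/hiveserver.pid
```

The recorded PID makes it possible to stop the server later with kill, for example kill "$(cat /tmp/hiveserver.pid)".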

Install R and Packages

  1. Install R.
    • R must be installed on all tasktracker nodes.
  2. Install rJava.
    • rJava only needs to be installed on the local machine.
  3. Install Rserve.
    • Rserve must be installed on all tasktracker nodes.
    • On every tasktracker node, edit the configuration file /etc/Rserv.conf and add the line 'remote enable' to allow remote connections.
    • Launch Rserve on every tasktracker node.
      • e.g. R CMD Rserve
  4. Configure the tasktracker nodes.
    • Add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh.
      • e.g. export R_HOME=/usr/lib/R
  5. Install RUnit.
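
Because the Rserve and hadoop-env.sh changes have to be repeated on every tasktracker node, it can help to make them idempotent. The following is a minimal sketch; ensure_line is a hypothetical helper, not something shipped with RHive, Rserve, or Hadoop, and it assumes the target file ends with a newline if it already exists.

```shell
# Append a line to a config file only if that exact line is not already present.
# ensure_line is a hypothetical helper written for this example.
ensure_line() {
    line="$1"
    file="$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >>"$file"
}

# Usage on each tasktracker node (paths as given in the steps above):
#   ensure_line 'remote enable' /etc/Rserv.conf
#   ensure_line 'export R_HOME=/usr/lib/R' "$HADOOP_HOME/conf/hadoop-env.sh"
#   R CMD Rserve
```

Running the helper twice leaves a single copy of each line, so the same script can be re-run safely when nodes are added.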

Install RHive

  1. Requirements
    • ant (needed to build the Java files)
  2. Installing RHive
    1. Download the source code: git clone https://github.com/nexr/RHive.git
    2. Change your working directory: cd RHive
    3. Set the environment variables HIVE_HOME and HADOOP_HOME:
       export HIVE_HOME=/path/to/your/hive/directory
       export HADOOP_HOME=/path/to/your/hadoop/directory
    4. Build the Java files using ant: ant build
    5. Build RHive: R CMD build RHive
    6. Install RHive: R CMD INSTALL RHive_<VERSION>.tar.gz (the tarball produced by the previous step)
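
The installation steps above can be chained so that the build stops at the first failure. This is only a sketch: require_dir_var and install_rhive are hypothetical names, and the RHive_*.tar.gz glob is an assumption used to pick up whatever tarball R CMD build produces.

```shell
# Fail unless the named environment variable points at an existing directory.
# Hypothetical helper written for this example.
require_dir_var() {
    name="$1"
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ ! -d "$value" ]; then
        echo "$name must be set to an existing directory" >&2
        return 1
    fi
}

# Run the documented steps in order; the && chain stops at the first failure.
# The body runs in a subshell so the caller's working directory is unchanged.
install_rhive() (
    require_dir_var HIVE_HOME &&
    require_dir_var HADOOP_HOME &&
    git clone https://github.com/nexr/RHive.git &&
    cd RHive &&
    ant build &&
    R CMD build RHive &&
    R CMD INSTALL RHive_*.tar.gz   # assumption: exactly one tarball matches
)

# Usage:
#   export HIVE_HOME=/path/to/your/hive/directory
#   export HADOOP_HOME=/path/to/your/hadoop/directory
#   install_rhive
```

Checking the two variables first means a missing HIVE_HOME or HADOOP_HOME is reported before anything is downloaded or built.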

Loading RHive and connecting to Hive

  1. Set the environment variables HIVE_HOME, HADOOP_HOME, and HADOOP_CONF_DIR in one of two ways:
    • Export them in the shell:
      export HIVE_HOME=/path/to/your/hive/directory
      export HADOOP_HOME=/path/to/your/hadoop/directory
      export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
    • Or add them to Renviron:
      HIVE_HOME=/path/to/your/hive/directory
      HADOOP_HOME=/path/to/your/hadoop/directory
      HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
  2. Launch R, then load RHive and connect to Hive:
    library(RHive)
    rhive.connect(host, port, hiveServer2)
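
For a quick non-interactive connection check, the two R calls above can be generated from the shell and passed to Rscript. This is a sketch under assumptions: rhive_connect_expr is a hypothetical helper, the host name is a placeholder, and passing hiveServer2 as an R logical (TRUE or FALSE) is an assumption about the argument rather than something this README states.

```shell
# Print the R code that loads RHive and connects to a Hive server.
# Hypothetical helper; hiveServer2 is assumed to accept an R logical.
rhive_connect_expr() {
    host="$1"
    port="$2"
    hs2="${3:-FALSE}"
    printf 'library(RHive); rhive.connect("%s", %s, %s)\n' "$host" "$port" "$hs2"
}

# Usage (requires R, RHive, and a reachable Hive server; the host is a placeholder):
#   export HIVE_HOME=/path/to/your/hive/directory
#   export HADOOP_HOME=/path/to/your/hadoop/directory
#   export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
#   Rscript -e "$(rhive_connect_expr hive.example.com 10000 FALSE)"
```

Exporting the variables in the same shell that runs Rscript avoids depending on Renviron for one-off checks.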

Tutorials

Requirements

  • Java 1.6
  • R 2.13.0
  • Rserve 0.6-0
  • rJava 0.9-0
  • Hadoop 0.20.x (x >= 1)
  • Hive 0.8.x (x >= 0)
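
If the versions listed are read as minimums (the list does not say so explicitly), installed versions can be compared against them from the shell. The sketch below uses a hypothetical version_ge helper and relies on sort -V, a GNU coreutils extension that may not be available on every system.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on `sort -V` (GNU version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   r_version=$(R --version | sed -n '1s/^R version \([0-9.]*\).*/\1/p')
#   version_ge "$r_version" 2.13.0 || echo "R is older than 2.13.0" >&2
```

Version sort matters here because a plain string comparison would rank 2.9.2 above 2.13.0.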
