Connect to a remote Hive server from R using RJDBC/RHive

Submitted by 筅森魡賤 on 2019-12-04 18:56:10

I didn't try RHive because it seems to need a complex installation on all the nodes of the cluster.

I successfully connected to Hive using RJDBC. Here is a code snippet that works on my Hadoop 2.6 CDH 5.4 cluster:

#loading libraries
library("DBI")
library("rJava")
library("RJDBC")

#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

#initialising the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases

The hardest part is working out which jars are needed and where to find them.
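As a rough aid for tracking the jars down, the sketch below searches a set of directories for jar files whose names match given patterns. The helper name `find_jars` and the patterns are illustrative assumptions; the directories are the ones used in the snippet above and may differ on other installations.

```r
# Hypothetical helper (not part of RJDBC): list jar files under `dirs`
# whose file names match at least one of the regular expressions in `patterns`.
find_jars <- function(dirs, patterns) {
  dirs <- dirs[dir.exists(dirs)]            # silently skip missing directories
  jars <- list.files(dirs, pattern = "\\.jar$",
                     full.names = TRUE, recursive = TRUE)
  jars[grepl(paste(patterns, collapse = "|"), basename(jars))]
}

# Example call; directories taken from the CDH 5.4 layout above (assumption
# for any other installation).
jars <- find_jars(
  c("/usr/lib/hive/lib", "/usr/lib/hadoop/client"),
  c("^hive-jdbc", "^hadoop-common")
)
print(jars)
```

If the result is empty, the directories or patterns probably do not match the local layout.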

UPDATE: The Hive standalone jar contains everything needed to use Hive, so the standalone jar together with the hadoop-common jar is enough.

So here is a simplified version; there is no need to worry about any jars other than hadoop-common and hive-jdbc-standalone.

 #loading libraries
 library("DBI")
 library("rJava")
 library("RJDBC")

 #init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
 cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
 .jinit(classpath=cp)

 #initialising the connection
 drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
 conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

 #working with the connection
 show_databases <- dbGetQuery(conn, "show databases")
 show_databases
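The connection code above can be wrapped so that a missing jar is reported before rJava starts and the connection is always closed afterwards. This is only a sketch: `check_classpath` and `with_hive` are hypothetical helper names, and the paths, URL and user in the commented call are the ones from the snippet above.

```r
# Hypothetical helper: stop with a readable message if any jar is missing.
check_classpath <- function(cp) {
  absent <- cp[!file.exists(cp)]
  if (length(absent) > 0) {
    stop("Missing jar(s): ", paste(absent, collapse = ", "))
  }
  invisible(cp)
}

# Hypothetical wrapper: open a Hive connection, run `fun(conn)`, then disconnect.
# Requires the rJava, RJDBC and DBI packages to be installed.
with_hive <- function(cp, driver_jar, url, user, password, fun) {
  check_classpath(cp)
  rJava::.jinit(classpath = cp)
  drv <- RJDBC::JDBC("org.apache.hive.jdbc.HiveDriver", driver_jar,
                     identifier.quote = "`")
  conn <- DBI::dbConnect(drv, url, user, password)
  on.exit(DBI::dbDisconnect(conn), add = TRUE)  # close even if fun() fails
  fun(conn)
}

# Example call (needs a reachable HiveServer2, so it is left commented out):
# cp <- c("/usr/lib/hadoop/client/hadoop-common.jar",
#         "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
# with_hive(cp, cp[2], "jdbc:hive2://localhost:10000/mydb", "myuser", "",
#           function(conn) DBI::dbGetQuery(conn, "show databases"))
```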

Ioicmathieu's answer worked for me once I switched to an older Hive jar, for example from 3.1.1 to 2.0.0.

Unfortunately I can't comment on his answer, which is why I have written another one.

If you run into the following error, try an older version:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://host_name: Could not establish connection to jdbc:hive2://host_name:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
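Since downgrading the jar fixed it, a plausible reading is that the driver jar was newer than the Hive server. Under that assumption, the sketch below reads the version from a jar's file name and compares it with a server version that you supply yourself. The helper names and the example file name are hypothetical, and jars without a version in their name (such as a plain `hive-jdbc-standalone.jar`) yield `NA`.

```r
# Hypothetical helper: extract the first dotted version number from a jar
# file name, e.g. "hive-jdbc-3.1.1-standalone.jar" gives "3.1.1".
jar_version <- function(jar_path) {
  name <- basename(jar_path)
  m <- regmatches(name, regexpr("[0-9]+(\\.[0-9]+)+", name))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: TRUE if the jar looks newer than the server version,
# NA if the jar name carries no version.
driver_newer_than_server <- function(jar_path, server_version) {
  v <- jar_version(jar_path)
  if (is.na(v)) return(NA)
  numeric_version(v) > numeric_version(server_version)
}

# Example with an illustrative file name and a server version known to the user
print(driver_newer_than_server("hive-jdbc-3.1.1-standalone.jar", "2.0.0"))
```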
