connect to Remote Hive Server from R using RJDBC/RHive

久未见 提交于 2019-12-09 23:18:06

问题


I'm using RJDBC 0.2-5 to connect to Hive in Rstudio. My server has hadoop-2.4.1 and hive-0.14. I follow the below mention steps to connect to Hive.

library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", 
            c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
              "/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-service-0.14.0.jar",
              "/home/packages/hive/New folder3/libfb303-0.9.0.jar",
              "/home/packages/hive/New folder3/libthrift-0.9.0.jar",
              "/home/packages/hive/New folder3/log4j-1.2.16.jar",
              "/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
              "/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
              "/home/packages/hive/New folder3/hive-common-0.14.0.jar",
            "/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
            "/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
             "/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
            identifier.quote="`")

conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
                  "usr",
                  "pwd")

But I am always getting the following error:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars

Even I tried with different version of Hive jar, Hive-jdbc-standalone.jar but nothing seems to work.. I also use RHive to connect to Hive but there was also no success.

Can anyone help me?.. I kind of stuck :(


回答1:


I didn't try rHive because it seems to need a complex installation on all the nodes of the cluster.

I successfully connect to Hive using RJDBC, here are a code snipet that works on my Hadoop 2.6 CDH5.4 cluster :

#loading libraries
library("DBI")
library("rJava")
library("RJDBC")

#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

#initialisation de la connexion
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases

The harder is to find all the needs jars and where to find them ...

UPDATE The hive standalone JAR contains all that was needed to use Hive, using this standalone JAR with the hadoop-common jar is enough to use Hive.

So this is a simplified version, no need to worry to other jars that the hadoop-common and the hive-standalone jars.

 #loading libraries
 library("DBI")
 library("rJava")
 library("RJDBC")

 #init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
 cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
 .jinit(classpath=cp)

 #initialisation de la connexion
 drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
 conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

 #working with the connexion
 show_databases <- dbGetQuery(conn, "show databases")
 show_databases



回答2:


Ioicmathieu's answer works for me now after I have switched to an older hive jar for example from 3.1.1 to 2.0.0.

Unfortunately I can't comment on his answer that's why I have written another one.

If you run into the following error try an older version:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://host_name: Could not establish connection to jdbc:hive2://host_name:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})



来源:https://stackoverflow.com/questions/33007353/connect-to-remote-hive-server-from-r-using-rjdbc-rhive

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!