Using hive database in spark

社会主义新天地 submitted on 2019-12-24 00:50:05

Question


I am new to Spark and am trying to run some queries on the TPC-DS benchmark tables (http://www.tpc.org/tpcds/) using the HortonWorks Sandbox. There is no problem when using Hive through the shell or the Hive view on the sandbox. The problem is that I don't know how to connect to the database if I want to use Spark. How can I use a Hive database in Spark to run the queries? The only solution I know so far is to rebuild each table manually and load the data into it with the following Scala code, which is not the best solution.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'employee.txt' INTO TABLE employee")
scala> val result = sqlContext.sql("FROM employee SELECT id, name, age")
scala> result.show()

I have also read a little about hive-site.xml, but I don't know where to find it or what changes to make to it in order to connect to the database.


Answer 1:


There is no need to connect to a specific database when using Spark and HiveContext.

You simply need to copy the "hive-site.xml" file to the Spark conf folder (or you could also create a symlink).

cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/

Then, in Spark, you can do something like this (I'm not a Scala user, so the syntax might be wrong):

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hc.sql("SELECT col1, col2, col3 FROM dbname.tablename")
result.show()
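
For readers on Spark 2.x or later, where HiveContext is deprecated in favor of SparkSession, the same idea can be sketched as a small standalone application. This is only a sketch under assumptions: it presumes hive-site.xml is already in the Spark conf folder as described above, the object name HiveQueryExample is made up for illustration, and dbname, tablename and col1..col3 are the same placeholders used in the snippet above, not real TPC-DS names.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes Spark use the Hive metastore described
    // by the hive-site.xml found in $SPARK_HOME/conf.
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .enableHiveSupport()
      .getOrCreate()

    // List the databases visible through the metastore, to confirm
    // that the existing Hive databases are reachable from Spark.
    spark.sql("SHOW DATABASES").show()

    // "dbname" and "tablename" are placeholders; replace them with
    // the database and table that already exist in Hive.
    spark.sql("SHOW TABLES IN dbname").show()

    val result = spark.sql("SELECT col1, col2, col3 FROM dbname.tablename")
    result.show()

    spark.stop()
  }
}
```

In spark-shell on Spark 2.x or later, a session named spark is normally created for you, so only the spark.sql(...) lines would be needed there. If SHOW DATABASES lists only default, Spark is probably not picking up hive-site.xml.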


Source: https://stackoverflow.com/questions/38770503/using-hive-database-in-spark
