How do I connect to Hive from spark using Scala on IntelliJ?

Submitted by 痞子三分冷 on 2021-02-05 06:50:52

Question


I am new to Hive and Spark, and am trying to figure out a way to access tables in Hive to manipulate and access the data. How can it be done?


Answer 1:


In Spark < 2.0, create a HiveContext from the SparkContext:

val sc = new SparkContext(conf)

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val myDataFrame = sqlContext.sql("select * from mydb.mytable")

In later versions of Spark, use SparkSession:

SparkSession is now the new entry point of Spark that replaces the old SQLContext and HiveContext. Note that the old SQLContext and HiveContext are kept for backward compatibility. A new catalog interface is accessible from SparkSession - existing API on databases and tables access such as listTables, createExternalTable, dropTempView, cacheTable are moved here. -- from the docs

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
val myDataFrame = spark.sql("select * from mydb.mytable")
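To run this from IntelliJ, the project needs the Spark SQL and Hive modules on the classpath. A minimal build.sbt sketch follows; the version numbers here are assumptions, so match them to your cluster's Scala and Spark versions:

```scala
// build.sbt -- a minimal sketch; adjust Scala/Spark versions to your cluster
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "org.apache.spark" %% "spark-sql"  % "2.4.8",
  // spark-hive provides the Hive client classes behind enableHiveSupport()
  "org.apache.spark" %% "spark-hive" % "2.4.8"
)
```

Note that for enableHiveSupport() to reach your existing metastore, the cluster's hive-site.xml must be on the application classpath (e.g. copied into src/main/resources); without it, Spark falls back to a local embedded metastore and your Hive tables will not be visible.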



Answer 2:


If your Hive server resides on a remote cluster, you can pull the data over a JDBC connection like this:

import java.sql.{Connection, DriverManager, ResultSet}
import scala.collection.mutable.MutableList

case class TempTable (
  column1: String,
  column2: String
)

// url has the form "jdbc:hive2://<host>:<port>/<db>"
val conn: Connection = DriverManager.getConnection(url, user, password)
val res: ResultSet = conn.createStatement
                   .executeQuery("SELECT * FROM table_name")

// Materialize the result set into a local collection on the driver
val fetchedRes = MutableList[TempTable]()
while (res.next()) {
  val rec = TempTable(res.getString("column1"),
     res.getString("column2"))
  fetchedRes += rec
}
conn.close()

// Distribute the fetched rows as an RDD and cache it
val resultRDD = sc.parallelize(fetchedRes)
resultRDD.cache()
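The snippet above assumes the HiveServer2 JDBC driver is on the classpath and that url, user, and password are already defined. A hedged sketch of what those might look like; the host, port, database, and driver version are placeholders, not values from the original answer:

```scala
// build.sbt addition -- match the hive-jdbc version to your HiveServer2
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.9"

// Typical HiveServer2 connection values (all placeholders)
val url      = "jdbc:hive2://hive-host:10000/mydb"
val user     = "hive"
val password = ""
```

Also note that this approach pulls the entire result set onto the driver before parallelizing it, so it only suits small tables; for larger data, reading through SparkSession (as in Answer 1) keeps the work distributed.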


Source: https://stackoverflow.com/questions/45153803/how-do-i-connect-to-hive-from-spark-using-scala-on-intellij
