Start HiveThriftServer programmatically in Python

社会主义新天地 提交于 2019-12-02 03:41:33

问题


In the spark-shell (scala), we import, org.apache.spark.sql.hive.thriftserver._ for starting Hive Thrift server programatically for a particular hive context as HiveThriftServer2.startWithContext(hiveContext) to expose a registered temp table for that particular session.

How can we do the same using python? Is there a package / api on python for importing HiveThriftServer? Any other thoughts / recommendations appreciated.

We have used pyspark for creating a dataframe

Thanks

Ravi Narayanan


回答1:


You can import it using py4j java gateway. The following code worked for spark 2.0.2 and could query temp tables registered in python script through beeline.

from py4j.java_gateway import java_import
java_import(sc._gateway.jvm,"")

spark = SparkSession \
        .builder \
        .appName(app_name) \
        .master(master)\
        .enableHiveSupport()\
        .config('spark.sql.hive.thriftServer.singleSession', True)\
        .getOrCreate()
sc=spark.sparkContext
sc.setLogLevel('INFO')

#Start the Thrift Server using the jvm and passing the same spark session corresponding to pyspark session in the jvm side.
sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark._jwrapped)

spark.sql('CREATE TABLE myTable')
data_file="path to csv file with data"
dataframe = spark.read.option("header","true").csv(data_file).cache()
dataframe.createOrReplaceTempView("myTempView")

Then go to beeline to check if it correclty started:

in terminal> $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
beeline> show tables;

It should show the tables and temp tables/views created in python including "myTable" and "myTempView" above. It is necessary to have the same spark session in order to see temporary views

(see ans: Avoid starting HiveThriftServer2 with created context programmatically.
NOTE: It's possible to access hive tables even if the Thrift server is started from terminal and connected to the same metastore, however temp views cannot be accessed as they are in the spark session and not written to metastore)



来源:https://stackoverflow.com/questions/36628918/start-hivethriftserver-programmatically-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!