How to read an ORC transactional Hive table in Spark?

Submitted by 老子叫甜甜 on 2019-12-10 19:06:23

Question


    1. How to read an ORC transactional Hive table in Spark?

      I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

      See the complete scenario:

      hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

      hive> insert into default.hello values(10,'abc');

      Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

      spark.sql("select * from hello").show()

      Output: only the column headers id, name — no rows.


Answer 1:


Yes, compaction can be used as a workaround, but when the job runs as a micro batch, compaction won't help, so I decided to use a JDBC call instead. Please refer to my answer for this issue on my GitHub page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
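The linked workaround reads the ACID table through HiveServer2's JDBC interface instead of Spark's native ORC reader, so Hive itself resolves the transactional data. A minimal PySpark-style sketch (the host name, port, and database below are placeholder assumptions; it requires the Hive JDBC driver `org.apache.hive.jdbc.HiveDriver` on the Spark classpath):

```python
# Sketch: build the option map for reading a Hive ACID table through
# HiveServer2 JDBC using Spark's generic JDBC source.
# "hiveserver", 10000, and "default" are placeholder assumptions.

def hive_jdbc_options(host, port, database, table):
    """Options for spark.read.format("jdbc") pointed at HiveServer2."""
    return {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "dbtable": table,
    }

opts = hive_jdbc_options("hiveserver", 10000, "default", "hello")

# Usage (needs a live SparkSession and the Hive JDBC driver on the classpath):
# df = spark.read.format("jdbc").options(**opts).load()
# df.show()
```

Because the query is executed by HiveServer2, each read goes through Hive's own ACID reader, which is slower than native ORC access but returns the uncompacted rows.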




Answer 2:


Spark is not currently (as of version 2.3) fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

This compaction should make the data visible, though not immediately: the compaction runs asynchronously, so the data appears only after it completes.
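The reason compaction helps: inserts into an ACID table land in delta_* subdirectories, and Spark 2.x's native ORC reader only picks up base files; a major compaction folds the deltas into a base_* directory that Spark can read. A small heuristic sketch for checking a table's directory layout (assumes the warehouse path is locally visible; on a real cluster you would list it through HDFS instead):

```python
import os

def looks_compacted(table_dir):
    """Heuristic: an ACID table directory holds delta_* dirs for
    uncompacted writes and base_* dirs after a major compaction.
    Spark 2.x's native ORC reader only sees base files, so rows
    still sitting in deltas appear to be missing."""
    entries = os.listdir(table_dir)
    has_base = any(e.startswith("base_") for e in entries)
    has_deltas = any(e.startswith("delta_") for e in entries)
    return has_base and not has_deltas
```

Right after the insert, the row lives only in a delta directory, so the check returns False; once the major compaction (and its cleaner) finishes, a base directory remains and the check returns True.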




Answer 3:


You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(The default here is to show 20 rows)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.



Source: https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark
