Question
How to read an ORC transactional Hive table in Spark?
I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.
See the complete scenario:
hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.hello values(10,'abc');
Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:
spark.sql("select * from hello").show()
Output: only the column headers id and name; no data rows are returned.
Answer 1:
Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call. Please refer to my answer for this issue on my GitHub page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
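For illustration, a minimal sketch of the JDBC approach this answer describes, assuming a HiveServer2 endpoint at jdbc:hive2://localhost:10000 (hypothetical host and port) and the Hive JDBC driver on the Spark classpath. The idea is that HiveServer2 resolves the ACID delta files server-side, so the rows come back even though Spark cannot read the transactional ORC layout directly:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReadAcidTableViaJdbc")
  .getOrCreate()

// Read the transactional table through HiveServer2 instead of from the files.
val df = spark.read
  .format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("url", "jdbc:hive2://localhost:10000/default") // hypothetical endpoint
  .option("dbtable", "default.hello")
  .load()

df.show()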
Answer 2:
Spark is not currently (version 2.3) fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:
ALTER TABLE Hello COMPACT 'major';
This compaction should make the data visible to Spark. (The data is compacted after some time, so it may not appear immediately.)
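For completeness, the full flow in the Hive shell for the table from the question; SHOW COMPACTIONS is the standard Hive statement for checking when the compaction has finished:

hive> ALTER TABLE default.hello COMPACT 'major';
hive> SHOW COMPACTIONS;
-- wait until the request for default.hello reaches the 'succeeded' state

Once the compaction has succeeded, the original query from the question should return the rows:

spark.sql("select * from default.hello").show()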
Answer 3:
You would need to add an action at the end to force it to run the query:
spark.sql("Select * From Hello").show()
(The default here is to show 20 rows)
or
spark.sql("Select * From Hello").take(2)
to see 2 rows of output data.
These are just examples of actions that can be taken on a DataFrame.
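A couple of other standard DataFrame actions that would equally force evaluation:

spark.sql("Select * From Hello").count()   // returns the number of rows as a Long
spark.sql("Select * From Hello").collect() // materializes all rows on the driver (use with care on large tables)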
Source: https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark