How to read an ORC transactional Hive table in Spark?

Submitted by 老子叫甜甜 on 2019-12-10 19:06:23

Question


    1. How to read an ORC transactional Hive table in Spark?

      I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

      See the complete scenario:

      hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

      hive> insert into default.hello values(10,'abc');

      Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

      spark.sql("select * from hello").show()

      Output: only the column headers id, name — no rows.


Answer 1:


Yes, compaction can be used as a workaround, but when the job runs as a micro batch, compaction won't help, so I decided to use a JDBC call instead. Please refer to my answer for this issue on my GitHub page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
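The linked workaround reads the ACID table through HiveServer2's JDBC interface instead of Spark's native ORC reader, so Hive itself resolves the transactional data. A minimal PySpark-style sketch (the host name, port, and database below are placeholder assumptions; it requires the Hive JDBC driver `org.apache.hive.jdbc.HiveDriver` on the Spark classpath):

```python
# Sketch: build the option map for reading a Hive ACID table through
# HiveServer2 JDBC using Spark's generic JDBC source.
# "hiveserver", 10000, and "default" are placeholder assumptions.

def hive_jdbc_options(host, port, database, table):
    """Options for spark.read.format("jdbc") pointed at HiveServer2."""
    return {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "dbtable": table,
    }

opts = hive_jdbc_options("hiveserver", 10000, "default", "hello")

# Usage (needs a live SparkSession and the Hive JDBC driver on the classpath):
# df = spark.read.format("jdbc").options(**opts).load()
# df.show()
```

Because the query is executed by HiveServer2, each read goes through Hive's own ACID reader, which is slower than native ORC access but returns the uncompacted rows.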




Answer 2:


Spark is not currently (as of version 2.3) fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

This compaction should make the data visible, though not immediately: the compaction runs asynchronously, so the data appears only after it completes.
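The reason compaction helps: inserts into an ACID table land in delta_* subdirectories, and Spark 2.x's native ORC reader only picks up base files; a major compaction folds the deltas into a base_* directory that Spark can read. A small heuristic sketch for checking a table's directory layout (assumes the warehouse path is locally visible; on a real cluster you would list it through HDFS instead):

```python
import os

def looks_compacted(table_dir):
    """Heuristic: an ACID table directory holds delta_* dirs for
    uncompacted writes and base_* dirs after a major compaction.
    Spark 2.x's native ORC reader only sees base files, so rows
    still sitting in deltas appear to be missing."""
    entries = os.listdir(table_dir)
    has_base = any(e.startswith("base_") for e in entries)
    has_deltas = any(e.startswith("delta_") for e in entries)
    return has_base and not has_deltas
```

Right after the insert, the row lives only in a delta directory, so the check returns False; once the major compaction (and its cleaner) finishes, a base directory remains and the check returns True.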




Answer 3:


You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(The default here is to show 20 rows)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.



Source: https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark
