Question
How can you access a Hive ACID table in Spark SQL?
Answer 1:
We have built and open-sourced a datasource that enables users to work with Hive ACID transactional tables from Spark.
Github: https://github.com/qubole/spark-acid
It is available as a Spark package, and usage instructions are on the Github page. Currently the datasource only supports reading from Hive ACID tables; we are working on adding the ability to write to these tables from Spark as well.
Feedback and suggestions are welcome!
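For reference, a minimal read sketch, assuming the package is on the classpath; the "HiveAcid" format name and the "table" option follow the usage shown on the Github page, and the table name is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

object SparkAcidReadExample {
  def main(args: Array[String]): Unit = {
    // Hive support is needed so the datasource can resolve the table via the metastore.
    val spark = SparkSession.builder()
      .appName("spark-acid-read")
      .enableHiveSupport()
      .getOrCreate()

    // "HiveAcid" is the datasource short name from the spark-acid README;
    // "default.acid_tbl" is a placeholder table name.
    val df = spark.read
      .format("HiveAcid")
      .option("table", "default.acid_tbl")
      .load()

    df.show()
    spark.stop()
  }
}
```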
Answer 2:
@aniket Spark doesn't support reading Hive ACID tables directly (https://issues.apache.org/jira/browse/SPARK-15348, https://issues.apache.org/jira/browse/SPARK-16996). The data layout of transactional tables requires special logic to decide which directories to read and how to combine them correctly; some data files may represent updates of previously written rows, for example. Also, if you read while something is writing to the table, your read may fail (without that special logic) because it will try to read incomplete ORC files. Compaction may (again, without the special logic) make it look like your data is duplicated. It can be done (work in progress) via LLAP - tracked in https://issues.apache.org/jira/browse/HIVE-12991
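The LLAP-based path mentioned above later shipped as the Hive Warehouse Connector; a hedged sketch, assuming the HWC jar and its connection settings (such as the HiveServer2 Interactive JDBC URL) are configured - the class and method names come from that connector, not from this answer:

```scala
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

object LlapAcidReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("llap-acid-read")
      .getOrCreate()

    // Build an HWC session; the query is executed by HiveServer2/LLAP,
    // which applies the ACID read logic (delta merging, compaction handling).
    val hive = HiveWarehouseSession.session(spark).build()

    // "default.acid_tbl" is a placeholder table name.
    val df = hive.executeQuery("SELECT * FROM default.acid_tbl")
    df.show()

    spark.stop()
  }
}
```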
Answer 3:
I faced the same issue (Spark with Hive ACID tables) and was able to manage with a JDBC call from Spark. This JDBC approach can be used until Spark gets native ACID support.
https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
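A rough sketch of the JDBC workaround, assuming the Hive JDBC driver is on the classpath; host, port, database, and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object HiveAcidJdbcReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-acid-jdbc-read")
      .getOrCreate()

    // Read the ACID table through HiveServer2's JDBC endpoint, so Hive itself
    // performs the transactional read and Spark only sees the result set.
    val df = spark.read
      .format("jdbc")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("url", "jdbc:hive2://hiveserver2-host:10000/default")
      .option("dbtable", "acid_tbl")
      .load()

    df.show()
    spark.stop()
  }
}
```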
Answer 4:
Spark can read ACID tables directly at least since Spark 2.3.2. I can also confirm that it cannot read ACID tables in Spark 2.2.0.
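For completeness, the direct read this answer refers to is just a Hive-enabled session querying the table; whether it works depends on the Spark version, as noted above (the table name is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

object DirectAcidReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("direct-acid-read")
      .enableHiveSupport()
      .getOrCreate()

    // Plain Spark SQL against the metastore table; on older Spark versions
    // (e.g. 2.2.0, per this answer) this fails for ACID tables.
    spark.sql("SELECT * FROM default.acid_tbl").show()

    spark.stop()
  }
}
```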
Source: https://stackoverflow.com/questions/53199369/how-to-access-the-hive-acid-table-in-spark-sql