How to access a Hive ACID table in Spark SQL?

暗喜 2021-01-15 19:07

How can you access a Hive ACID table from Spark SQL?

4 Answers
  • 2021-01-15 19:46

    We have built and open-sourced a datasource that lets users work with their Hive ACID transactional tables from Spark.

    GitHub: https://github.com/qubole/spark-acid

    It is available as a Spark package, and instructions for using it are on the GitHub page. Currently the datasource supports only reading from Hive ACID tables; we are working on adding the ability to write to these tables from Spark as well.

    Feedback and suggestions are welcome!
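
As a rough illustration of the read path described in this answer, here is a minimal PySpark sketch. The format name `HiveAcid` and the `table` option follow the project's README as best recalled, so check them against the GitHub page. The helper name `read_hive_acid_table` and the table name `default.acidtbl` are hypothetical, and the sketch assumes the spark-acid package is already on the Spark classpath.

```python
def read_hive_acid_table(spark, table):
    """Load a Hive ACID table as a DataFrame via the spark-acid datasource.

    `spark` is expected to be an existing SparkSession (or anything with the
    same `read` interface). The format name "HiveAcid" and the "table" option
    are assumptions taken from the spark-acid README.
    """
    return (
        spark.read
        .format("HiveAcid")      # datasource name registered by spark-acid (assumed)
        .option("table", table)  # fully qualified Hive table, e.g. "default.acidtbl"
        .load()
    )
```

With a real session this would be called as `df = read_hive_acid_table(spark, "default.acidtbl")`. Per the answer above, only reads were supported at the time of writing.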

  • 2021-01-15 19:51

    Spark has been able to read ACID tables directly since at least Spark 2.3.2. I can also confirm that Spark 2.2.0 cannot read ACID tables.

  • 2021-01-15 19:52

    @aniket Spark doesn't support reading Hive ACID tables directly (https://issues.apache.org/jira/browse/SPARK-15348/SPARK-16996). The data layout for transactional tables requires special logic to decide which directories to read and how to combine them correctly. For example, some data files may represent updates to previously written rows. Also, if you read while something is writing to the table, your read may fail (without the special logic) because it will try to read incomplete ORC files. Compaction may (again, without the special logic) make it look like your data is duplicated. It can be done (WIP) via LLAP, tracked in https://issues.apache.org/jira/browse/HIVE-12991
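
To make the "which directories to read" point in this answer concrete, here is a simplified Python sketch of directory selection. It assumes the common Hive ACID naming scheme (`base_N`, `delta_min_max[_stmt]`, `delete_delta_min_max[_stmt]`). It is not Hive's actual implementation: it ignores the reader's valid write-ID list, in-progress or aborted transactions, and any visibility suffixes newer Hive versions may append. All function and type names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class AcidDir(NamedTuple):
    name: str
    kind: str    # "base", "delta", or "delete_delta"
    min_id: int  # lowest write ID covered
    max_id: int  # highest write ID covered


_BASE = re.compile(r"^base_(\d+)$")
_DELTA = re.compile(r"^(delta|delete_delta)_(\d+)_(\d+)(?:_\d+)?$")


def parse_dir(name: str) -> Optional[AcidDir]:
    """Parse a directory name; return None for anything unrecognized."""
    m = _BASE.match(name)
    if m:
        n = int(m.group(1))
        return AcidDir(name, "base", n, n)
    m = _DELTA.match(name)
    if m:
        return AcidDir(name, m.group(1), int(m.group(2)), int(m.group(3)))
    return None


def select_dirs(names: Iterable[str]) -> List[str]:
    """Pick the newest base plus the deltas that are not already covered."""
    parsed = [d for d in map(parse_dir, names) if d is not None]
    bases = [d for d in parsed if d.kind == "base"]
    best_base = max(bases, key=lambda d: d.max_id, default=None)
    floor = best_base.max_id if best_base else 0
    chosen = [best_base.name] if best_base else []
    for kind in ("delta", "delete_delta"):
        covered = floor
        # Widest range first for each starting ID, so a compacted delta
        # wins over the smaller deltas it was built from.
        candidates = sorted(
            (d for d in parsed if d.kind == kind),
            key=lambda d: (d.min_id, -d.max_id),
        )
        for d in candidates:
            if d.min_id > covered:
                chosen.append(d.name)
                covered = d.max_id
    return chosen
```

A reader that simply scanned every directory would see the rows in `delta_0000006_0000006_0000` and again in a compacted `delta_0000006_0000007`, which is the apparent duplication this answer warns about. The sketch skips the smaller delta once the wider one covers its range.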

  • 2021-01-15 19:59

    I faced the same issue (Spark with Hive ACID tables) and was able to work around it with a JDBC call from Spark. I may keep using this JDBC call from Spark until Spark gets native ACID support.

    https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
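
The JDBC workaround in this answer could look roughly like the following PySpark sketch. It is not taken from the linked repository. The helper names, host, port, and table are hypothetical, and it assumes the Hive JDBC driver (`org.apache.hive.jdbc.HiveDriver`) is on the Spark classpath and that HiveServer2 is reachable. Depending on the Spark and Hive versions, identifier quoting over Hive JDBC may need extra handling that this sketch does not address.

```python
def hive_jdbc_options(host, port, database, table, user=None, password=None):
    """Build the option map for Spark's generic JDBC datasource.

    All argument values are placeholders supplied by the caller.
    """
    opts = {
        "url": f"jdbc:hive2://{host}:{port}/{database}",  # HiveServer2 JDBC URL
        "dbtable": f"{database}.{table}",                 # table that Hive itself will read
        "driver": "org.apache.hive.jdbc.HiveDriver",
    }
    if user is not None:
        opts["user"] = user
    if password is not None:
        opts["password"] = password
    return opts


def read_via_hive_jdbc(spark, **kwargs):
    """Read the table through HiveServer2, so Hive applies its own ACID logic.

    `spark` is expected to be an existing SparkSession.
    """
    return spark.read.format("jdbc").options(**hive_jdbc_options(**kwargs)).load()
```

For example, `read_via_hive_jdbc(spark, host="hs2.example.com", port=10000, database="default", table="acidtbl")` would route the read through HiveServer2 rather than having Spark read the ORC files directly.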
