Reading and writing Hive tables with Spark

Spark reads and writes Hive tables mainly through a SparkSession created with Hive support enabled (enableHiveSupport()).

Reading is easy: just write SQL as usual, e.g. sparkSession.sql("select * from xx").

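The main topic here is writing. Data can be stored in many formats, such as ORC and Parquet, so the data has to be written in whatever format the table needs.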

First, when a particular format is required, specify it explicitly:

    dataFrame.write.format("orc")

Second, there are two ways to write into a partitioned table: insertInto and saveAsTable.

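  a) insertInto does not take partition columns; they were already declared when the table was created. Combining it with partitionBy() fails with: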

  insertInto() can't be used together with partitionBy().Partition columns have already be defined for the table. It is not necessary to use partitionBy().

  b) saveAsTable threw an exception telling me to use insertInto instead. I forgot to keep the log, so this is only a note for now.

Similar issues:

http://blog.csdn.net/lc0817/article/details/78211695?utm_source=debugrun&utm_medium=referral

https://stackoverflow.com/questions/32362206/spark-dataframe-saveastable-with-partitionby-creates-no-orc-file-in-hdfs

Source: https://www.cnblogs.com/parkin/p/7919866.html

