How to pass multiple columns to the partitionBy method in Spark

Posted by 浪尽此生 on 2019-12-25 01:48:43

Question


I am new to Spark. I want to write DataFrame data into a Hive table. The Hive table is partitioned on multiple columns. Through the Hive metastore client I get the partition columns, and I pass them as a variable to the partitionBy clause of the DataFrame's write method.

var1="country","state" (the partition column names of the Hive table)
dataframe1.write.partitionBy(s"$var1").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

When I execute the above code, it fails with an error saying that partition "country","state" does not exist. I think it is treating "country","state" as a single string.

Can you please help me out?


Answer 1:


The partitionBy method takes varargs (one String argument per column), not a list and not a single string containing several names. You can call it as

dataframe1.write.partitionBy("country","state").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

Or, in Scala, you can expand a sequence into varargs with `: _*`, like

val columns = Seq("country","state")
dataframe1.write.partitionBy(columns:_*).mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
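
To make the difference concrete, here is a minimal sketch that runs without Spark. The names partitionBySketch and parseColumns are hypothetical stand-ins, not Spark APIs: the first only mimics the varargs shape of partitionBy(colNames: String*), and the second assumes the partition column names arrive as one comma-separated string, which may not match what your metastore client actually returns.

```scala
object PartitionBySketch {
  // Hypothetical stand-in with the same varargs shape as partitionBy(colNames: String*).
  // It simply returns the arguments it received, so we can count them.
  def partitionBySketch(colNames: String*): Seq[String] = colNames

  // Assumption: the column names come as one comma-separated string.
  // Split on commas and trim whitespace around each name.
  def parseColumns(raw: String): Seq[String] =
    raw.split(",").map(_.trim).toList

  def main(args: Array[String]): Unit = {
    val raw = "country,state"

    // Passing the raw string yields ONE column name: "country,state".
    val asOne = partitionBySketch(raw)

    // Expanding the sequence with `: _*` yields one argument per column.
    val asMany = partitionBySketch(parseColumns(raw): _*)

    println(asOne.length)  // 1
    println(asMany.length) // 2
  }
}
```

With a real DataFrame, the same expansion would be written as dataframe1.write.partitionBy(columns: _*), where columns is the sequence produced by whatever parsing fits your input.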


Source: https://stackoverflow.com/questions/51569092/how-to-pass-multiple-column-in-partitionby-method-in-spark
