Save a large Spark DataFrame as a single JSON file in S3

星月不相逢 · 2021-02-01 20:39 · unresolved · 3 answers · 1974 views

I'm trying to save a Spark DataFrame (over 20 GB) as a single JSON file in Amazon S3. My code to save the dataframe looks like this:

dataframe.repartition(1)

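For reference, a minimal sketch of a complete single-file write of this shape, assuming a SparkSession, a placeholder source, and a placeholder s3a:// output path (none of these names are from the question):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("single-json-write").getOrCreate()

    # Placeholder source; the question's 20G+ DataFrame comes from elsewhere.
    dataframe = spark.read.parquet("s3a://my-bucket/input/")

    # repartition(1) funnels every row through one task, so a single executor
    # has to materialize the whole 20G+ output -- this is usually why the
    # single-file approach struggles at this size.
    dataframe.repartition(1).write.mode("overwrite").json("s3a://my-bucket/output/")

Even when this succeeds, Spark writes the result as one part-00000 file inside the output directory, not as a bare file at that exact key.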
3 Answers
  •  借酒劲吻你 · 2021-02-01 21:01

    I would try splitting the large dataframe into a series of smaller dataframes and then appending them to the same target path:

    df.write.mode('append').json(yourtargetpath)
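
    A minimal sketch of that idea, assuming a PySpark session, a placeholder source, and randomSplit as one way to cut the dataframe into pieces (none of these specifics are from the answer above):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("chunked-append").getOrCreate()
    df = spark.read.parquet("s3a://my-bucket/input/")   # placeholder source
    yourtargetpath = "s3a://my-bucket/output/json/"     # placeholder target

    # Write the big dataframe in ten smaller batches; each append adds new
    # part files under the same target path instead of rewriting everything.
    for chunk in df.randomSplit([1.0] * 10):
        chunk.write.mode("append").json(yourtargetpath)

    Note that append mode accumulates extra part files under yourtargetpath rather than concatenating them into one physical file, so a literal single JSON file would still need a downstream merge step.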
    
