PySpark: spit out single file when writing instead of multiple part files

北荒 2021-01-02 12:50

Is there a way to prevent PySpark from creating several small files when writing a DataFrame to a JSON file?

If I run:

 df.write.format('json').save(


        
3 Answers
  •  时光说笑
    2021-01-02 13:15

    df1.rdd.repartition(1).write.json('myfile.json')

    Would be nice, but isn't available: an RDD has no write attribute, so this line raises an AttributeError. Check this related question: https://stackoverflow.com/a/33311467/2843520
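
For small outputs, one common workaround is to collapse the DataFrame to a single partition with coalesce(1) before writing. Spark still produces a directory containing one part file, so the rough sketch below moves that file to the requested name. The helper name write_single_json is hypothetical, and the sketch assumes that df is a pyspark.sql.DataFrame, that the target directory already exists, and that Spark writes to the driver's local filesystem (for example in local mode) rather than HDFS or S3.

```python
import glob
import os
import shutil
import tempfile


def write_single_json(df, target_path):
    """Write df to one JSON Lines file at target_path.

    Hypothetical helper: assumes df is a pyspark.sql.DataFrame and that
    Spark writes to the driver's local filesystem.
    """
    target_path = os.path.abspath(target_path)
    # Stage next to the target so the final move stays on one filesystem.
    tmp_dir = tempfile.mkdtemp(dir=os.path.dirname(target_path))
    try:
        out_dir = os.path.join(tmp_dir, "out")
        # coalesce(1) collapses the data into one partition, so Spark emits
        # a single part file (still inside a directory).
        df.coalesce(1).write.json(out_dir)
        # Marker files such as _SUCCESS do not match this pattern.
        parts = glob.glob(os.path.join(out_dir, "part-*"))
        if len(parts) != 1:
            raise RuntimeError(f"expected one part file, found {len(parts)}")
        shutil.move(parts[0], target_path)
    finally:
        # Remove the staging directory and whatever Spark left in it.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return target_path
```

Calling write_single_json(df, 'myfile.json') should then leave a single file with one JSON object per line. Because coalesce(1) funnels every row through one task, this is only reasonable when the data fits comfortably on a single executor.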
