Spark - How to write a single csv file WITHOUT folder?

北恋 2020-12-28 13:44

Suppose that df is a DataFrame in Spark. The way to write df into a single CSV file is

df.coalesce(1).write.option("header", "true").csv(...)
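For context, a minimal runnable sketch of this pattern (the SparkSession setup, the example data, and the output path "out.csv" are illustrative, not from the question). Note that coalesce(1) yields a single part file, but Spark still creates the target path as a directory, which is exactly what the question wants to avoid:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(3).toDF("id")  # hypothetical example data

    # coalesce(1) moves all rows into one partition, so the write produces a
    # single part file. However, "out.csv" is created as a directory containing
    # part-00000-*.csv and _SUCCESS, not as a flat file.
    df.coalesce(1).write.option("header", "true").csv("out.csv")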

9 Answers
  • 2020-12-28 14:23
    df.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("PATH/FOLDER_NAME/x.csv")
    

    You can use this, and if you don't want to spell out the CSV file name every time, you can write a small helper function or build an array of CSV file names and pass those in; a sketch of such a helper follows.
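
    A minimal sketch of the kind of helper this answer suggests, assuming df is the DataFrame from the question; the name write_named_csv and its parameters are illustrative, not part of the answer:

    def write_named_csv(df, folder, name):
        # Hypothetical helper: builds the target path once so callers
        # don't repeat the CSV file name for every write.
        df.write.mode("overwrite").option("header", "true").csv(f"{folder}/{name}")

    # e.g. loop over an "array of the CSV file names", as the answer puts it
    for name in ["x.csv", "y.csv"]:
        write_named_csv(df, "PATH/FOLDER_NAME", name)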

  • 2020-12-28 14:26

    For PySpark, you can convert to a pandas DataFrame and then save it. Note that toPandas() collects the entire dataset onto the driver, so this only works when the data fits in driver memory.

    df.toPandas().to_csv("<path>/<filename.csv>", header=True, index=False)

  • 2020-12-28 14:27

    If you want to use only the Python standard library, this is a simple function that will write to a single file. You don't have to mess with temp files or go through an intermediate directory. A usage example follows the function.

    import csv

    def spark_to_csv(df, file_path):
        """Converts a Spark DataFrame to a single CSV file on the driver."""
        # newline="" keeps the csv module from writing blank lines on Windows
        with open(file_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=df.columns)
            writer.writeheader()  # header row from the DataFrame's column names
            # toLocalIterator() streams rows to the driver one partition at a
            # time, so the whole DataFrame never sits in driver memory at once
            for row in df.toLocalIterator():
                writer.writerow(row.asDict())
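
    For example, assuming df is an existing Spark DataFrame (the output path is illustrative):

    spark_to_csv(df, "output.csv")  # one flat CSV file, no part-file folder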
    