How to save a DataFrame as compressed (gzipped) CSV?

Asked by 感情败类 on 2020-12-30 23:09 · 4 answers · 1922 views

I use Spark 1.6.0 and Scala.

I want to save a DataFrame as compressed CSV format.

Here is what I have so far (assume I already have df and …)

4 Answers
  •  Answered by 挽巷 (OP)
     2020-12-30 23:57

    Spark 2.2+

    df.write.option("compression","gzip").csv("path")

    Spark 2.0 (PySpark)

    df.write.csv("path", compression="gzip")

    Spark 1.6

    On the spark-csv github: https://github.com/databricks/spark-csv

    One can read:

    codec: compression codec to use when saving to file. Should be the fully qualified name of a class implementing org.apache.hadoop.io.compress.CompressionCodec or one of case-insensitive shorten names (bzip2, gzip, lz4, and snappy). Defaults to no compression when a codec is not specified.

    In this case, this works (`codec` is passed as an option, since `DataFrameWriter` has no `codec` method):

    df.write.format("com.databricks.spark.csv").option("codec", "gzip").save("my_directory/my_file.gzip")
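    Whichever of the APIs above you use, Spark writes a directory of gzip part files (e.g. part-00000.csv.gz), and each part is an ordinary gzip stream that any gzip-aware tool can read. A minimal sketch of that round trip using only the Python standard library (this is not Spark; the file name is a hypothetical stand-in for one part file):

    ```python
    # Stdlib sketch of the gzip+CSV format Spark's "gzip" codec produces.
    # The part file name below is hypothetical, mimicking Spark's output layout.
    import csv
    import gzip
    import os
    import tempfile

    rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]
    path = os.path.join(tempfile.mkdtemp(), "part-00000.csv.gz")

    # Write: gzip.open in text mode gives csv.writer a plain file-like object.
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows(rows)

    # Read it back the same way -- all a consumer of the output needs.
    with gzip.open(path, "rt", newline="") as f:
        recovered = list(csv.reader(f))

    assert recovered == rows
    ```

    The same transparency is why Spark can read these files back with the matching `csv` reader without any extra decompression step.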
