Write single CSV file using spark-csv

心在旅途 2020-11-22 08:43

I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to: it creates a folder instead.

I need a Scala function which will take a path and file name and write a single CSV file.
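For reference, a typical spark-csv write looks like the sketch below (the DataFrame `df` and the output path are illustrative). Even with `coalesce(1)`, Spark writes a directory rather than a flat file:

    // Minimal sketch of the usual spark-csv write; the path and df are hypothetical.
    // coalesce(1) yields one partition, but save() still creates a directory
    // "/tmp/out.csv/" holding part-00000 and _SUCCESS, not a single flat file.
    df.coalesce(1)
      .write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("/tmp/out.csv")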

13 Answers
  •  有刺的猬
    2020-11-22 09:08

    A solution that works for S3, modified from Minkymorgan's answer.

    Simply pass the temporary partitioned directory path (with a different name than the final path) as srcPath and the single final csv/txt file as dstPath. Also set deleteSource if you want to remove the original directory. A usage sketch follows the function below.

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
    import org.apache.spark.sql.SparkSession

    /**
     * Merges multiple partitions of Spark text file output into a single file.
     * @param srcPath source directory of partitioned files
     * @param dstPath destination path of the merged single file
     * @param deleteSource whether or not to delete the source directory after merging
     * @param spark the active SparkSession
     */
    def mergeTextFiles(srcPath: String, dstPath: String,
                       deleteSource: Boolean, spark: SparkSession): Unit = {
      val config = spark.sparkContext.hadoopConfiguration
      val fs: FileSystem = FileSystem.get(new URI(srcPath), config)
      // copyMerge concatenates every part file under srcPath into the single dstPath file
      FileUtil.copyMerge(
        fs, new Path(srcPath), fs, new Path(dstPath), deleteSource, config, null
      )
    }
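    A hedged usage sketch under the same assumptions (the paths and the `df` DataFrame are hypothetical; note that `FileUtil.copyMerge` exists in Hadoop 2.x but was removed in Hadoop 3.0):

    // Hypothetical paths; the temp directory must differ from the final file path.
    val tmpDir  = "s3a://my-bucket/tmp/report-parts"
    val outFile = "s3a://my-bucket/report.csv"

    // Write the partitioned output first. The header option is left off here:
    // copyMerge simply concatenates part files, so a per-partition header
    // would be repeated in the merged result.
    df.write
      .format("com.databricks.spark.csv")
      .save(tmpDir)

    mergeTextFiles(tmpDir, outFile, deleteSource = true, spark)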
    
