Write single CSV file using spark-csv

心在旅途 2020-11-22 08:43

I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to: it creates a folder instead.

I need a Scala function which will take a path and file name and write a single CSV file.
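For reference, a typical spark-csv write looks like the sketch below (the DataFrame `df` and the output path are illustrative). Even with `coalesce(1)`, Spark writes a directory rather than a flat file:

    // Minimal sketch of the usual spark-csv write; the path and df are hypothetical.
    // coalesce(1) yields one partition, but save() still creates a directory
    // "/tmp/out.csv/" holding part-00000 and _SUCCESS, not a single flat file.
    df.coalesce(1)
      .write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("/tmp/out.csv")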

13 Answers
  •  有刺的猬
    2020-11-22 09:08

    A solution that works for S3, modified from Minkymorgan's answer.

    Simply pass the temporary partitioned directory path (with a different name than the final path) as srcPath and the single final csv/txt file as dstPath. Also set deleteSource if you want to remove the original directory. A usage sketch follows the function below.

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
    import org.apache.spark.sql.SparkSession

    /**
     * Merges multiple partitions of Spark text file output into a single file.
     * @param srcPath source directory of partitioned files
     * @param dstPath destination path of the merged single file
     * @param deleteSource whether or not to delete the source directory after merging
     * @param spark the active SparkSession
     */
    def mergeTextFiles(srcPath: String, dstPath: String,
                       deleteSource: Boolean, spark: SparkSession): Unit = {
      val config = spark.sparkContext.hadoopConfiguration
      val fs: FileSystem = FileSystem.get(new URI(srcPath), config)
      // copyMerge concatenates every part file under srcPath into the single dstPath file
      FileUtil.copyMerge(
        fs, new Path(srcPath), fs, new Path(dstPath), deleteSource, config, null
      )
    }
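    A hedged usage sketch under the same assumptions (the paths and the `df` DataFrame are hypothetical; note that `FileUtil.copyMerge` exists in Hadoop 2.x but was removed in Hadoop 3.0):

    // Hypothetical paths; the temp directory must differ from the final file path.
    val tmpDir  = "s3a://my-bucket/tmp/report-parts"
    val outFile = "s3a://my-bucket/report.csv"

    // Write the partitioned output first. The header option is left off here:
    // copyMerge simply concatenates part files, so a per-partition header
    // would be repeated in the merged result.
    df.write
      .format("com.databricks.spark.csv")
      .save(tmpDir)

    mergeTextFiles(tmpDir, outFile, deleteSource = true, spark)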
    
