Write single CSV file using spark-csv

心在旅途 2020-11-22 08:43

I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I can't: it creates a folder instead.

Need a Scala function which will take

13 Answers
  •  隐瞒了意图╮
    2020-11-22 09:25

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs._
    import org.apache.spark.sql.{DataFrame,SaveMode,SparkSession}
    import org.apache.spark.sql.functions._
    

I solved this with the approach below (rename the file in HDFS):

Step 1: (Create the DataFrame and write it to HDFS)

    df.coalesce(1).write.format("csv").option("header", "false").mode(SaveMode.Overwrite).save("/hdfsfolder/blah/")
    

Step 2: (Create the Hadoop configuration)

    val hadoopConfig = new Configuration()
    val hdfs = FileSystem.get(hadoopConfig)
    

Step 3: (Get the HDFS folder path)

    val pathFiles = new Path("/hdfsfolder/blah/")
    

Step 4: (List the part file names in the HDFS folder)

    val fileNames = hdfs.listFiles(pathFiles, false)
    println(fileNames)
    

Step 5: (Create a Scala mutable list and add all the file names to it)

        var fileNamesList = scala.collection.mutable.MutableList[String]()
        while (fileNames.hasNext) {
          fileNamesList += fileNames.next().getPath.getName
        }
        println(fileNamesList)
    

Step 6: (Filter the _SUCCESS file out of the file-name list)

        // get files name which are not _SUCCESS
        val partFileName = fileNamesList.filterNot(filenames => filenames == "_SUCCESS")
    

Step 7: (Convert the Scala list to a string, build the target file name in the same HDFS folder, and rename)

    val partFileSourcePath = new Path("/hdfsfolder/blah/" + partFileName.mkString(""))
    val desiredCsvTargetPath = new Path("/hdfsfolder/blah/" + "op_" + ".csv")
    hdfs.rename(partFileSourcePath, desiredCsvTargetPath)
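
    The file-selection logic in steps 5 and 6 can be factored into a small pure helper that is easy to unit-test without a cluster. This is a sketch, not part of the original answer; `pickPartFile` is a hypothetical name, and the `.crc` handling is an extra safeguard for Hadoop checksum files:

    ```scala
    // Hypothetical helper: given the file names found in the Spark output
    // directory, pick the single part file, skipping the _SUCCESS marker
    // and any .crc checksum files Hadoop may leave alongside it.
    def pickPartFile(names: Seq[String]): Option[String] =
      names.find(n => n.startsWith("part-") && !n.endsWith(".crc"))

    // Example: a typical coalesce(1) output directory listing
    val listing = Seq("_SUCCESS", "part-00000-3f1a.csv")
    println(pickPartFile(listing)) // prints Some(part-00000-3f1a.csv)
    ```

    Returning an `Option` also makes the "no part file found" case explicit, instead of renaming an empty-string path as in the list-to-string approach above.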
    
