Merge Spark output CSV files with a single header

天命终不由人 2021-01-01 11:40

I want to create a data processing pipeline in AWS to eventually use the processed data for Machine Learning.

I have a Scala script that takes raw data from S3, proc

6 answers
  •  有刺的猬
    2021-01-01 11:42

     // Write the DataFrame as CSV part files
            outputDataframe.write()
                    .format("com.databricks.spark.csv")
                    // header=true writes a header row into every part file
                    .option("header", "true")
                    // outputPath is a placeholder for the destination directory
                    .save(outputPath);
    

    See the link below, which includes an integration test, for how to write a single header:

    http://bytepadding.com/big-data/spark/write-a-csv-text-file-from-spark/
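
    Since header=true puts a header into every part file, one way to end up with a single header is to merge the parts afterwards and skip the first line of every file except the first. The following is only a rough sketch of that idea and is not taken from the linked article. The names CsvPartMerger and mergeWithSingleHeader are hypothetical. It assumes the part files have already been copied to a local directory (for example, downloaded from S3), that every part has the same one-line header, and that the files are UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges local Spark CSV part files into one file with one header.
public class CsvPartMerger {

    // Copies every "part-*" file in partsDir into target, keeping only the first header line.
    // Assumes each part was written with header=true and that the header is a single line.
    public static void mergeWithSingleHeader(Path partsDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips marker and checksum files such as _SUCCESS and .part-00000.crc
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // part-00000, part-00001, ... in name order

        boolean headerWritten = false;
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path part : parts) {
                try (BufferedReader reader = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String header = reader.readLine();
                    if (header == null) {
                        continue; // empty part file: nothing to copy
                    }
                    if (!headerWritten) {
                        writer.write(header);
                        writer.newLine();
                        headerWritten = true;
                    }
                    // Copy the data rows; the header of this part was consumed above
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

    It would be called as CsvPartMerger.mergeWithSingleHeader(partsDir, target) once the Spark job has finished. The target file should live outside the parts directory or at least not start with "part-", so that it is not picked up as an input.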
