I want to create a data processing pipeline in AWS to eventually use the processed data for Machine Learning.
I have a Scala script that takes raw data from S3, processes it, and writes the result back out as CSV:
// Write the processed DataFrame out as CSV using the spark-csv package
outputDataframe.write
  .format("com.databricks.spark.csv")
  // header = true writes a header row into each output part file
  .option("header", "true")
  .save(outputPath) // outputPath is the S3 destination, e.g. "s3://bucket/processed/"
Please follow the link below (it includes an integration test) for how to write a CSV with a single header from Spark:
http://bytepadding.com/big-data/spark/write-a-csv-text-file-from-spark/
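For reference, here is a minimal, self-contained sketch of the usual way to end up with exactly one CSV file and a single header row: coalesce to one partition before writing. The bucket paths, SparkSession settings, and the processing step are all placeholders, and this assumes Spark 2.x with its built-in CSV source.

import org.apache.spark.sql.SparkSession

object CsvPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-csv-pipeline")
      .getOrCreate()

    // Placeholder S3 locations; substitute your own bucket/prefixes.
    val rawPath    = "s3://my-bucket/raw/"
    val outputPath = "s3://my-bucket/processed/"

    // Read the raw CSV data from S3.
    val raw = spark.read
      .option("header", "true")
      .csv(rawPath)

    // Placeholder processing step; replace with the real transformations.
    val processed = raw // e.g. raw.filter(...).select(...)

    // coalesce(1) forces a single part file, so the header appears once
    // instead of once per partition.
    processed.coalesce(1)
      .write
      .option("header", "true")
      .csv(outputPath)

    spark.stop()
  }
}

Keep in mind that coalesce(1) funnels all output through a single task, so it is only sensible when the processed data comfortably fits on one executor.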