问题
I'm running spark 2.1 and I want to write a csv with results into Amazon S3. After repartitioning the csv file has kind of a long kryptic name and I want to change that into a specific filename.
I'm using the databricks lib for writing into S3.
dataframe
.repartition(1)
.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("folder/dataframe/")
Is there a way to rename the file afterwards or even save it directly with the correct name? I've already looked for solutions and havent found much.
Thanks
回答1:
You can use below to rename the output file.
dataframe.repartition(1).write.format("com.databricks.spark.csv").option("header", "true").save("folder/dataframe/")
import org.apache.hadoop.fs._
val fs = FileSystem.get(sc.hadoopConfiguration)
val filePath = "folder/dataframe/"
val fileName = fs.globStatus(new Path(filePath+"part*"))(0).getPath.getName
fs.rename(new Path(filePath+fileName), new Path(filePath+"file.csv"))
回答2:
The code as you mentioned here returns a Unit. You would need to confirm when your Spark application has completed its run (assuming this is a batch case) and then rename
dataframe
.repartition(1)
.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("folder/dataframe/")
来源:https://stackoverflow.com/questions/44760244/rename-written-csv-file-spark