Question
I have a dataset that contains only a header (id, name, age) and 0 rows. I want to write it to an HDFS location as a CSV file using:
DataFrameWriter dataFrameWriter = dataset.write();
Map<String, String> csvOptions = new HashMap<>();
csvOptions.put("header", "true");
dataFrameWriter = dataFrameWriter.options(csvOptions);
dataFrameWriter.mode(SaveMode.Overwrite).csv(location);
In the HDFS location, the files are:
1. _SUCCESS
2. tempFile.csv
If I go to that location and download the file (tempFile.csv), I get an empty CSV file. I have tried with header set to both true and false. How do I write the header as the content of the CSV file?
Answer 1:
Well, this is a workaround. In Scala, you can do something like this:
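// Workaround: when the DataFrame has no rows, write the header line by hand.
if (df.take(1).isEmpty) {
  // Build "id,name,age" from the schema and save it as a single text file.
  sc.parallelize(Seq(df.schema.map(_.name).mkString(",")), 1).saveAsTextFile("temp")
} else {
  // Otherwise let Spark's CSV writer emit the header along with the rows.
  df.write.option("header", "true").csv("temp")
}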
df.schema returns the schema of the DataFrame df as a StructType.
_.name returns the name of each column in the schema.
mkString(",") converts the resulting sequence of names to a comma-separated string.
Something similar can be done for Java, I guess.
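For the Java side, here is a minimal sketch of just the header-building step, in plain Java with no Spark dependency. CsvHeaderLine and its methods are hypothetical names, not part of Spark. It assumes the column names arrive as a String[], which is what Dataset.columns() returns. The quoting of names that contain commas or quotes is an addition the Scala snippet does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): builds the CSV header line
// from column names, e.g. the String[] returned by Dataset.columns().
class CsvHeaderLine {

    // Quote a field only if it contains a comma, a double quote, or a line break.
    static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Inside a quoted field, a double quote is escaped by doubling it.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Java counterpart of df.schema.map(_.name).mkString(",").
    static String build(String[] columnNames) {
        List<String> fields = new ArrayList<>();
        for (String name : columnNames) {
            fields.add(quoteIfNeeded(name));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Prints: id,name,age
        System.out.println(build(new String[] {"id", "name", "age"}));
    }
}
```

With Spark on the classpath, one option would be to check dataset.takeAsList(1).isEmpty() and, when the dataset is empty, write the resulting string as a one-line text file instead of calling csv(location). I have not tested that part against a cluster.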
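Answer 2:
If you look at the Spark source code, you will find that the header is written only when at least one row is written: write() emits the header just before the first row, so a dataset with no rows never triggers it.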
UnivocityGenerator.scala
def write(row: InternalRow): Unit = {
if (printHeader) {
gen.writeHeaders()
}
gen.writeRow(convertRow(row): _*)
printHeader = false
}
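To make the consequence concrete, here is a small self-contained Java model of the same control flow. LazyHeaderWriter is a hypothetical stand-in written for illustration, not Spark's code. It collects output lines in a list instead of writing a file.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the generator above (hypothetical, not Spark code):
// the header is emitted inside write(), so it needs at least one row.
class LazyHeaderWriter {
    private final String[] columns;
    private boolean printHeader;
    private final List<String> lines = new ArrayList<>();

    LazyHeaderWriter(String[] columns, boolean printHeader) {
        this.columns = columns;
        this.printHeader = printHeader;
    }

    // Mirrors write(row): header first (once), then the row itself.
    void write(String... row) {
        if (printHeader) {
            lines.add(String.join(",", columns));
        }
        lines.add(String.join(",", row));
        printHeader = false;
    }

    // Everything that would end up in the output file.
    List<String> output() {
        return lines;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "name", "age"};

        // write() is never called, so nothing is emitted, not even the header.
        LazyHeaderWriter empty = new LazyHeaderWriter(columns, true);
        System.out.println(empty.output()); // prints []

        LazyHeaderWriter oneRow = new LazyHeaderWriter(columns, true);
        oneRow.write("1", "Ann", "30");
        System.out.println(oneRow.output()); // prints [id,name,age, 1,Ann,30]
    }
}
```

In this model the header option alone has no effect on an empty input, which matches the empty tempFile.csv described in the question.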
Source: https://stackoverflow.com/questions/45619265/how-do-i-write-a-dataset-which-contains-only-header-no-rows-into-a-hdfs-locati