How do I write a dataset that contains only a header (no rows) to an HDFS location (CSV format) so that the file contains the header when downloaded?

Question

I have a dataset that contains only a header (id,name,age) and 0 rows. I want to write it to an HDFS location as a CSV file using:

DataFrameWriter dataFrameWriter = dataset.write();
Map<String, String> csvOptions = new HashMap<>();
csvOptions.put("header", "true");
dataFrameWriter = dataFrameWriter.options(csvOptions);
dataFrameWriter.mode(SaveMode.Overwrite).csv(location);

After the write, the HDFS location contains these files:

1. _SUCCESS
2. tempFile.csv

If I go to that location and download the file (tempFile.csv), I get an empty CSV file. I have tried with header set to both true and false. How do I get the header written as the content of the CSV file?

Answer 1:

This is a workaround rather than a proper fix. In Scala, you can do something like this:

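if (df.take(1).isEmpty) {
    // No rows: the CSV writer would skip the header, so write the
    // comma-joined column names as a single line of text instead.
    sc.parallelize(Seq(df.schema.map(_.name).mkString(",")))
      .saveAsTextFile("temp")
} else {
    // At least one row: the regular CSV writer emits the header.
    df.write.option("header", "true").csv("temp")
}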

df.schema returns the schema of the DataFrame df as a StructType.

_.name returns the name of each column in the schema.

mkString(",") joins the resulting sequence of names into a comma-separated string.

Something similar can be done for Java, I guess.
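
As a rough idea of what the Java side could look like, here is a minimal sketch of the header-writing half of the workaround. It deliberately uses only the Java standard library: the column array stands in for what dataset.columns() returns in Spark, the class and method names (HeaderOnlyCsv, headerLine, writeHeaderOnly) are made up for illustration, and the target is a local temporary file rather than an HDFS path, which would need the Hadoop FileSystem API instead. It also assumes the column names contain no commas, quotes, or newlines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

// Hypothetical helper, not part of Spark or Hadoop.
final class HeaderOnlyCsv {

    // Joins column names into one CSV header line.
    // Assumption: names need no quoting or escaping.
    public static String headerLine(String[] columns) {
        return String.join(",", columns);
    }

    // Writes a file whose only content is the header line.
    public static void writeHeaderOnly(String[] columns, Path target) throws IOException {
        Files.write(target,
                Collections.singletonList(headerLine(columns)),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In a Spark job this array would come from dataset.columns().
        String[] columns = {"id", "name", "age"};

        // Local temp file used here as a stand-in for the HDFS location.
        Path target = Files.createTempFile("header-only", ".csv");
        writeHeaderOnly(columns, target);

        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

In a real job you would call this branch only when the dataset has no rows and use the normal dataset.write() path with the header option otherwise, mirroring the Scala version above.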

Answer 2:

If you look at the Spark source code, you will find that the header is written only as part of writing a row, so a dataset with no rows never produces one.

UnivocityGenerator.scala

  def write(row: InternalRow): Unit = {
    if (printHeader) {
      gen.writeHeaders()
    }
    gen.writeRow(convertRow(row): _*)
    printHeader = false
  }
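
To make the consequence of that method concrete, here is a small stand-alone Java model of the same control flow. It is only an illustration under simplifying assumptions: HeaderOnFirstRowWriter is a made-up class, rows are plain strings, and output is collected in a list instead of being sent to the univocity writer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the write() logic quoted above; not Spark's class.
final class HeaderOnFirstRowWriter {
    private final String header;
    private boolean printHeader = true;
    private final List<String> lines = new ArrayList<>();

    HeaderOnFirstRowWriter(String header) {
        this.header = header;
    }

    // Mirrors the quoted method: the header is emitted only as a
    // side effect of writing the first row.
    void write(String row) {
        if (printHeader) {
            lines.add(header);
        }
        lines.add(row);
        printHeader = false;
    }

    List<String> lines() {
        return lines;
    }

    public static void main(String[] args) {
        HeaderOnFirstRowWriter empty = new HeaderOnFirstRowWriter("id,name,age");
        // prints 0: write() was never called, so not even the header exists
        System.out.println(empty.lines().size());

        HeaderOnFirstRowWriter oneRow = new HeaderOnFirstRowWriter("id,name,age");
        oneRow.write("1,alice,30");
        // prints [id,name,age, 1,alice,30]
        System.out.println(oneRow.lines());
    }
}
```

With zero rows, write is never invoked, which matches the empty tempFile.csv described in the question.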

Source: https://stackoverflow.com/questions/45619265/how-do-i-write-a-dataset-which-contains-only-header-no-rows-into-a-hdfs-locati

Tags

java

csv

Hadoop

apache-spark

apache-spark-dataset
