Where is the reference for options for writing or reading per format?

Submitted by 只愿长相守 on 2019-11-26 16:05:36

Question


I use Spark 1.6.1.

We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use

df.write().orc(<path>)

we would rather do something like

df.write().options(Map("format" -> "orc", "path" -> "/some_path"))

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference to the options that can be passed into the DataFrameWriter? I found nothing in the docs here:

https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java.util.Map)


Answer 1:


Where can we find a reference to the options that can be passed into the DataFrameWriter?

The most definitive and authoritative reference is the source code:

  • CSVOptions
  • JDBCOptions
  • JSONOptions
  • ParquetOptions
  • TextOptions
  • OrcOptions
  • ...

The docs describe some of the options, but there is no single page that lists them all (such a page could be auto-generated from the sources so that it stays up to date).

The reason is that the options are deliberately kept separate from the format implementation, which gives you the per-use-case flexibility you want to offer (as you duly noted):

This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library.
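As a rough sketch of what such a helper could look like: in DataFrameWriter the format is chosen with format(...) rather than through an option, while "path" can be passed as an option and is then picked up by save(). The names WriterHelper and settings below are hypothetical, and falling back to "parquet" is an assumption that mirrors Spark's default data source. Treat this as an illustration, not a definitive implementation.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: `settings` carries "format" and "path" alongside
// any format-specific options the caller wants to pass through.
object WriterHelper {
  def write(df: DataFrame, settings: Map[String, String]): Unit = {
    // "format" is not a writer option, so pull it out and hand it to format().
    // Assumption: fall back to "parquet" when the caller gives no format.
    val format = settings.getOrElse("format", "parquet")
    val options = settings - "format"

    df.write
      .format(format)
      .options(options) // "path" stays in the map and is read by save()
      .save()
  }
}
```

A caller would then switch format or root path by changing the map alone, for example WriterHelper.write(df, Map("format" -> "orc", "path" -> "/some_path")). Any other keys in the map are passed to the chosen format as-is, so they must be options that format actually understands.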


Your question seems similar to "How to know the file formats supported by Databricks?", where I said:

Where can I get the list of options supported for each file format?

That's not possible, as there is no API to follow (like in Spark MLlib) to define options. Unfortunately, every format does this on its own, and your best bet is to read the documentation or (more authoritative) the source code.



Source: https://stackoverflow.com/questions/44365042/where-is-the-reference-for-options-for-writing-or-reading-per-format
