Spark with Webhdfs/httpfs

感情迁移 submitted on 2020-01-13 06:04:29

Question


I would like to read a file from HDFS into Spark via httpfs or Webhdfs. Something along the lines of

sc.textFile("webhdfs://myhost:14000/webhdfs/v1/path/to/file.txt")

or, ideally,

sc.textFile("httpfs://myhost:14000/webhdfs/v1/path/to/file.txt")

Is there a way to get Spark to read the file over Webhdfs/httpfs?


Answer 1:


I believe WebHDFS and HttpFS act like streaming sources that transmit the data over a REST API.

Spark Streaming could then be used to receive the data from WebHDFS/HttpFS.
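To make the REST API part concrete, here is a minimal sketch of how a webhdfs:// URI corresponds to the HTTP URL of the WebHDFS OPEN (read) operation. The helper name webhdfs_open_url is hypothetical and not part of Spark or Hadoop; the host and port are the ones from the question. As far as I know, the Hadoop WebHDFS documentation writes the webhdfs:// URI without the /webhdfs/v1 prefix, which only appears in the HTTP URL, so the sketch assumes that form.

```python
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_open_url(uri, user=None):
    """Map a webhdfs:// URI to the HTTP URL of the WebHDFS OPEN operation.

    Hypothetical helper for illustration only; it builds the URL and
    does not contact any server.
    """
    parts = urlsplit(uri)
    if parts.scheme != "webhdfs":
        raise ValueError("expected a webhdfs:// URI, got: " + uri)

    # op=OPEN is the WebHDFS REST operation for reading a file.
    params = {"op": "OPEN"}
    if user:
        # user.name is the query parameter used by pseudo authentication.
        params["user.name"] = user

    # The REST endpoint lives under the fixed /webhdfs/v1 prefix.
    return "http://{}/webhdfs/v1{}?{}".format(
        parts.netloc, quote(parts.path), urlencode(params)
    )


print(webhdfs_open_url("webhdfs://myhost:14000/path/to/file.txt"))
```

This only shows which URL would be requested; whether Spark can consume it directly depends on the Hadoop client libraries on the classpath.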




Answer 2:


According to the documentation enhancement request SPARK-2930 ("clarify docs on using webhdfs with spark.yarn.access.namenodes"), spark.yarn.access.namenodes should work for webhdfs as well as hdfs.

See "Running Spark on YARN" for more details about spark.yarn.access.namenodes.
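As a rough sketch of how that setting might be passed, the snippet below builds the --conf argument for spark-submit from a list of filesystem URIs. The helper name access_namenodes_conf is hypothetical, the hdfs:// host and port are placeholders, and the webhdfs:// entry reuses the host and port from the question; the property takes a comma-separated list.

```python
def access_namenodes_conf(namenodes):
    """Build the spark-submit --conf pair for spark.yarn.access.namenodes.

    Hypothetical helper for illustration; `namenodes` is a list of
    filesystem URIs such as hdfs://... or webhdfs://...
    """
    if not namenodes:
        raise ValueError("at least one namenode URI is required")
    # The property value is a comma-separated list of URIs.
    value = ",".join(namenodes)
    return ["--conf", "spark.yarn.access.namenodes=" + value]


args = access_namenodes_conf(
    ["hdfs://nn1.example.com:8020", "webhdfs://myhost:14000"]
)
print(" ".join(["spark-submit"] + args + ["my_app.py"]))
```

The application name my_app.py is also a placeholder; the rest of the spark-submit command line depends on the cluster.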



Source: https://stackoverflow.com/questions/27367962/spark-with-webhdfs-httpfs
