How to get path to the uploaded file

☆樱花仙子☆ submitted on 2019-11-26 11:25:34

Question


I am running a Spark cluster on Google Cloud, and I upload a configuration file with each job. What is the path to a file that is uploaded with a submit command?

In the example below, how can I read the file Configuration.properties before the SparkContext has been initialized? I am using Scala.

 gcloud dataproc jobs submit spark --cluster my-cluster --class MyJob --files config/Configuration.properties --jars my.jar

Answer 1:


The local path to a file distributed using the SparkFiles mechanism (the --files argument or the SparkContext.addFile method) can be obtained using SparkFiles.get:

org.apache.spark.SparkFiles.get(fileName)

You can also get the path to the root directory using SparkFiles.getRootDirectory:

org.apache.spark.SparkFiles.getRootDirectory

You can use these combined with standard IO utilities to read the files.
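As a minimal sketch of that combination, the helper below loads a .properties file from a local path using plain java.util.Properties. The object name UploadedConfig, the method loadProperties, and the key some.key are hypothetical names chosen for illustration. The SparkFiles.get call appears only in the usage comment, so the helper itself has no Spark dependency.

```scala
import java.io.FileInputStream
import java.util.Properties

// Hypothetical helper (not part of Spark): reads a .properties file from a local path.
object UploadedConfig {
  def loadProperties(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    // Always close the stream, even if parsing fails.
    try props.load(in) finally in.close()
    props
  }
}

// Usage inside a job, once the SparkContext exists (requires Spark on the classpath):
//   val path  = org.apache.spark.SparkFiles.get("Configuration.properties")
//   val props = UploadedConfig.loadProperties(path)
//   val value = props.getProperty("some.key") // "some.key" is a hypothetical key
```

Note that SparkFiles.get takes the file name only (Configuration.properties), not the config/ prefix used on the submit command line.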

"how can I read the file Configuration.properties before the SparkContext has been initialized?"

Files shared through SparkFiles are distributed by the driver, so they cannot be accessed before the context has been initialized. To be distributed in the first place, they must already be accessible from the driver node. This part of the question therefore depends solely on what type of storage you use to expose the file to the driver node.
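If the file is exposed on the driver's local filesystem, one possible approach is to probe a few candidate paths before creating the SparkContext. The sketch below assumes exactly that. The object name DriverConfigLocator and the candidate paths are hypothetical. Whether a given cluster places --files uploads in the driver's working directory is an assumption you would need to verify for your setup.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: picks the first candidate path that exists
// as a regular file on the local filesystem of the driver.
object DriverConfigLocator {
  def firstExisting(candidates: Seq[String]): Option[Path] =
    candidates.map(c => Paths.get(c)).find(p => Files.isRegularFile(p))
}

// Usage in main(args), before creating the SparkContext:
//   // Try an explicit path passed as the first program argument,
//   // then fall back to the current working directory (an assumption).
//   val candidates = args.headOption.toSeq :+ "Configuration.properties"
//   val configPath = DriverConfigLocator.firstExisting(candidates)
```

Returning an Option keeps the "file not reachable from the driver" case explicit, so the job can fail early with a clear message instead of partway through initialization.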



Source: https://stackoverflow.com/questions/41677897/how-to-get-path-to-the-uploaded-file
