SparkContext.addFile vs spark-submit --files

前端 未结 1 768
清歌不尽
清歌不尽 2021-02-05 15:46

I am using Spark 1.6.0. I want to pass some properties files like log4j.properties and some other customer properties file. I see that we can use --files but I also saw that the

相关标签:
1条回答
  • 2021-02-05 16:15

    It depends whether your Spark application is running in client or cluster mode.

    In client mode the driver (application master) is running locally and can access those files from your project, because they are available within the local file system. SparkContext.addFile should find your local files and work like expected.

    If your application is running in cluster mode. The application is submitted via spark-submit. This means that your whole application is transfered to the Spark master or Yarn, which starts the driver (application master) within the cluster on a specific node and within an separated environment. This environment has no access to your local project directory. So all necessary files has to be transfered as well. This can be achieved with the --files option. The same concept applies to jar files (dependencies of your Spark application). In cluster mode, they need to be added with the --jars option to be available within the classpath of the application master. If you use PySpark there is a --py-files option.

    0 讨论(0)
提交回复
热议问题