How to access external property file in spark-submit job?

Asked by 有刺的猬, 2021-01-24 02:37

I am using Spark 2.4.1 and Java 8. I am trying to load an external property file while submitting my Spark job with spark-submit.

I am using the Typesafe Config library to load the properties.

2 Answers
  •  天涯浪人
    2021-01-24 03:19

    --files and SparkFiles.get

    With --files you should access the resource using SparkFiles.get as follows:

    $ ./bin/spark-shell --files README.md
    
    scala> import org.apache.spark._
    import org.apache.spark._
    
    scala> SparkFiles.get("README.md")
    res0: String = /private/var/folders/0w/kb0d3rqn4zb9fcc91pxhgn8w0000gn/T/spark-f0b16df1-fba6-4462-b956-fc14ee6c675a/userFiles-eef6d900-cd79-4364-a4a2-dd177b4841d2/README.md
    

    In other words, Spark will distribute the files given via --files to the executors, but the only way to know their local path is the SparkFiles utility.
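    Putting the two pieces together, here is a minimal sketch of a Java 8 driver that reads a --files-distributed properties file. The class name, the file name app.properties, and the key db.url are assumptions for illustration:

    ```java
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    import org.apache.spark.SparkFiles;
    import org.apache.spark.sql.SparkSession;

    // Submitted with something like:
    //   spark-submit --files /local/path/app.properties --class PropsApp props-app.jar
    public class PropsApp {
        public static void main(String[] args) throws IOException {
            SparkSession spark = SparkSession.builder().appName("PropsApp").getOrCreate();

            // SparkFiles.get resolves the bare file name to the per-application
            // scratch directory into which Spark copied the --files.
            String localPath = SparkFiles.get("app.properties");

            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream(localPath)) {
                props.load(in);
            }
            System.out.println(props.getProperty("db.url"));

            spark.stop();
        }
    }
    ```

    Note that the same SparkFiles.get call works on the executors, so code running inside tasks can resolve the file the same way.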

    getResourceAsStream(resourceFile) and InputStream

    The other option is to package all resource files into a jar and bundle it with the application's other jars (either as a single uber-jar or simply as part of the CLASSPATH of the Spark app), then use the following trick:

    this.getClass.getClassLoader.getResourceAsStream(resourceFile)
    

    With that, regardless of the jar file the resourceFile is in, as long as it's on the CLASSPATH, it should be available to the application.

    I'm pretty sure any decent framework or library that uses resource files for configuration, e.g. Typesafe Config, accepts InputStream as the way to read resource files.
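    As a stdlib illustration of that pattern, java.util.Properties loads directly from an InputStream. Here a ByteArrayInputStream with made-up contents stands in for the stream getResourceAsStream would return, so the snippet runs anywhere:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    public class ResourceStreamDemo {
        public static void main(String[] args) throws IOException {
            // Stand-in for this.getClass().getClassLoader()
            //     .getResourceAsStream("app.properties") -- same InputStream type,
            // but with fabricated contents (db.url is an illustrative key).
            InputStream in = new ByteArrayInputStream(
                    "db.url=jdbc:postgresql://localhost/app\n"
                            .getBytes(StandardCharsets.UTF_8));

            Properties props = new Properties();
            props.load(in); // Properties reads straight off the stream
            System.out.println(props.getProperty("db.url"));
        }
    }
    ```

    With Typesafe Config specifically, ConfigFactory.parseReader plays the same role for an arbitrary stream, and ConfigFactory.parseResources does the classloader lookup for you.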


    You could also include the files from --files inside a jar that is part of the executors' CLASSPATH, but that would obviously be less flexible (every time you wanted to submit your Spark app with a different file, you would have to rebuild the jar).
