Question
I have recently found a way to use logback instead of log4j in Apache Spark (both for local use and spark-submit). However, there is one last piece missing.
The issue is that Spark tries very hard not to see logback.xml settings in its classpath. I have already found a way to load it during local execution:
What I have so far
Basically, checking for the system property logback.configurationFile, but loading logback.xml from my /src/main/resources/ just in case:
import java.io.File

// the same as default: https://logback.qos.ch/manual/configuration.html
private val LogbackLocation = Option(System.getProperty("logback.configurationFile"))

// add some default logback.xml to your /src/main/resources
private lazy val defaultLogbackConf = getClass.getResource("/logback.xml").getPath

private def getLogbackConfigPath = {
  // fall back to the bundled logback.xml when the system property is not set
  val path = LogbackLocation.map(new File(_).getPath).getOrElse(defaultLogbackConf)
  logger.info(s"Loading logging configuration from: $path")
  path
}
And then when I initialize my SparkContext...
val sc = SparkContext.getOrCreate(conf)
sc.addFile(getLogbackConfigPath)
I can confirm it works locally.
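(For reference, once the file has been added, it can be picked up via SparkFiles and fed to logback programmatically; a minimal sketch, assuming logback-classic is on the classpath, and not necessarily the exact way the setup above consumes it:)
import java.io.File
import ch.qos.logback.classic.LoggerContext
import ch.qos.logback.classic.joran.JoranConfigurator
import org.apache.spark.SparkFiles
import org.slf4j.LoggerFactory

// Sketch only: reload logback from the file distributed through sc.addFile.
// SparkFiles.get resolves the local copy of a file added with addFile.
val logbackFile = new File(SparkFiles.get("logback.xml"))
val context = LoggerFactory.getILoggerFactory.asInstanceOf[LoggerContext]
val configurator = new JoranConfigurator()
configurator.setContext(context)
context.reset()                       // drop the default/previous configuration
configurator.doConfigure(logbackFile)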
Playing with spark-submit
spark-submit \
...
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
This gives an error:
Exception in thread "main" java.io.FileNotFoundException: Added file file:/path/to/my/application-fat.jar!/logback.xml does not exist
Which I think is nonsense, because first the application finds the file (according to my code)
getClass.getResource("/logback.xml").getPath
and then, during
sc.addFile(getLogbackConfigPath)
it turns out... whoa! No file there!? What the heck!? Why would it not find the file inside the jar? It obviously is there, I triple-checked it.
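(For reference, a quick check of what that call actually returns when running from the fat jar shows where the path in the exception comes from; a minimal sketch:)
// When the resource lives inside the fat jar, getResource returns a jar: URL,
// so its "path" is not a plain filesystem path and java.io.File cannot open it.
val url = getClass.getResource("/logback.xml")
println(url)                                   // e.g. jar:file:/path/to/application-fat.jar!/logback.xml
println(new java.io.File(url.getPath).exists)  // false when running from the jar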
Another approach to spark-submit
So I thought, OK, I will pass the file myself, since I can specify it via a system property. I put the logback.xml file next to my application-fat.jar and:
spark-submit \
...
--conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
--conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
And I get the same error as above. So my setting is completely ignored! Why? How do I specify -Dlogback.configurationFile properly and pass it correctly to both the driver and the executors?
Thanks!
Answer 1:
1. Solving java.io.FileNotFoundException
This is probably unsolvable. Simply put, SparkContext.addFile cannot read a file from inside the jar. I believe it is treated as if it were in some kind of zip archive or the like.
Fine.
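(If the bundled file still needs to go through sc.addFile, one way around this, a sketch of my own and not part of the original setup, is to copy the resource out of the jar to a real file first:)
import java.nio.file.{Files, StandardCopyOption}

// Sketch: extract the bundled logback.xml to a temp file so addFile gets a
// real filesystem path instead of a path pointing inside the jar.
private def extractedLogbackConf: String = {
  val tmp = Files.createTempFile("logback", ".xml")
  val in  = getClass.getResourceAsStream("/logback.xml")
  try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
  tmp.toString
}

// then: sc.addFile(extractedLogbackConf)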
2. Passing -Dlogback.configurationFile
This was not working due to my misunderstanding of the configuration parameters. Because I am using the --master yarn parameter but do not specify --deploy-mode cluster, it defaults to client mode.
Reading https://spark.apache.org/docs/1.6.1/configuration.html#application-properties
spark.driver.extraJavaOptions
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
So passing this setting with --driver-java-options worked:
spark-submit \
...
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
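(For completeness, the other option mentioned in that note, the default properties file, would look roughly like this, assuming the standard conf/spark-defaults.conf location:)
# conf/spark-defaults.conf
spark.driver.extraJavaOptions    -Dlogback.configurationFile=/path/to/my/logback.xml
# executors will only see this path if it exists on the worker nodes
spark.executor.extraJavaOptions  -Dlogback.configurationFile=/path/to/my/logback.xml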
Note about --driver-java-options
In contrast to --conf, multiple JVM options have to be passed as a single argument, for example:
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml -Dother.setting=value" \
while the following will not work:
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--driver-java-options "-Dother.setting=value" \
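(As a sketch for the executor side, which I have not verified in this setup: the executor JVMs start later, so spark.executor.extraJavaOptions can stay in --conf, and the file can be shipped with --files so that a relative file name resolves in each executor's working directory:)
spark-submit \
  ...
  --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --files /path/to/my/logback.xml \
  --conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2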
Source: https://stackoverflow.com/questions/45490778/pass-system-property-to-spark-submit-and-read-file-from-classpath-or-custom-path