I have a Spark job that reads data from a configuration file, which is a Typesafe Config file. The code that reads the config looks like this:
Config
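The question's code snippet was lost in formatting; a minimal sketch of typical ConfigFactory-based reading code might look like the following. The key name `app.input.path` and the `Main` object are illustrative assumptions, not from the original post.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object Main {
  def main(args: Array[String]): Unit = {
    // ConfigFactory.load() honours -Dconfig.file when it is set,
    // otherwise it falls back to application.conf on the classpath.
    val config: Config = ConfigFactory.load()

    // Hypothetical key; the original post's keys are unknown.
    val inputPath: String = config.getString("app.input.path")
    println(s"Reading input from: $inputPath")
  }
}
```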
Even though this question is from a year ago, I had a similar issue with ConfigFactory.
To be able to read the application.conf file, you have to do two things:

1. Ship the file with --files /path/to/file/application.conf (note that you can read it from HDFS if you wish).
2. Add the Typesafe Config package with --packages com.typesafe:config:version.

Since the application.conf file will be in the same temporary directory as the main application jar, you can refer to it by its bare filename in your code.
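For reference, a minimal application.conf of the kind being shipped might look like this (all keys here are illustrative, not from the original post):

```hocon
# Illustrative HOCON config; key names are hypothetical
app {
  input.path = "hdfs:///data/input"
  checkpoint.dir = "/tmp/checkpoints"
}
```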
Using the answer given above (https://stackoverflow.com/a/40586476/6615465), the spark-submit command for this question would be the following:
LOG4J_FULL_PATH=/log4j-path
ROOT_DIR=/application.conf-path
/opt/deploy/spark/bin/spark-submit \
--packages com.typesafe:config:1.3.2 \
--class com.mycompany.Main \
--master yarn \
--deploy-mode cluster \
--files "$ROOT_DIR/application.conf,$LOG4J_FULL_PATH/log4j.xml" \
--conf spark.executor.extraJavaOptions="-Dconfig.file=application.conf" \
--driver-class-path "$ROOT_DIR" \
--verbose \
/opt/deploy/lal-ml.jar