Question
We're building a Spark application in Scala with a HOCON configuration; the config file is called application.conf.
If I add the application.conf to my jar file and start a job on Google Dataproc, it works correctly:
gcloud dataproc jobs submit spark \
--cluster <clustername> \
--jar=gs://<bucketname>/<filename>.jar \
--region=<myregion> \
-- \
<some options>
I don't want to bundle the application.conf with my jar file but provide it separately, which I can't get working.
Tried different things, i.e.:
1. Specifying the application.conf with --jars=gs://<bucketname>/application.conf (which should work according to this answer)
2. Using --files=gs://<bucketname>/application.conf
3. Same as 1. + 2. with the application.conf in /tmp/ on the master instance of the cluster, then specifying the local file with file:///tmp/application.conf
4. Defining extraClassPath for Spark using --properties=spark.driver.extraClassPath=gs://<bucketname>/application.conf (and the same for executors; see the example command below)
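For example, attempt 4 looked roughly like this (same placeholders as in the working command above; the exact invocation is an illustration, not copied verbatim):
gcloud dataproc jobs submit spark \
--cluster <clustername> \
--jar=gs://<bucketname>/<filename>.jar \
--region=<myregion> \
--properties=spark.driver.extraClassPath=gs://<bucketname>/application.conf,spark.executor.extraClassPath=gs://<bucketname>/application.conf \
-- \
<some options>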
With all these options I get an error saying it can't find the key in the config:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: system properties: No configuration setting found for key 'xyz'
This error usually means that there's an error in the HOCON config (key xyz is not defined in HOCON) or that the application.conf is not on the classpath. Since the exact same config is working when inside my jar file, I assume it's the latter.
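For what it's worth, a quick way to double-check the bundled case is to confirm the file sits where ConfigFactory.load() looks for it, i.e. at the root of the classpath, with something like:
jar tf <filename>.jar | grep application.conf
If that prints application.conf with no path prefix, the bundled setup is fine and the problem is only about getting the external file onto the classpath.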
Are there any other options to put the application.conf on the classpath?
Answer 1:
If --jars doesn't work as suggested in this answer, you can try an init action. First upload your config to GCS, then write an init action that downloads it to the VMs, putting it into a folder on the classpath, or update spark-env.sh to include the path to the config.
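A minimal sketch of such an init action, assuming the config sits at gs://<bucketname>/application.conf and that adding a local directory to Spark's extra classpath via spark-defaults.conf is an acceptable variant of "a folder in the classpath" (the directory path and script name below are made up for illustration):
#!/bin/bash
# Init action sketch: Dataproc runs this on every node while creating the cluster.
# It downloads application.conf from GCS and puts its directory on the Spark
# classpath so ConfigFactory.load() can resolve it on the driver and executors.
set -euo pipefail

CONF_DIR=/usr/local/share/app-conf   # any local directory works; this one is an assumption
mkdir -p "${CONF_DIR}"
gsutil cp gs://<bucketname>/application.conf "${CONF_DIR}/"

# Put the directory (not the file itself) on the default classpath.
# If the image already sets extraClassPath, append to the existing value instead.
cat >> /etc/spark/conf/spark-defaults.conf <<EOF
spark.driver.extraClassPath=${CONF_DIR}
spark.executor.extraClassPath=${CONF_DIR}
EOF
Upload the script to GCS and reference it when creating the cluster, e.g. gcloud dataproc clusters create <clustername> --region=<myregion> --initialization-actions=gs://<bucketname>/add-conf.sh (the script name is a placeholder). Note that init actions run at cluster creation time, so this fixes the config per cluster rather than per job.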
Source: https://stackoverflow.com/questions/58238269/add-conf-file-to-classpath-in-google-dataproc