I am looking for a way to copy a folder of resource dependency files from HDFS to the local working directory of each Spark executor using Java.
At first I was thinking of using the --files FILES option of spark-submit, but it seems it does not support folders with arbitrarily nested files. So it appears I have to do it by putting this folder on a shared HDFS path and having each executor copy it to its working directory before running a job, but I have yet to find out how to do that correctly in Java code.
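To make that idea more concrete, here is a minimal sketch of what I had in mind, assuming Spark 2.x and a placeholder folder at hdfs:///apps/myapp/config; it uses SparkContext.addFile with the recursive flag, though I have not verified exactly where the folder ends up on the executors:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;

import java.io.File;
import java.util.Arrays;

public final class DistributeConfigFolder {
    public static void main(String[] args) {
        JavaSparkContext jsc =
            new JavaSparkContext(new SparkConf().setAppName("distribute-config"));

        // Driver side: ship the whole folder. The recursive flag (true) is only
        // supported for directories on a distributed FS such as HDFS.
        // The path below is a placeholder.
        jsc.sc().addFile("hdfs:///apps/myapp/config", true);

        jsc.parallelize(Arrays.asList(1, 2, 3), 3)
           .foreach(x -> {
               // Executor side: resolve the localized copy of the folder.
               // I am assuming it lands under the SparkFiles root directory
               // with its original name.
               File localConfig = new File(SparkFiles.getRootDirectory(), "config");
               System.out.println(localConfig.getAbsolutePath()
                   + " exists=" + localConfig.exists());
           });

        jsc.stop();
    }
}
```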
Or zip/gzip/archive this folder, put it on a shared HDFS path, and then extract the archive into the local working directory of each Spark executor.
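This is roughly how I imagine the archive variant on the executor side: a sketch using the Hadoop FileSystem API plus java.util.zip, with hdfs:///apps/myapp/config.zip as a placeholder path and the assumption that the zip contains a top-level config/ folder. (I understand that on YARN the --archives option of spark-submit, e.g. with a #config alias, is supposed to do this download-and-extract step automatically, which might make the manual code unnecessary.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class ConfigFetcher {
    // Copies hdfs:///apps/myapp/config.zip (placeholder path) into the executor's
    // current working directory and extracts it there. Intended to be called once
    // per executor, e.g. from inside mapPartitions before processing a partition.
    public static synchronized void ensureConfigPresent() throws Exception {
        File marker = new File("config/.done");
        if (marker.exists()) {
            return; // already extracted on this executor
        }
        Path src = new Path("hdfs:///apps/myapp/config.zip"); // placeholder
        FileSystem fs = src.getFileSystem(new Configuration());
        fs.copyToLocalFile(src, new Path("config.zip"));      // lands in the working dir
        unzip(new File("config.zip"), new File("."));
        marker.getParentFile().mkdirs();
        marker.createNewFile();
    }

    private static void unzip(File zipFile, File targetDir) throws Exception {
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                File out = new File(targetDir, entry.getName());
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (FileOutputStream fos = new FileOutputStream(out)) {
                    int n;
                    while ((n = zis.read(buf)) > 0) {
                        fos.write(buf, 0, n);
                    }
                }
            }
        }
    }
}
```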
Any help or code samples are appreciated.
This is a folder of config files; they are part of the computation and should be co-located with the spark-submit main jar (e.g. database files that the jar code reads while running a job; unfortunately I cannot change this dependency, as I am reusing existing code).
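For illustration, the hypothetical wiring I would expect on the driver side, reusing the ConfigFetcher sketch from above so the existing jar code can keep reading the files through relative paths such as config/db.properties (inputRdd and the file name are placeholders):

```java
// Hypothetical usage: localize the config folder once per executor before the
// existing code runs inside each task.
JavaRDD<String> processed = inputRdd.mapPartitions(rows -> {
    ConfigFetcher.ensureConfigPresent();   // copies + unzips on the first call only
    // ... existing code that opens new java.io.File("config/db.properties") ...
    return rows;
});
```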
Regards, -Yuriy
Source: https://stackoverflow.com/questions/46515032/copy-files-config-from-hdfs-to-local-working-directory-of-every-spark-executor