Copy files (config) from HDFS to the local working directory of every Spark executor

Submitted by 坚强是说给别人听的谎言 on 2019-12-19 11:25:29

Question


I am looking for a way to copy a folder of resource dependency files from HDFS to the local working directory of each Spark executor, using Java.

I was at first thinking of using the --files FILES option of spark-submit, but it seems it does not support folders with arbitrarily nested files. So it appears I have to put this folder on a shared HDFS path so that each executor copies it to its working directory before running a job, but I have yet to find out how to do that correctly in Java code.
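As a rough sketch of what I have in mind (untested, with placeholder paths and names), SparkContext.addFile with recursive = true looks like it can ship a directory that already sits on HDFS, and SparkFiles.get should resolve it on each executor:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.io.File;
    import java.util.Arrays;

    public class DistributeConfigDir {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("distribute-config"));

            // "hdfs:///shared/config-dir" is a placeholder for the shared HDFS folder.
            // recursive = true allows adding a whole directory (it has to live on HDFS,
            // not on the driver's local filesystem).
            sc.addFile("hdfs:///shared/config-dir", true);

            sc.parallelize(Arrays.asList(1, 2, 3), 3).foreach(i -> {
                // On the executor, SparkFiles.get resolves the shipped directory by name.
                File localDir = new File(SparkFiles.get("config-dir"));
                System.out.println("config-dir on executor: " + localDir.getAbsolutePath());
            });

            sc.stop();
        }
    }

What I am unsure about is whether the directory ends up in (or next to) the executor's working directory or only under SparkFiles.getRootDirectory(), which is part of why I am asking.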

Or zip/gzip/archive this folder, put it on a shared HDFS path, and then explode the archive into the local working directory of each Spark executor.
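Another sketch I am considering (again with placeholder paths, and copying the directory tree directly instead of extracting an archive) is to have each executor pull the files down itself with the Hadoop FileSystem API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.File;

    public final class ConfigFetcher {
        // Intended to be called once per executor JVM, e.g. at the start of a
        // mapPartitions/foreachPartition function, before the existing code runs.
        public static synchronized File fetchConfig() throws Exception {
            File target = new File("config-dir");   // relative to the executor's working directory
            if (target.exists()) {
                return target;                       // another task on this executor already fetched it
            }
            Path src = new Path("hdfs:///shared/config-dir");
            FileSystem fs = src.getFileSystem(new Configuration());
            // copyToLocalFile copies the whole directory tree to ./config-dir
            fs.copyToLocalFile(src, new Path(target.getAbsolutePath()));
            return target;
        }
    }

If the archive route is cleaner (for example spark-submit's --archives option on YARN), I would also be happy with that, but I have not found a good way to drive it from Java code.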

Any help or code samples are appreciated.

This is a folder of config files; they are part of the computation and should be co-located with the spark-submit main jar (e.g. database files that the jar's code reads while running a job; unfortunately I cannot change this dependency because I am reusing existing code).

Regards, -Yuriy

Source: https://stackoverflow.com/questions/46515032/copy-files-config-from-hdfs-to-local-working-directory-of-every-spark-executor
