How to flush Hadoop Distributed Cache?

Submitted by 孤街浪徒 on 2021-01-04 17:01:47

Question


I have added a set of jars to the Distributed Cache using the DistributedCache.addFileToClassPath(Path file, Configuration conf) method so that the dependencies are available to a MapReduce job across the cluster. Now I would like to remove all of those jars from the cache, start clean, and be sure I have the right jar versions there. I commented out the code that adds the files to the cache and also removed them from where I had copied them in HDFS. The problem is that the jars still appear to be on the classpath, because the MapReduce job is not throwing ClassNotFoundException. Is there a way to flush this cache without restarting any services?
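For reference, a minimal sketch of how the jars were being added in the driver, using the DistributedCache API mentioned above; the HDFS jar paths are hypothetical placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
    public static void addDependencyJars(Configuration conf) throws IOException {
        // Hypothetical jar locations; they must already have been copied into HDFS.
        Path[] jars = {
            new Path("/libs/dependency-a.jar"),
            new Path("/libs/dependency-b.jar")
        };
        for (Path jar : jars) {
            // Registers the jar in the distributed cache and puts it on the
            // classpath of the tasks launched for this job configuration.
            DistributedCache.addFileToClassPath(jar, conf);
        }
    }
}
```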

Edit: I subsequently cleared the following folder: /var/lib/hadoop-hdfs/cache/mapred/mapred/local/taskTracker/distcache/ . That did not solve it; the job still resolves the classes.


Answer 1:


I now understand what my problem was. I had previously copied the jars into the /usr/lib/hadoop/lib/ folder, which made them permanently available to every MapReduce job. After removing them from there, the job threw the expected ClassNotFoundException. I also noticed that if I do not add the jars with addFileToClassPath, they are not available to the job. So there is no need to flush the Distributed Cache or to remove what you have added with addFileToClassPath, because what you put there is visible only to that specific job instance.
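To illustrate the per-job scope described above, a minimal sketch using the newer Job API (the jar path and job name are hypothetical): the classpath entry applies only to the Job instance it is added to, so no cache flush is needed between jobs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class PerJobClasspath {
    public static Job buildJob(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "job-with-cached-jar");
        // Hypothetical jar already present in HDFS; this classpath entry
        // is visible only to tasks launched for this Job instance.
        job.addFileToClassPath(new Path("/libs/dependency-a.jar"));
        // ... set mapper, reducer, input and output paths here ...
        // A second job submitted without this call would not see the
        // classes in dependency-a.jar, so nothing has to be "flushed".
        return job;
    }
}
```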



Source: https://stackoverflow.com/questions/14607018/how-to-flush-hadoop-distributed-cache
