Accessing files in hadoop distributed cache

夕颜 2021-02-06 04:42

I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command

DistributedCache.addCacheFile(new URI("/user/peter/cac


        
4 answers
  •  庸人自扰
    2021-02-06 05:24

    The problem was that I was doing the following:

    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    // Too late: the job already holds its own copy of conf
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    

    Since the Job constructor makes an internal copy of the conf instance, adding the cache file to conf afterwards has no effect on the job. Instead, I should add the cache file before constructing the Job:

    Configuration conf = new Configuration();
    // Register the cache file first, so the copy made by Job includes it
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    Job job = new Job(conf, "wordcount");
    

    And now it works. Thanks to Harsh on the Hadoop user list for the help.
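
The ordering issue described above can be reproduced without a cluster. The following is only a minimal sketch of the copy semantics, not Hadoop code: `FakeConf` and `FakeJob` are hypothetical stand-ins invented for this illustration, and the only assumption carried over from the answer is that the job takes a copy of its configuration at construction time.

```java
import java.util.ArrayList;
import java.util.List;

class CopySemanticsDemo {
    // Hypothetical stand-in for a configuration object (not a Hadoop class).
    static class FakeConf {
        final List<String> cacheFiles = new ArrayList<>();

        FakeConf() {}

        // Copy constructor: takes a snapshot of the other configuration.
        FakeConf(FakeConf other) {
            cacheFiles.addAll(other.cacheFiles);
        }
    }

    // Hypothetical stand-in for a job that copies its configuration on construction.
    static class FakeJob {
        final FakeConf conf;

        FakeJob(FakeConf conf) {
            this.conf = new FakeConf(conf);
        }
    }

    // Adds the path to the caller's conf AFTER the job was built; returns what the job sees.
    public static List<String> addAfter(String path) {
        FakeConf conf = new FakeConf();
        FakeJob job = new FakeJob(conf);
        conf.cacheFiles.add(path); // changes only the caller's instance
        return job.conf.cacheFiles;
    }

    // Adds the path to the caller's conf BEFORE the job is built; returns what the job sees.
    public static List<String> addBefore(String path) {
        FakeConf conf = new FakeConf();
        conf.cacheFiles.add(path); // included in the snapshot taken below
        FakeJob job = new FakeJob(conf);
        return job.conf.cacheFiles;
    }

    public static void main(String[] args) {
        String path = "/user/peter/cacheFile/testCache1";
        System.out.println("added after:  " + addAfter(path));  // prints "added after:  []"
        System.out.println("added before: " + addBefore(path)); // prints "added before: [/user/peter/cacheFile/testCache1]"
    }
}
```

In real Hadoop code, the job's own copy is the object returned by `job.getConfiguration()`, so passing that object to `DistributedCache.addCacheFile` after the job is constructed should be another way to get the same effect as reordering the calls.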
