Accessing files in Hadoop distributed cache

夕颜 2021-02-06 04:42

I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command

DistributedCache.addCacheFile(new URI("/user/peter/cac...


        
4 Answers
  • 2021-02-06 05:24

    The problem was that I was doing the following:

    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    

    Since the Job constructor makes an internal copy of the conf instance, adding the cache file to conf afterwards has no effect on the job. Instead, I should do this:

    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    Job job = new Job(conf, "wordcount");
    

    And now it works. Thanks to Harsh on the Hadoop user list for the help.
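
The copy behavior described above can be illustrated without a cluster. The following is a minimal sketch in plain Java; `FakeConf` and `FakeJob` are hypothetical stand-ins that only mimic a constructor copying its configuration. They are not Hadoop's classes and say nothing about how Hadoop implements the copy internally.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for a configuration object (not Hadoop's Configuration). */
    static class FakeConf {
        private final Map<String, String> props = new HashMap<>();

        FakeConf() {}

        /** Copy constructor: the new instance gets its own independent map. */
        FakeConf(FakeConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for a job that copies its configuration on construction. */
    static class FakeJob {
        private final FakeConf conf;

        FakeJob(FakeConf conf) {
            this.conf = new FakeConf(conf); // snapshot is taken here
        }

        FakeConf getConfiguration() { return conf; }
    }

    public static void main(String[] args) {
        FakeConf conf = new FakeConf();
        conf.set("before", "set before the job was created");

        FakeJob job = new FakeJob(conf);

        // Too late: this changes only the caller's object, not the job's copy.
        conf.set("after", "set on the original conf");

        // Works: this changes the copy that the job actually holds.
        job.getConfiguration().set("viaJob", "set through job.getConfiguration()");

        System.out.println(job.getConfiguration().get("before"));
        System.out.println(job.getConfiguration().get("after"));
        System.out.println(job.getConfiguration().get("viaJob"));
    }
}
```

Running `main` prints the "before" and "viaJob" values, while the "after" key prints `null` because it was never part of the job's copy.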

  • 2021-02-06 05:31
    Configuration conf = new Configuration();  
    Job job = new Job(conf, "wordcount");
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), job.getConfiguration());
    

    You can also do it this way: add the cache file to the job's own configuration via job.getConfiguration().

  • 2021-02-06 05:36

    This version of the code (which is slightly different from the constructs mentioned above) has always worked for me.

    // in main(String[] args)
    Job job = new Job(conf, "Word Count");
    ...
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), job.getConfiguration());
    

    I didn't see the complete setup() function in your Mapper code. Here is mine:

    public void setup(Context context) throws IOException, InterruptedException {
    
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.getLocal(conf);
    
        Path[] dataFile = DistributedCache.getLocalCacheFiles(conf);
    
        // [0] because we added just one file.
        BufferedReader cacheReader = new BufferedReader(new InputStreamReader(fs.open(dataFile[0])));
        // now one can use BufferedReader's readLine() to read data
    
    }
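
Once a BufferedReader is open on the cached file, a common pattern is to load its lines into an in-memory lookup table during setup(). The following is a minimal sketch of that step only. It assumes, purely for illustration, that each line has the form `key<TAB>value`; the class and method names are hypothetical, and a StringReader stands in for the cached file so the example runs without Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CacheFileLoader {

    /**
     * Reads "key<TAB>value" lines into a map. The tab-separated format is an
     * assumption of this sketch; blank or malformed lines are skipped.
     */
    static Map<String, String> loadLookup(BufferedReader reader) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Split on the first tab only, so values may themselves contain tabs.
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && !parts[0].isEmpty()) {
                lookup.put(parts[0], parts[1]);
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of the cached file.
        String sample = "apple\tfruit\ncarrot\tvegetable\n\nmalformed line\n";
        // try-with-resources closes the reader when loading is done.
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            Map<String, String> lookup = loadLookup(reader);
            System.out.println(lookup.get("apple"));
            System.out.println(lookup.size());
        }
    }
}
```

In a real mapper the reader would come from the cached path, as in the setup() method above, and the resulting map would typically be kept in a field for use in map().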
    
  • 2021-02-06 05:43

    Once the Job has been created from a configuration object, i.e.

    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    

    and you then modify conf as shown below, e.g.

    conf.set("demiliter","|");
    

    or

    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    

    such changes will not be reflected on a pseudo-distributed or fully distributed cluster; however, they will work in a local environment.
