Accessing files in Hadoop distributed cache

夕颜 2021-02-06 04:42

I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command

DistributedCache.addCacheFile(new URI("/user/peter/cac
        
4 Answers
  •  生来不讨喜
    2021-02-06 05:36

    This version of the code (slightly different from the constructs mentioned above) has always worked for me.

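    // in main(String[] args)
    Job job = new Job(conf, "Word Count");
    ...
    // The path must be a quoted string. Register the file on the job's own Configuration,
    // because Job works on a copy of conf made when it was constructed.
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), job.getConfiguration());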
    

    I didn't see the complete setup() function in your Mapper code. This is the one I use:

    @Override
    public void setup(Context context) throws IOException, InterruptedException {

        Configuration conf = context.getConfiguration();
        // Cached files are copied to the task node's local disk, so open them via the local file system.
        FileSystem fs = FileSystem.getLocal(conf);

        // Local paths of the files registered with DistributedCache.addCacheFile().
        Path[] dataFile = DistributedCache.getLocalCacheFiles(conf);

        // [0] because we added just one file.
        BufferedReader cacheReader = new BufferedReader(new InputStreamReader(fs.open(dataFile[0])));
        // Now one can use BufferedReader's readLine() to read the data.

    }
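
    Once the reader is open, a typical next step is to load the cached file into an in-memory lookup table that map() can consult. The following is only a minimal sketch of that step, assuming the cached file holds tab-separated key/value lines. The file format, the class name CacheLookup, and the load() helper are hypothetical and not part of the original answer. A StringReader stands in for the InputStreamReader over the cached file so the example runs without a cluster.

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;
    import java.util.HashMap;
    import java.util.Map;

    public class CacheLookup {

        // Parses "key<TAB>value" lines into a map.
        // The tab-separated layout is an assumption made for illustration.
        static Map<String, String> load(Reader source) throws IOException {
            Map<String, String> lookup = new HashMap<>();
            // try-with-resources closes the reader even if readLine() throws
            try (BufferedReader reader = new BufferedReader(source)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // skip lines without a key/value separator
                    }
                    lookup.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
            return lookup;
        }

        public static void main(String[] args) throws IOException {
            // In a Mapper, the Reader would instead wrap fs.open(dataFile[0]).
            Map<String, String> lookup = load(new StringReader("a\t1\nb\t2\nmalformed\n"));
            System.out.println(lookup.get("a")); // prints 1
            System.out.println(lookup.size());   // prints 2
        }
    }
    ```

    In setup(), the same helper could be called as load(new InputStreamReader(fs.open(dataFile[0]))) and the resulting map stored in a field of the Mapper, so the file is read once per task rather than once per record.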
    
