Access HDFS file from UDF

旧时难觅i 2021-02-11 00:08

I'd like to access a file from within my UDF call. This is my script:

files = LOAD '$docs_in' USING PigStorage(';') AS (id, stopwords, id2, file);
buzz = FOREACH fi         


        
1 Answer
  • 2021-02-11 00:42

    Inside an EvalFunc, you can read a file from HDFS like this:

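    // UDFContext gives the UDF access to the job configuration on the backend
    Configuration conf = UDFContext.getUDFContext().getJobConf();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = fs.open(new Path(fileName)); // fileName: HDFS path of the file
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    // ... read lines from br, then close it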
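The reading step after the `BufferedReader` is created is left open above. A minimal sketch of it, assuming the file is a plain-text stopword list with one entry per line (an assumption; the question does not show the file format). It is plain JDK code, so it works the same on a reader built from the HDFS stream or from a local `FileReader`. `StopwordReader` and `readWords` are hypothetical names, not part of Pig or Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: collects one entry per line into a set.
// Assumes a plain-text file with one stopword per line.
class StopwordReader {
    static Set<String> readWords(BufferedReader br) throws IOException {
        Set<String> words = new HashSet<>();
        String line;
        // readLine() returns null at end of stream
        while ((line = br.readLine()) != null) {
            String word = line.trim();
            if (!word.isEmpty()) { // skip blank lines
                words.add(word);
            }
        }
        return words;
    }
}
```

In a UDF you would typically call this once (for example, lazily on the first `exec()` call) and keep the set in a field, rather than reopening the file for every tuple.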
    

    You might also consider putting the files into the distributed cache. In that case, override getCacheFiles() in your EvalFunc class.

    For example:

    @Override
    public List<String> getCacheFiles() {
      // Each entry is "path#name": the part after '#' becomes the symlink name on each node
      List<String> list = new ArrayList<String>(2);
      list.add("/cache/pig/wordlist1.txt#w1");
      list.add("/cache/pig/wordlist2.txt#w2");
      return list;
    }
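The `#w1` suffix is a URI fragment: the part before `#` is the HDFS path to ship, and the part after it is the name of the symlink that appears in each task's working directory. A small JDK-only sketch that splits such an entry with `java.net.URI`, only to make the convention explicit; `CacheEntry` is a hypothetical name, not a Pig class.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper illustrating the "path#symlink" convention of getCacheFiles() entries.
class CacheEntry {
    final String hdfsPath; // file shipped through the distributed cache
    final String symlink;  // name it gets in the task's working directory

    CacheEntry(String entry) throws URISyntaxException {
        URI uri = new URI(entry);
        this.hdfsPath = uri.getPath();    // e.g. "/cache/pig/wordlist1.txt"
        this.symlink = uri.getFragment(); // e.g. "w1", or null if there is no '#' part
    }
}
```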
    

    Then you can open the files through their symlinks (w1 and w2), which point to local copies on each worker node's file system:

    BufferedReader br = new BufferedReader(new FileReader(fileName)); // fileName is the symlink, e.g. "w1"
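Because the symlinks live in the task's current working directory, a relative name such as `w1` is enough to open them. A sketch that merges the files behind several symlinks into one set with `java.nio.file`, assuming UTF-8 text with one word per line. `CachedLists` and `loadAll` are hypothetical names, and the `dir` parameter exists only so the method can be tried outside Hadoop; inside the UDF you would pass `Paths.get("")`, i.e. the working directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader: reads every symlinked cache file into one set.
class CachedLists {
    static Set<String> loadAll(Path dir, String... symlinks) throws IOException {
        Set<String> words = new HashSet<>();
        for (String link : symlinks) {
            // Resolve the symlink name against the given directory
            Path file = dir.resolve(link);
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String word = line.trim();
                if (!word.isEmpty()) { // skip blank lines
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Usage inside the UDF would then be along the lines of `CachedLists.loadAll(Paths.get(""), "w1", "w2")`.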
    