Access HDFS file from UDF

旧时难觅i 2021-02-11 00:08

I'd like to access a file from within my UDF call. This is my script:

files = LOAD '$docs_in' USING PigStorage(';') AS (id, stopwords, id2, file);
buzz = FOREACH fi         


        
1 answer
  •  遇见更好的自我
    2021-02-11 00:42

    Inside an EvalFunc you can open a file on HDFS via:

    // fileName is an HDFS path; getJobConf() returns the job's Hadoop Configuration
    FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
    FSDataInputStream in = fs.open(new Path(fileName));
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    ....
    

    You might also consider putting the files into the distributed cache. In that case you have to override getCacheFiles() in your EvalFunc class.

    E.g.:

    @Override
    public List<String> getCacheFiles() {
      // Each entry has the form "<HDFS path>#<symlink name>"
      List<String> list = new ArrayList<String>(2);
      list.add("/cache/pig/wordlist1.txt#w1");
      list.add("/cache/pig/wordlist2.txt#w2");
      return list;
    }
    

    Then you can pass the symlink names of the files (w1 and w2) as file names to read them from the local file system of each worker node:

    BufferedReader br = new BufferedReader(new FileReader(fileName));
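
As a rough illustration of the local read above, the following sketch loads a word list into a set using only the JDK. The class name WordListLoader, the method load, and the one-word-per-line file format are assumptions made for this example; none of them come from Pig or from the question.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Pig: loads a word list from a local file,
// such as the symlink w1 created by the distributed cache on a worker node.
class WordListLoader {

    // Assumes one word per line; trims whitespace and skips blank lines.
    static Set<String> load(String fileName) throws IOException {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Inside exec() one could call WordListLoader.load("w1") on the first invocation and keep the result in a field, so the file is not read again for every tuple.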
    
