I'd like to access a file from my UDF call. This is my script:
files = LOAD '$docs_in' USING PigStorage(';') AS (id, stopwords, id2, file);
buzz = FOREACH fi
Inside an EvalFunc you can open a file on HDFS via:
FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
FSDataInputStream in = fs.open(new Path(fileName));
BufferedReader br = new BufferedReader(new InputStreamReader(in));
....
You might also consider putting the files into the distributed cache. In that case you have to override getCacheFiles() in your EvalFunc class.
E.g.:
@Override
public List<String> getCacheFiles() {
List<String> list = new ArrayList<String>(2);
list.add("/cache/pig/wordlist1.txt#w1");
list.add("/cache/pig/wordlist2.txt#w2");
return list;
}
Then you can just pass the symlink names of the files (w1 and w2) and read them from the local file system of each worker node:
BufferedReader br = new BufferedReader(new FileReader(fileName));
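Since a UDF is usually called once per input tuple, it is probably worth reading the file only once and keeping the words in memory rather than reopening it on every call. Below is a minimal sketch of that idea using plain Java only. The class name WordListLookup and its methods are hypothetical helpers, not part of Pig, and the one-word-per-line file layout is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a Pig class): loads a word list lazily, once per instance.
final class WordListLookup {
    private final String fileName;   // e.g. the symlink name "w1"
    private Set<String> words;       // stays null until the first lookup

    WordListLookup(String fileName) {
        this.fileName = fileName;
    }

    // Returns true if the word appears on its own line in the file.
    boolean contains(String word) throws IOException {
        if (words == null) {
            words = load(fileName);  // read the file only on the first call
        }
        return words.contains(word);
    }

    // Assumes one word per line; trims whitespace and skips blank lines.
    static Set<String> load(String fileName) throws IOException {
        Set<String> result = new HashSet<>();
        try (BufferedReader br =
                 Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }
}
```

In an EvalFunc you would keep one such object as a field, e.g. new WordListLookup("w1"), and call contains(...) from exec(). The try-with-resources block also makes sure the reader is closed, which the one-line BufferedReader snippet leaves up to you.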