I have master.txt, which has 10K records. Each line of it is a tuple, and the whole file needs to be passed to a Python UDF. Since it has multiple records, so on st
This can be done by adding a dummy column and then grouping.
dummy = foreach p2preportmap generate 1, $0, $1 ....
grouped = group dummy by $0;
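To see why the dummy column helps, here is a rough plain-Python sketch of the same idea. It is not Pig code, and the rows are made-up stand-ins for the real records: every row gets the same constant key, so grouping on that key collapses the whole relation into a single row holding one bag.

```python
# Hypothetical rows standing in for the records of the input relation.
rows = [("a", 10), ("b", 20), ("c", 30)]

# Step 1: prepend a constant dummy key to every row (like "generate 1, $0, $1").
with_dummy = [(1,) + row for row in rows]

# Step 2: group by the dummy key. Every row shares it, so only one group results.
groups = {}
for row in with_dummy:
    groups.setdefault(row[0], []).append(row)

# One (key, bag) pair that contains every original record.
grouped = list(groups.items())
print(grouped)
```

That single row is what lets the whole relation be handed to a UDF in one call.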
Let me give you an example. I have two relations, A and B:
A
1,2,3
3,4,5
4,5,6
B
1
2
3
1
2
3
1
2
3
Now I want a Python UDF that looks up each value of B in the first column of A and prints output something like this:
((1,{(1,2,3)}))
((2,))
((3,{(3,4,5)}))
((1,{(1,2,3)}))
((2,))
((3,{(3,4,5)}))
((1,{(1,2,3)}))
((2,))
((3,{(3,4,5)}))
So first I group A by its first column, and then group the result by the constant 1 so that I have a single row:
c = group A by $0;
e = group c by 1;
The Python UDF is something like this:
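def pythonudf(value, map):
    # map is the bag of (key, bag) pairs built by the two group steps
    print map  # debug output
    temp = None
    for a in map:
        # a[0] is the group key, a[1] is the bag of matching rows from A
        if a[0] == value:
            temp = a[1]
    return value, temp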
Now you can use this UDF:
D = foreach B generate myudf.pythonudf($0,e.$1);
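To check the lookup logic outside of Pig, here is a small plain-Python simulation using the sample A and B from above. It assumes the grouped bag reaches the UDF as a list of (key, bag) tuples and that keys and values compare as equal types (in a real script you may need casts if the fields load as bytearrays). The names group_by_first and lookup are just for illustration.

```python
# Sample relations from the example above.
A = [(1, 2, 3), (3, 4, 5), (4, 5, 6)]
B = [1, 2, 3, 1, 2, 3, 1, 2, 3]

def group_by_first(rows):
    """Mimic 'group A by $0': return a list of (key, bag) pairs."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return list(groups.items())

def lookup(value, grouped):
    """Same logic as the UDF: find the bag whose key equals value, else None."""
    match = None
    for key, bag in grouped:
        if key == value:
            match = bag
    return value, match

grouped = group_by_first(A)
results = [lookup(v, grouped) for v in B]
for r in results:
    print(r)
```

Values of B with no matching key in A (here, 2) come back with None in place of a bag, which lines up with the empty second field in the expected output.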