Unable to pass pig tuple to python UDF

有刺的猬 2021-01-07 15:04

I have master.txt, which has 10K records. Each line of it will be a tuple, and the whole set needs to be passed to a Python UDF. Since it has multiple records, so on st

2 answers
  • 2021-01-07 15:37

    This can be done by adding a dummy column and then grouping.

    dummy = foreach p2preportmap generate 1, $0, $1 ....;

    grouped = group dummy by $0;

  • 2021-01-07 15:38

    Let me give you an example. I have two relations, A and B.

    A

    1,2,3
    3,4,5
    4,5,6
    

    B

    1
    2
    3
    1
    2
    3
    1
    2
    3
    

    Now I want a Python UDF that looks up each value of B in the first column of A and prints output like the following:

    ((1,{(1,2,3)}))
    ((2,))
    ((3,{(3,4,5)}))
    ((1,{(1,2,3)}))
    ((2,))
    ((3,{(3,4,5)}))
    ((1,{(1,2,3)}))
    ((2,))
    ((3,{(3,4,5)}))
    

    So first I group A by its first column, and then group the result by the constant 1 so that I have a single row:

    c = group A by $0;
    e = group c by 1;
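
To make the shape of the two groupings concrete, here is a minimal plain-Python sketch (not Pig) of what `c` and `e` hold for the sample relation A. The helper name `group_by_first` is hypothetical, and Python lists stand in for Pig bags, so treat it as an illustration of the data layout only.

```python
from itertools import groupby

# Sample relation A from the example above.
A = [(1, 2, 3), (3, 4, 5), (4, 5, 6)]


def group_by_first(rows):
    """Mimic `group A by $0`: one (key, bag) pair per distinct first column."""
    ordered = sorted(rows, key=lambda r: r[0])
    return [(key, list(bag)) for key, bag in groupby(ordered, key=lambda r: r[0])]


c = group_by_first(A)

# Mimic `group c by 1`: a constant key puts every row of c into one bag,
# so the relation has exactly one row and can be passed around as a whole.
e = [(1, c)]

print(e)
```

The point of the second grouping is only to collapse everything into a single row; the constant key itself carries no information.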
    

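    The Python UDF looks something like this:

    def pythonudf(value, lookup):
        # lookup is the bag of (key, bag-of-A-rows) pairs built by the grouping
        print(lookup)  # debug output
        temp = None
        for a in lookup:
            # a[0] is the grouping key, a[1] the rows of A with that key
            if a[0] == value:
                temp = a[1]
        # temp stays None when value has no match in A
        return value, temp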
    

    Now use this UDF:

    D = foreach B generate myudf.pythonudf($0,e.$1);
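
As a rough check of the lookup logic outside Pig, the sketch below runs the same function over the sample data in plain Python. The `lookup` list is a hand-written stand-in for what `e.$1` would contain (an assumption about its shape), and Python lists stand in for Pig bags.

```python
def pythonudf(value, lookup):
    """Return (value, matching bag), or (value, None) when there is no match."""
    temp = None
    for key, bag in lookup:
        if key == value:
            temp = bag
    return value, temp


# Stand-in for e.$1: A grouped by its first column.
lookup = [(1, [(1, 2, 3)]), (3, [(3, 4, 5)]), (4, [(4, 5, 6)])]

# Sample relation B from the example above.
B = [1, 2, 3, 1, 2, 3, 1, 2, 3]

# Mimic `foreach B generate pythonudf($0, e.$1)`.
D = [pythonudf(b, lookup) for b in B]

for row in D:
    print(row)
```

Values of B with no matching key in A (here, 2) come back with `None`, which corresponds to the empty second field in the expected output above.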
    