Splitting a tuple into multiple tuples in Pig

醉话见心 2020-12-25 10:26

I would like to generate multiple tuples from a single tuple. That is, I have a file with the following data in it.

>> cat data
ID | ColumnName1:Value1 | C         


        
3 Answers
  •  囚心锁ツ
    2020-12-25 10:45

    You could write a UDF or use a Pig script with built-in functions.

    For example:

    -- load each line as a single chararray; PigStorage('|') without a schema returns bytearray fields, which will not work for this example
    inpt = load '/pig_fun/input/single_tuple_to_multiple.txt' as (line:chararray);
    
    -- split by | and create a row so we can dereference it later
    splt = foreach inpt generate FLATTEN(STRSPLIT($0, '\\|')) ;
    
    -- the first column is the id; the whole row is converted into a bag and flattened to produce one row per field
    id_vals = foreach splt generate $0 as id, FLATTEN(TOBAG(*)) as value;
    -- this also yields (id, id) records; they can be filtered out because an id should not contain ':'
    id_vals = foreach id_vals generate id, INDEXOF(value, ':') as p, STRSPLIT(value, ':', 2) as vals;
    final = foreach (filter id_vals by p != -1) generate id, FLATTEN(vals) as (col, val);
    dump final;
    

    Test input:

    1|c1:11:33|c2:12
    234|c1:21|c2:22
    33|c1:31|c2:32
    345|c1:41|c2:42
    

    Output:

    (1,c1,11:33)
    (1,c2,12)
    (234,c1,21)
    (234,c2,22)
    (33,c1,31)
    (33,c2,32)
    (345,c1,41)
    (345,c2,42)
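
The UDF route mentioned at the start of this answer is not shown, so here is a minimal sketch of the per-line logic in plain Python. The function name `split_record` is hypothetical, and the code assumes the same input format as the test input above (fields separated by `|`, name and value separated by the first `:`). It mirrors the script's behaviour rather than replacing it.

```python
def split_record(line):
    """Turn 'id|name:value|name:value' into a list of (id, name, value) tuples."""
    fields = line.strip().split('|')
    record_id = fields[0]
    rows = []
    for field in fields[1:]:
        # Split on the first ':' only, like STRSPLIT(value, ':', 2) in the script.
        name, sep, value = field.partition(':')
        if sep:
            # Fields without ':' are skipped, like the p != -1 filter in the script.
            rows.append((record_id, name, value))
    return rows


if __name__ == '__main__':
    for sample in ['1|c1:11:33|c2:12', '234|c1:21|c2:22']:
        for row in split_record(sample):
            print(row)
```

To call something like this from Pig, the function would need to be registered as a Jython UDF and given an output schema describing a bag of three-field tuples; those details depend on the Pig version and are left out here. The returned list would then be flattened in the script to get one row per name/value pair.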
    

    I hope it helps.

    Cheers.
