Group map values by key in Pig

闹比i 2021-01-18 09:50

I am new to Pig scripts. Say we have a file:

[a#1,b#2,c#3]
[a#4,b#5,c#6]
[a#7,b#8,c#9]

Pig script:

A = LOAD 'txt' AS (in: ma


  • 2021-01-18 10:55

    You can write a custom UDF that converts each map into a bag of (key, value) tuples (using Pig v0.10.0):

    package com.example;
    
    import java.io.IOException;
    import java.util.Map;
    import java.util.Map.Entry;
    
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;
    
    public class MapToBag extends EvalFunc<DataBag> {
    
        private static final BagFactory bagFactory = BagFactory.getInstance();
        private static final TupleFactory tupleFactory = TupleFactory.getInstance();
    
        @Override
        public DataBag exec(Tuple input) throws IOException {
            // Pig passes the UDF arguments as a tuple; the map is the first field.
            if (input == null || input.size() == 0) {
                return null;
            }
            try {
                @SuppressWarnings("unchecked")
                Map<String, Object> map = (Map<String, Object>) input.get(0);
                if (map == null) {
                    return null;
                }
                // Emit one (key, value) tuple per map entry.
                DataBag result = bagFactory.newDefaultBag();
                for (Entry<String, Object> entry : map.entrySet()) {
                    Tuple tuple = tupleFactory.newTuple(2);
                    tuple.set(0, entry.getKey());
                    tuple.set(1, entry.getValue());
                    result.add(tuple);
                }
                return result;
            } catch (ClassCastException e) {
                throw new IOException("MapToBag expects a map as its first argument", e);
            }
        }
    }
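
To see what `MapToBag.exec` produces without a Pig runtime, here is a minimal plain-Java sketch of the same loop. It is an illustration, not Pig's API: the class `MapToBagSketch` and the method `toPairs` are hypothetical names, and a `List` of 2-element `List`s stands in for the `DataBag` of `(key, value)` tuples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Pig-free stand-in for MapToBag: a List of 2-element Lists
// plays the role of a DataBag of (key, value) tuples.
class MapToBagSketch {

    static List<List<Object>> toPairs(Map<String, Object> map) {
        // Mirror the UDF: a null map yields a null bag.
        if (map == null) {
            return null;
        }
        List<List<Object>> bag = new ArrayList<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            // One 2-field "tuple" per entry: (key, value).
            List<Object> tuple = new ArrayList<>(2);
            tuple.add(entry.getKey());
            tuple.add(entry.getValue());
            bag.add(tuple);
        }
        return bag;
    }

    public static void main(String[] args) {
        // One record from the sample file: [a#1,b#2,c#3]
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("a", "1");
        record.put("b", "2");
        record.put("c", "3");
        System.out.println(toPairs(record)); // [[a, 1], [b, 2], [c, 3]]
    }
}
```

Pig bags are unordered, so the real UDF may emit the tuples in any order; the `LinkedHashMap` here only keeps the printout predictable.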
    

    Then register the jar containing the UDF and flatten each map into (k, v) rows:

    B = foreach A generate 
          flatten(com.example.MapToBag(in)) as (k:chararray, v:chararray);
    describe B;
    B: {k: chararray,v: chararray}
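
`flatten` turns every tuple of the bag into its own output row, so each three-entry map yields three `(k, v)` rows and the sample file yields nine. A rough plain-Java sketch of that un-nesting, assuming the map values are already strings; `FlattenSketch`, `flatten` and `mapOf` are hypothetical helpers, and a `String[]` stands in for a Pig tuple:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Pig-free illustration of
//   B = foreach A generate flatten(MapToBag(in)) as (k, v);
// Each input map contributes one (k, v) row per entry.
class FlattenSketch {

    static List<String[]> flatten(List<Map<String, String>> records) {
        List<String[]> rows = new ArrayList<>();
        for (Map<String, String> map : records) {
            for (Map.Entry<String, String> entry : map.entrySet()) {
                rows.add(new String[] {entry.getKey(), entry.getValue()});
            }
        }
        return rows;
    }

    // Hypothetical helper: build a map from alternating key/value arguments,
    // keeping insertion order.
    static Map<String, String> mapOf(String... kv) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            map.put(kv[i], kv[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // The three records from the question's sample file.
        List<Map<String, String>> a = List.of(
                mapOf("a", "1", "b", "2", "c", "3"),
                mapOf("a", "4", "b", "5", "c", "6"),
                mapOf("a", "7", "b", "8", "c", "9"));
        for (String[] row : flatten(a)) {
            System.out.println("(" + row[0] + "," + row[1] + ")");
        }
    }
}
```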
    

    Now group by key and use a nested foreach to keep only the values in each group:

    C = foreach (group B by k) {
        value = foreach B generate v;
        generate group as key, value;
    };
    dump C;
    (a,{(1),(4),(7)})
    (b,{(2),(5),(8)})
    (c,{(3),(6),(9)})
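
`group B by k` gathers all rows that share a key into one bag, and the nested `foreach` projects just the `v` field from that bag. A minimal plain-Java sketch of the same grouping, starting from the nine flattened rows; `GroupSketch` and `groupByKey` are hypothetical names, and a `TreeMap` is used only to make the printed order deterministic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Pig-free illustration of
//   C = foreach (group B by k) { value = foreach B generate v; generate group as key, value; };
class GroupSketch {

    static Map<String, List<String>> groupByKey(List<String[]> rows) {
        // TreeMap sorts keys so the printout is deterministic; Pig itself does
        // not guarantee the order of groups or of values inside a bag.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            // row[0] is k, row[1] is v; keep only v, like the nested foreach.
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The nine (k, v) rows of relation B.
        List<String[]> b = List.of(
                new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"c", "3"},
                new String[] {"a", "4"}, new String[] {"b", "5"}, new String[] {"c", "6"},
                new String[] {"a", "7"}, new String[] {"b", "8"}, new String[] {"c", "9"});
        groupByKey(b).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Running `main` prints `a -> [1, 4, 7]`, `b -> [2, 5, 8]` and `c -> [3, 6, 9]`, the same grouping as the `dump C` output above.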
    