Max/Min for whole sets of records in PIG

时光取名叫无心 2021-02-08 23:58

I have a set of records that I am loading from a file, and the first thing I need to do is get the max and min of a column. In SQL I would do this with a subquery like this:

1 Answer
  •  旧巷少年郎
    2021-02-09 00:41

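    As you said, you need to group all the data together, but no extra column is required if you use GROUP ALL. GROUP ... ALL places every record in a single group, so MAX can be computed over the whole relation, and FLATTEN then attaches that value to each original row.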

    Pig

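    -- Load the records (the default loader, PigStorage, expects tab-separated fields).
    records = LOAD 'states.txt' AS (state:chararray, population:int);
    -- GROUP ... ALL collects every record into a single group.
    records_group = GROUP records ALL;
    -- Emit each original row with the overall maximum appended.
    with_max = FOREACH records_group
               GENERATE
                   FLATTEN(records.(state, population)), MAX(records.population);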
    

    Input

    CA  10
    VA  5
    WI  2
    

    Output

    (CA,10,10)
    (VA,5,10)
    (WI,2,10)
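
    The question asks for both the max and the min. If only the two aggregates are needed, rather than a copy of them on every row, the same GROUP ALL approach might look like the sketch below. It assumes the same tab-separated states.txt as above; the aliases stats, max_population, and min_population are illustrative names, not from the original post.

    ```pig
    -- Load the records (assumed tab-separated, as in the example above).
    records = LOAD 'states.txt' AS (state:chararray, population:int);
    -- Put every record into one group so the aggregates span the whole relation.
    records_group = GROUP records ALL;
    -- Compute both aggregates in a single pass over the grouped bag.
    stats = FOREACH records_group
            GENERATE
                MAX(records.population) AS max_population,
                MIN(records.population) AS min_population;
    DUMP stats;
    ```

    For the sample input above, this should yield the single tuple (10,2).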
    
