I have a set set of records that I am loading from a file and the first thing I need to do is get the max and min of a column. In SQL I would do this with a subquery like this:
As you said you need to group all the data together but no extra column is required if you use GROUP ALL.
Pig
records = LOAD 'states.txt' AS (state:chararray, population:int);
records_group = GROUP records ALL;
with_max = FOREACH records_group
GENERATE
FLATTEN(records.(state, population)), MAX(records.population);
Input
CA 10
VA 5
WI 2
Output
(CA,10,10)
(VA,5,10)
(WI,2,10)