This is related to the question in Pivot table with Apache Pig. I have the input data as:
Id Name    Value
1  Column1 Row11
1  Column2 Row12
1  Column3 Row13
2  Column1 Row21
2  Column2 Row22
2  Column3 Row23
The simplest way to do it without a UDF is to group on Id and then, in a nested foreach, select the rows for each of the column names and join them in the generate. See the script:
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value : chararray);
grp = group inpt by Id;
maps = foreach grp {
    -- pick the rows for each column name out of the grouped bag
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    -- emit one row per Id with the three values side by side
    generate flatten(group) as Id, flatten(col1.Value) as Column1, flatten(col2.Value) as Column2, flatten(col3.Value) as Column3;
};
Output:
(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)
Another option would be to write a UDF which converts a bag{(name, value)} into a map[], then get the values by using the column names as keys (e.g. vals#'Column1').
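As a rough illustration of that UDF approach, here is a minimal sketch using a Jython UDF (the file name pivot_udf.py, the function bag_to_map and the alias names are made up for the example, and it assumes Pig's usual conversion of the input bag into a list of tuples):

@outputSchema("vals:map[chararray]")
def bag_to_map(bag):
    # bag of (Name, Value) tuples arrives as a list of tuples
    if bag is None:
        return None
    result = {}
    for name, value in bag:
        result[name] = value
    return result

and on the Pig side:

register 'pivot_udf.py' using jython as myudfs;
grp = group inpt by Id;
pivoted = foreach grp generate group as Id, myudfs.bag_to_map(inpt.(Name, Value)) as vals;
result = foreach pivoted generate Id, vals#'Column1' as Column1, vals#'Column2' as Column2, vals#'Column3' as Column3;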
Not sure about Pig, but in Spark you could do this with a one-line command:
df.groupBy("Id").pivot("Name").agg(first("Value"))
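For reference, a minimal self-contained PySpark sketch of that one-liner, assuming the same sample data as above and that first comes from pyspark.sql.functions:

from pyspark.sql import SparkSession
from pyspark.sql.functions import first

spark = SparkSession.builder.appName("pivot-example").getOrCreate()

# Same sample rows as the Pig input above.
df = spark.createDataFrame(
    [(1, "Column1", "Row11"), (1, "Column2", "Row12"), (1, "Column3", "Row13"),
     (2, "Column1", "Row21"), (2, "Column2", "Row22"), (2, "Column3", "Row23")],
    ["Id", "Name", "Value"])

# One row per Id, one column per distinct Name, first Value as the cell.
df.groupBy("Id").pivot("Name").agg(first("Value")).show()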