Question
I want to convert fields to rows in Pig.
From input.txt:
1 2 3 4 5 6 7 8 9
The delimiter between fields is '\t'.
To output.txt:
1
2
3
4
...
However, I must not use TOKENIZE, because the content of a field might be a sentence. Please help me. Many thanks.
Answer 1:
I think alexeipab's answer is the right direction. Here is a simple example:
> A = load 'input.txt';
> dump A;
(0,1,2,3,4,5,6,7,8,9)
> B = foreach A generate FLATTEN(TOBAG(*));
> dump B;
(0)
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
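To address the question's concern about fields that contain sentences, here is a minimal sketch of the same approach written as a script. It assumes the input is tab-delimited as described in the question, and the output path 'output' is only a placeholder. Because PigStorage('\t') splits on tabs alone, a field containing spaces should come through as a single row.

```pig
-- Split each line on tabs only; spaces inside a field are preserved.
A = LOAD 'input.txt' USING PigStorage('\t');
-- TOBAG(*) wraps every field in its own tuple; FLATTEN emits one row per tuple.
B = FOREACH A GENERATE FLATTEN(TOBAG(*));
-- 'output' is a placeholder; Pig writes a directory of part files here.
STORE B INTO 'output';
```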
Answer 2:
I ran into a very similar issue using Pig. What I ended up doing was writing a UDF that simply iterates through the tuple. For each field in the tuple, it creates a new tuple holding the field value and adds it to a DataBag. Here is a sample...
public DataBag exec(Tuple tuple) throws IOException {
    // Collect one single-field tuple per field of the input tuple.
    DataBag db = BagFactory.getInstance().newDefaultBag();
    for (int i = 0; i < tuple.size(); ++i) {
        DefaultTuple dt = new DefaultTuple();
        dt.append(tuple.get(i));
        db.add(dt);
    }
    return db;
}
Since it is only a sample, it does not include any error checking, but it should give you an idea of how to do this.
In your script you could 'FLATTEN' the results and put the single values back into individual tuples if need be.
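As a rough sketch of how such a UDF might be called from a script: the jar name myudfs.jar and the class name com.example.pig.FieldsToBag below are hypothetical placeholders, not names from the answer, and the sketch assumes the exec method above lives in a class extending EvalFunc<DataBag>.

```pig
-- Hypothetical jar and class names; substitute your own.
REGISTER myudfs.jar;
DEFINE FieldsToBag com.example.pig.FieldsToBag();

A = LOAD 'input.txt' USING PigStorage('\t');
-- Pass all fields to the UDF, then flatten the returned bag into one row per field.
B = FOREACH A GENERATE FLATTEN(FieldsToBag(*));
DUMP B;
```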
Answer 3:
It looks like you want to pivot the row. There are a couple of solutions; see "Pivot table with Apache Pig" or "Splitting a tuple into multiple tuples in Pig".
Answer 4:
Use the DataFu UDF TransposeTupleToBag (http://datafu.incubator.apache.org/docs/datafu/1.1.0/datafu/pig/util/TransposeTupleToBag.html) to get a bag that contains the tuple's fields transposed. Flatten the bag to get rows of (key:chararray, value:chararray) tuples, then select the 'value' part of the flattened output.
Source: https://stackoverflow.com/questions/11427889/how-to-convert-fields-to-rows-in-pig