How to convert fields to rows in Pig?

Submitted by 社会主义新天地 on 2019-12-06 04:28:04

Question


I want to convert fields to rows in Pig.

From input.txt:

1 2 3 4 5 6 7 8 9

The delimiter between fields is '\t'.

To output.txt:

1
2
3
4
...

However, I must not use TOKENIZE, because the content of a field might be a whole sentence. Please help me. Many thanks.


Answer 1:


I think alexeipab's answer is the right direction. Here is a simple example:

> A = load 'input.txt';
> dump A;
(0,1,2,3,4,5,6,7,8,9)
> B = foreach A generate FLATTEN(TOBAG(*));
> dump B;
(0)
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)



Answer 2:


I ran into a very similar issue using Pig. What I ended up doing was writing a UDF that simply iterates through the tuple. For each field in the tuple, it creates a new tuple holding the field value and adds it to a DataBag. Here is a sample:

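import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DefaultTuple;
import org.apache.pig.data.Tuple;
// FieldsToBag is a hypothetical class name; the original sample showed only exec().
public class FieldsToBag extends EvalFunc<DataBag> {
    public DataBag exec(Tuple tuple) throws IOException {
        DataBag db = BagFactory.getInstance().newDefaultBag();
        for (int i = 0; i < tuple.size(); ++i) {
            DefaultTuple dt = new DefaultTuple(); // one single-field tuple per input field
            dt.append(tuple.get(i));
            db.add(dt);
        }
        return db;
    }
}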

Obviously, as a sample, it does not include any error checking, but it should give you an idea of how to do this.

In your script you could 'FLATTEN' the results and put the single values back into individual tuples if need be.
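To see what the UDF and the FLATTEN step do together without a Pig installation, here is a rough plain-Java analogue. It is only a sketch and does not use the Pig API: the tuple is modeled as a `List<Object>` and the bag as a `List<List<Object>>`, and the names `FieldsToRowsSketch`, `toBag` and `flatten` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogue of the UDF + FLATTEN steps; no Pig classes are used.
// FieldsToRowsSketch, toBag and flatten are hypothetical names.
final class FieldsToRowsSketch {

    // Mirrors exec(): wrap every field of the input "tuple" in its own
    // single-field "tuple" and collect those in a "bag".
    static List<List<Object>> toBag(List<Object> tuple) {
        List<List<Object>> bag = new ArrayList<>();
        for (Object field : tuple) {
            List<Object> single = new ArrayList<>();
            single.add(field);
            bag.add(single);
        }
        return bag;
    }

    // Mirrors FLATTEN applied to the bag: each inner tuple becomes one row.
    static List<Object> flatten(List<List<Object>> bag) {
        List<Object> rows = new ArrayList<>();
        for (List<Object> t : bag) {
            rows.addAll(t);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Split on tabs only, so a field that is a whole sentence stays intact.
        String line = "1\ta short sentence\t3";
        List<Object> tuple = new ArrayList<>(Arrays.asList(line.split("\t", -1)));
        for (Object row : flatten(toBag(tuple))) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints three rows: `1`, `a short sentence` and `3`. Because the line is split on the tab character rather than on whitespace, the sentence field is not broken into words, which is the concern raised in the question.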




Answer 3:


It looks like you want to pivot the row. There are a couple of solutions: see "Pivot table with Apache Pig" or "Splitting a tuple into multiple tuples in Pig".




Answer 4:


Use the DataFu UDF TransposeTupleToBag (http://datafu.incubator.apache.org/docs/datafu/1.1.0/datafu/pig/util/TransposeTupleToBag.html) to get a bag containing the tuple's fields, transposed. Flatten the bag to get rows of (key:chararray, value:chararray) tuples, then select the 'value' part from the flattened output.



Source: https://stackoverflow.com/questions/11427889/how-to-convert-fields-to-rows-in-pig
