Pivoting in Pig

后端未结


This is related to the question "Pivot table with Apache Pig". I have the following input data:

Id    Name     Value 
1     Column1  Row11 
1     Column2  Row12 
1


                      
2 Answers

无人共我, 2020-12-06 04:02

The simplest way to do it without a UDF is to group on Id, then in a nested foreach filter the rows for each column name, and finally combine them in the generate. See the script:

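-- Load (Id, Name, Value) triples; PigStorage's default field delimiter is a tab.
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
-- One group per Id; each group carries a bag of that Id's rows.
grp = group inpt by Id;
maps = foreach grp {
    -- Pick out the rows belonging to each target column.
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    -- Flattening the one-row bags yields a single output row per Id.
    -- An Id lacking one of the names gives an empty bag, and flattening it drops the row.
    generate flatten(group) as Id, flatten(col1.Value) as Column1, flatten(col2.Value)  as Column2, flatten(col3.Value)  as Column3;
};
dump maps;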


Output:

(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)


Another option would be to write a UDF that converts a bag{name, value} into a map[], then look up the values using the column names as keys (e.g. vals#'Column1').
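The UDF idea could look roughly like the following Jython UDF. This is only a sketch under assumptions: the function name bag_to_map, the file name pivot_udf.py and the alias pivot are all hypothetical, and the Pig statements in the comments have not been run against a cluster. It relies on Pig passing a bag to a Jython UDF as a list of tuples and accepting a returned dict as a map.

```python
# Hypothetical file: pivot_udf.py
#
# Possible use from Pig (untested sketch, names are assumptions):
#   register 'pivot_udf.py' using jython as pivot;
#   grp  = group inpt by Id;
#   maps = foreach grp generate group as Id,
#              pivot.bag_to_map(inpt.(Name, Value)) as vals;
#   out  = foreach maps generate Id, vals#'Column1' as Column1,
#              vals#'Column2' as Column2, vals#'Column3' as Column3;

try:
    outputSchema  # provided by Pig when the script is registered "using jython"
except NameError:
    # Fallback so the module can also be imported and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("vals:map[]")
def bag_to_map(bag):
    """Convert a bag of (Name, Value) tuples into a map keyed by Name.

    If a Name occurs more than once, the last Value seen wins.
    """
    result = {}
    if bag is None:
        return result
    for row in bag:
        name, value = row[0], row[1]
        result[name] = value
    return result
```

Unlike the nested-foreach script, a missing column here would simply produce a null lookup (vals#'Column3' is null when the key is absent) rather than dropping the whole row.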
                                                                        
                                                        
            
            
              
                
不知归路, 2020-12-06 04:24

Not sure about Pig, but in Spark you could do this with a one-line command:

df.groupBy("Id").pivot("Name").agg(first("Value"))
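For readers without a Spark session at hand, what that one-liner computes can be sketched in plain Python: group by Id, create one output column per distinct Name, and keep the first Value seen for each cell. This is not Spark code; the helper name pivot_first is hypothetical, and the sample rows are taken from the output shown in the other answer.

```python
from collections import OrderedDict


def pivot_first(rows):
    """Pivot (Id, Name, Value) triples into one row per Id.

    Returns (columns, table), where columns is the sorted list of distinct
    Names and table is a list of tuples (Id, value_for_col_1, ...).
    A missing cell is None; for duplicate cells the first Value is kept.
    """
    rows = list(rows)  # allow any iterable, since it is traversed twice
    columns = sorted({name for _, name, _ in rows})
    cells_by_id = OrderedDict()  # keeps Ids in first-seen order
    for row_id, name, value in rows:
        cells = cells_by_id.setdefault(row_id, {})
        cells.setdefault(name, value)  # keep only the first value per cell
    table = [
        (row_id,) + tuple(cells.get(col) for col in columns)
        for row_id, cells in cells_by_id.items()
    ]
    return columns, table


if __name__ == "__main__":
    sample = [
        ("1", "Column1", "Row11"), ("1", "Column2", "Row12"), ("1", "Column3", "Row13"),
        ("2", "Column1", "Row21"), ("2", "Column2", "Row22"), ("2", "Column3", "Row23"),
    ]
    columns, table = pivot_first(sample)
    print(columns)
    for row in table:
        print(row)
```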

                                                                        
                                                        
            
            
              
                