How to merge rows in OpenRefine

后端未结

关注

 5  1268

How to merge rows based on some ID field?

Original Table                   New Table

ID   | Field1 | Field2       ID     | Field1 | Field2
-----|------- |------


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2021-01-26 10:23
              
            
            
                                                                       
It's not OpenRefine but I think it's a really good tool for a OpenRefine user. You could run this Miller (https://github.com/johnkerl/miller) command

mlr --csv reshape -r "Field" -o item,value \
then filter -x -S '$value==""' \
then reshape -s item,value input.csv


to have

ID,Field1,Field2
A,5,10
B,1,3
C,4,150


First I create a tidy version of the data (https://vita.had.co.nz/papers/tidy-data.pdf), and than I transform again it from long to wide format 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  走了就别回头了        
                
              
                            
                2021-01-26 10:32
              
            
            
                                                                       
For "ID" column use "add column based on this column":

filter(
  cell.cross("ProjectName", "ID").cells["Field1"].value,
  v,
  isNonBlank(v)
)[0]


This will set a value for each row identified ID.

Original Table      New Table

ID   | Field1 | Field2 | Field1_ | Field2_
-----|------- |--------|---------|--------
A        5                  5        10
A                10         5        10
B        1                  1        3
B                3          1        3
C        4                  4        150
C                150        4        150


Remove old columns.

After that, remove duplicates by using "blank down + facet by blank + remove matching rows" approach
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2021-01-26 10:37
              
            
            
                                                                       
In the ID column use the menu option: Edit Cells -> Blank down
This should leave you with a table looking like:

ID   | Field1 | Field2 
-----|------- |--------
A        5             
                 10    
B        1             
                 3
C        4
                 150


Make sure you are in "Records" mode (this option is at the top left of the data grid). You should see the rows for each ID are grouped together.

Now use Edit Cells -> Join multi-valued cells on each of the other columns - this should leave you with a single row per record once you have done this for all columns
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2021-01-26 10:38
              
            
            
                                                                       
With the cross() function, you can also achieve this result, but much faster.

It's basically a VLOOKUP on itself (all the rows of the same record) — skipping rows with data — that is after filtered to remove empty cells.

filter(cross(cells.ID.value, "TEST-FillUp", "ID"), vV, vV.cells["Field1"].value != null)[0].cells.Field1.value

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  余生分开走        
                
              
                            
                2021-01-26 10:44
              
            
            
                                                                       
I think a simpler solution would be to use:

1° The feature "Edit Cells / Blank Down" on your ID column, in order to get something like this:



2° Then "Edit Cells / Join Multivalued cells" on the last column only (Field2), which will produce this:


                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复