Duplicated rows: select rows based on criteria and store duplicated values

有刺的猬 2021-01-23 19:17

I am working on a raw dataset that looks something like this:

df <- data.frame("ID" = c("Alpha", "Alpha", "Alpha", "Alpha",

2 Answers

梦毁少年i (OP) 2021-01-23 19:50

Using data.table, you can sort by Val2 in descending order within each ID/Year group and then dcast on rowid(ID, Year). That gets you there, except for the column names: the "_1" columns are the "keep" columns (the row with the highest Val2) and the "_2" columns are the "del" columns.

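library(data.table)
setDT(df)  # convert df to a data.table in place

# sort within each ID/Year group so the row with the largest Val2 comes first
setorder(df, ID, Year, -Val2)

# rowid(ID, Year) numbers the rows 1, 2, ... within each group;
# dcast spreads them into _1 (keep) and _2 (del) columns
out <- 
  dcast(df, ID + Year ~ rowid(ID, Year), value.var = c('treatment', 'Val', 'Val2'))
out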
#       ID Year treatment_1 treatment_2 Val_1 Val_2 Val2_1 Val2_2
# 1: Alpha 1970           B           A     0     0   2.34   0.00
# 2: Alpha 1980           C        <NA>     0    NA   1.30     NA
# 3: Alpha 1990           D        <NA>     1    NA   0.00     NA
# 4:  Beta 1970           E        <NA>     0    NA   0.00     NA
# 5:  Beta 1980           G           F     0     1   3.20   2.34
# 6:  Beta 1990           H        <NA>     1    NA   1.30     NA
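The "_1" columns hold the kept rows because, after setorder(), rowid(ID, Year) numbers the rows within each ID/Year group in their sorted order, so the row with the largest Val2 gets 1. A minimal sketch on toy data makes that numbering visible. The rows below are hypothetical (the question's data.frame is truncated) and only echo the printed result above; the names `toy` and `row_in_group` are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: rows chosen to echo the printed result above,
# because the data.frame in the question is truncated
toy <- data.table(
  ID        = c("Alpha", "Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta"),
  Year      = c(1970, 1970, 1980, 1990, 1970, 1980, 1980, 1990),
  treatment = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Val       = c(0, 0, 0, 1, 0, 1, 0, 1),
  Val2      = c(0, 2.34, 1.30, 0, 0, 2.34, 3.20, 1.30)
)

# Sort by group, then by Val2 descending, so the "keep" row comes first
setorder(toy, ID, Year, -Val2)

# rowid() counts 1, 2, ... within each ID/Year group in the current row order;
# row_in_group is a hypothetical helper column used only to show the numbering
toy[, row_in_group := rowid(ID, Year)]
print(toy)

# The same counter used on the right-hand side of dcast() becomes the _1 / _2 suffix
out <- dcast(toy, ID + Year ~ rowid(ID, Year),
             value.var = c("treatment", "Val", "Val2"))
print(out)
```

Only the within-group order set by setorder() decides which row lands in the _1 columns; groups with a single row get NA in the _2 columns.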


We can rename the columns to match yours. The only difference is that the del columns keep a number at the end, which would be useful if a group can have more than 2 rows.

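# drop the "_1" suffix from the keep columns (anchored so "_10", "_11" are untouched)
setnames(out, function(x) sub('_1$', '', x))
# prefix the remaining numbered columns with "del_"
setnames(out, function(x) sub('^(.*_\\d+)$', 'del_\\1', x))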
out
#       ID Year treatment del_treatment_2 Val del_Val_2 Val2 del_Val2_2
# 1: Alpha 1970         B               A   0         0 2.34       0.00
# 2: Alpha 1980         C            <NA>   0        NA 1.30         NA
# 3: Alpha 1990         D            <NA>   1        NA 0.00         NA
# 4:  Beta 1970         E            <NA>   0        NA 0.00         NA
# 5:  Beta 1980         G               F   0         1 3.20       2.34
# 6:  Beta 1990         H            <NA>   1        NA 1.30         NA
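The renaming is plain regular-expression work on the column names, so it can be checked in base R without data.table. A minimal sketch, assuming the column names printed above plus a hypothetical `Val_10` column (what a group with ten or more rows would produce). Anchoring the patterns with `$` matters there: the unanchored `'(.*)_1'` would turn `Val_10` into `Val0`.

```r
# Column names as produced by dcast() above (two rows per ID/Year group),
# plus a hypothetical "Val_10" to show what happens with 10+ rows per group
nms <- c("ID", "Year", "treatment_1", "treatment_2",
         "Val_1", "Val_2", "Val2_1", "Val2_2", "Val_10")

# Step 1: strip the "_1" suffix from the keep columns; "$" anchors the match
# to the end of the name so "_10", "_11", ... are left alone
nms <- sub("_1$", "", nms)

# Step 2: prefix every name that still ends in "_<number>" with "del_"
nms <- sub("^(.*_[0-9]+)$", "del_\\1", nms)

print(nms)
```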
