R how can I calculate difference between rows in a data frame

前端未结

关注

 6  2019

Here is a simple example of my problem:

> df <- data.frame(ID=1:10,Score=4*10:1)
> df
       ID Score
    1   1    40
    2   2    36
    3   3    3


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  温柔的废话        
                
              
                            
                2020-12-03 01:35
              
            
            
                                                                       
Perhaps you are looking for something like this:

> tail(df, -1) - head(df, -1)
   ID Score
2   1    -4
3   1    -4
4   1    -4
5   1    -4
6   1    -4
7   1    -4
8   1    -4
9   1    -4
10  1    -4


You can subtract or add two data.frames together if they are the same dimensions. So, what we are doing here is subtracting one data.frame that is missing the first row (tail(df, -1)) and one that is missing the last row (head(df, -1)) and subtracting them.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小鲜肉        
                
              
                            
                2020-12-03 01:38
              
            
            
                                                                       
Another option using dplyr would be using mutate_each to loop through all the columns, get the difference of the column (.) with the lag of the column (.) and remove the NA element at the top with na.omit()

library(dplyr)
df %>%
    mutate_each(funs(. - lag(.))) %>%
    na.omit() 


EDIT:

Instead of mutate_each (deprecated - as mentioned by @PatrickT) use mutate_all

df %>%
    mutate_all(funs(. - lag(.))) %>%
    na.omit() 




Or with shift from data.table.  Convert the 'data.frame' to 'data.table' (setDT(df)), loop through the columns (lapply(.SD, ..)) and get the difference between the column (x) and thelag(shiftby default gives thelagastype = "lag"`).  Remove the first observation i.e. NA element.

library(data.table)
setDT(df)[, lapply(.SD, function(x) (x- shift(x))[-1])]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦谈多话        
                
              
                            
                2020-12-03 01:40
              
            
            
                                                                       
Adding this a few years later for completeness- you can use a simple  [.data.frame subseting in order to achieve this too

df[-1, ] - df[-nrow(df), ]
#    ID Score
# 2   1    -4
# 3   1    -4
# 4   1    -4
# 5   1    -4
# 6   1    -4
# 7   1    -4
# 8   1    -4
# 9   1    -4
# 10  1    -4

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-12-03 01:42
              
            
            
                                                                       
Because df works on  vector or matrix. You can use apply to apply the function across columns like so:

 apply( df , 2 , diff )
   ID Score
2   1    -4
3   1    -4
4   1    -4
5   1    -4
6   1    -4
7   1    -4
8   1    -4
9   1    -4
10  1    -4


It seems unlikely that you want to calculate the difference in sequential IDs, so you could choose to apply it on all columns  except the first like so:

apply( df[-1] , 2 , diff )


Or you could use data.table (not that it adds anything here I just really want to start using it!), and I am again assuming that you do not want to apply diff to the ID column:

DT <- data.table(df)
DT[ , list(ID,Score,Diff=diff(Score))  ]
    ID Score Diff
 1:  1    40   -4
 2:  2    36   -4
 3:  3    32   -4
 4:  4    28   -4
 5:  5    24   -4
 6:  6    20   -4
 7:  7    16   -4
 8:  8    12   -4
 9:  9     8   -4
10: 10     4   -4


And thanks to @AnandaMahto an alternative syntax that gives more flexibility to choose which columns to run it on could be:

DT[, lapply(.SD, diff), .SDcols = 1:2]


Here .SDcols = 1:2 means you want to apply the diff function to columns 1 and 2. If you have 20 columns and didn't want to apply it to ID you could use .SDcols=2:20 as an example.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  不知归路        
                
              
                            
                2020-12-03 01:44
              
            
            
                                                                       
I would like to show an alternative way for doing such kind of things even often I have the feeling it is not appreciated doing this in that way: using sql.

sqldf(paste("SELECT a.ID,a.Score"
            ,"      , a.Score - (SELECT b.Score"
            ,"                   FROM df b"
            ,"                   WHERE b.ID < a.ID"
            ,"                   ORDER BY b.ID DESC"
            ,"                   ) diff"
            ," FROM df a"
            )
      )


The code seems complicated but it is not and it has some advantage, as you can see at the results:

    ID Score diff
 1   1    40 <NA>
 2   2    36 -4.0
 3   3    32 -4.0
 4   4    28 -4.0
 5   5    24 -4.0
 6   6    20 -4.0
 7   7    16 -4.0
 8   8    12 -4.0
 9   9     8 -4.0
 10 10     4 -4.0


One advantage is that you use the original dataframe (without converting into other classes) and you get a data frame (put it in res <- ....). Another advantage is that you have still all rows. And the third advantage is that you can easily consider grouping factors. For example:

df2 <- data.frame(ID=1:10,grp=rep(c("v","w"), each=5),Score=4*10:1)

sqldf(paste("SELECT a.ID,a.grp,a.Score"
            ,"      , a.Score - (SELECT b.Score"
            ,"                   FROM df2 b"
            ,"                   WHERE b.ID < a.ID"
            ,"                         AND a.grp = b.grp"
            ,"                   ORDER BY b.ID DESC"
            ,"                   ) diff"
     ," FROM df2 a"
     )
)


   ID grp Score diff
1   1   v    40 <NA>
2   2   v    36 -4.0
3   3   v    32 -4.0
4   4   v    28 -4.0
5   5   v    24 -4.0
6   6   w    20 <NA>
7   7   w    16 -4.0
8   8   w    12 -4.0
9   9   w     8 -4.0
10 10   w     4 -4.0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2020-12-03 01:45
              
            
            
                                                                       
diff wants a matrix or a vector rather than a data frame. Try

data.frame(diff(as.matrix(df)))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复