dplyr summarise_each with na.rm

后端未结

Is there a way to instruct dplyr to use summarise_each with na.rm=TRUE? I would like to take the mean of variables with summaris

5 answers

孤街浪徒
2020-12-12 19:22
                                                                       
Take the mtcars data set, for instance:

library(dplyr)

You can always use plain summarise and pass na.rm = TRUE to each function explicitly, which avoids the longer scoped-verb syntax:

mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE),
            sd_mpg = sd(mpg, na.rm = TRUE))
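
Since mtcars itself contains no missing values, the effect of na.rm is easier to see on data that does. The following is a minimal sketch using a made-up data frame; the names df, g and x are hypothetical and not part of the original answer:

```r
library(dplyr)

# Hypothetical toy data: one missing value in group "a"
df <- data.frame(g = c("a", "a", "b", "b"),
                 x = c(1, NA, 3, 5))

# Without na.rm, any NA in a group makes that group's mean NA
without_narm <- df %>%
  group_by(g) %>%
  summarise(mean_x = mean(x))

# With na.rm = TRUE, the NA is dropped before averaging
with_narm <- df %>%
  group_by(g) %>%
  summarise(mean_x = mean(x, na.rm = TRUE))
```

Comparing without_narm and with_narm shows that only group "a", the one containing the NA, is affected.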

野的像风
2020-12-12 19:28
                                                                       
summarise_each is now deprecated; here is an option using summarise_all.

You can still specify na.rm = TRUE inside the funs argument (cf. @flodel's answer: just replace summarise_each with summarise_all).
But you can also add na.rm = TRUE after the funs argument.

That is useful when you want to call more than one function, e.g.:

Edit:

funs() is now soft-deprecated (thanks to @Mikko's comment). You can use the alternatives suggested by the warning, shown in the code below. na.rm can still be passed as an additional argument to summarise_all.

I used ggplot2::msleep because it contains NAs, which shows the effect more clearly.

library(dplyr)

ggplot2::msleep %>% 
  select(vore, sleep_total, sleep_rem) %>%
  group_by(vore) %>%
  summarise_all(funs(mean, max, sd), na.rm = TRUE)
#> Warning: funs() is soft deprecated as of dplyr 0.8.0
#> Please use a list of either functions or lambdas: 
#> 
#>   # Simple named list: 
#>   list(mean = mean, median = median)
#> 
#>   # Auto named with `tibble::lst()`: 
#>   tibble::lst(mean, median)
#> 
#>   # Using lambdas
#>   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

### here using a named list
ggplot2::msleep %>% 
  select(vore, sleep_total, sleep_rem) %>%
  group_by(vore) %>%
  summarise_all(list(mean = mean, max = max, sd = sd), na.rm = TRUE)
#> # A tibble: 5 x 7
#>   vore  sleep_total_mean sleep_rem_mean sleep_total_max sleep_rem_max
#>   <chr>            <dbl>          <dbl>           <dbl>         <dbl>
#> 1 carni            10.4            2.29            19.4           6.6
#> 2 herbi             9.51           1.37            16.6           3.4
#> 3 inse~            14.9            3.52            19.9           6.1
#> 4 omni             10.9            1.96            18             4.9
#> 5 <NA>             10.2            1.88            13.7           2.7
#> # ... with 2 more variables: sleep_total_sd <dbl>, sleep_rem_sd <dbl>


Created on 2020-01-08 by the reprex package (v0.3.0)
                                                                        
情书的邮戳
2020-12-12 19:38
                                                                       
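The summarise_at function in dplyr summarises a data set over specific columns, and extra arguments such as na.rm are passed on to every function applied. For example, take the iris data set and compute the mean and median for the variables from Sepal.Length to Petal.Width:

library(dplyr)
# funs() is soft-deprecated as of dplyr 0.8.0, so a named list is used instead
summarise_at(iris, vars(Sepal.Length:Petal.Width), list(mean = mean, median = median), na.rm = TRUE)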

北海茫月
2020-12-12 19:40
                                                                       
Following the links in the doc, it seems you can use funs(mean(., na.rm = TRUE)):

library(dplyr)
by_species <- iris %>% group_by(Species)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
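
In dplyr 1.0.0 and later, the _each, _all and _at variants have been deprecated or superseded in favour of across(). The following is a minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed; iris_na is a made-up copy of iris with one NA injected so that na.rm actually has something to remove:

```r
library(dplyr)

# Hypothetical copy of iris with one missing value injected
iris_na <- iris
iris_na$Sepal.Length[1] <- NA

# One function, applied to every non-grouping column
means <- iris_na %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

# Several functions; the list names become column-name suffixes
stats <- iris_na %>%
  group_by(Species) %>%
  summarise(across(Sepal.Length:Petal.Width,
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
```

Writing na.rm = TRUE inside each lambda keeps it explicit which functions receive it, which may matter when mixing in functions that do not accept that argument.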

再見小時候
2020-12-12 19:43
                                                                       
I don't know whether my answer adds anything to the previous ones; hopefully it does.

In my case, I had data from an experiment with two groups (control, exp), measured at different levels of a specific variable (day), and I wanted the mean and sd of another variable (weight) for each group at specific levels of day.

Here is a sample of my data:


animal    group           day       weight      
1.1       "control"       73        NA   
1.2       "control"       73        NA   
3.1       "control"       73        NA   
9.2       "control"       73        25.2  
9.3       "control"       73        23.4  
9.4       "control"       73        25.8   
2.1       "exp"           73        NA       
2.2       "exp"           73        NA     
10.1      "exp"           73        24.4     
10.2      "exp"           73        NA     
10.3      "exp"           73        24.6



So, for instance, in this case I wanted to get the mean and sd of the weight on day 73 for each of the groups (control, exp), omitting the NAs.

I did this with these commands, one per group:

data[data$day == "73", ] %>% group_by(group) %>%
  summarise(mean(weight[group == "exp"], na.rm = TRUE), sd(weight[group == "exp"], na.rm = TRUE))
data[data$day == "73", ] %>% group_by(group) %>%
  summarise(mean(weight[group == "control"], na.rm = TRUE), sd(weight[group == "control"], na.rm = TRUE))
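
A possible simplification, sketched here rather than taken from the answer: because group_by(group) already splits the rows by group, subsetting weight by group inside summarise is probably unnecessary, and a single call can return both groups at once. The data frame below is rebuilt from the sample rows above; the column types (animal as character, day as numeric) and the result name weight_summary are assumptions:

```r
library(dplyr)

# Rows copied from the sample table; column types are assumed
data <- data.frame(
  animal = c("1.1", "1.2", "3.1", "9.2", "9.3", "9.4",
             "2.1", "2.2", "10.1", "10.2", "10.3"),
  group  = c(rep("control", 6), rep("exp", 5)),
  day    = 73,
  weight = c(NA, NA, NA, 25.2, 23.4, 25.8,
             NA, NA, 24.4, NA, 24.6)
)

# group_by() already splits the rows, so no manual subsetting is needed
weight_summary <- data %>%
  filter(day == 73) %>%
  group_by(group) %>%
  summarise(mean_weight = mean(weight, na.rm = TRUE),
            sd_weight   = sd(weight, na.rm = TRUE))
```

On these sample rows, weight_summary has one row per group, with a mean weight of 24.8 for control and 24.5 for exp.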