How to remove rows where all columns are zero using dplyr pipe

前端未结

关注

 5  669

I have the following data frame:

dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129, 
0), `fBM-


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  名媛妹妹        
                
              
                            
                2021-01-18 08:24
              
            
            
                                                                       
Here is a third option that uses purrr::pmap to generate the indices of whether or not all rows are zero. Definitely less compact than filter_at, but opens up options for interesting and complex conditions using pmap!



dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129, 
                                  0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106, 
                                                                                         0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX", 
                                                                                                                                                   "fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR", 
                                                                                                                                                                                                 "BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")

library(tidyverse)
dat %>%
  rownames_to_column() %>%
  bind_cols(all_zero = pmap_lgl(., function(rowname, ...) all(list(...) == 0))) %>%
  filter(all_zero == FALSE) %>%
  `rownames<-`(.$rowname) %>%
  select(-rowname, -all_zero)
#>                     A-XXX  fBM-XXX    P-XXX  vBM-XXX
#> BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000


Created on 2018-03-14 by the reprex package (v0.2.0).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2021-01-18 08:25
              
            
            
                                                                       
Adding to the answer by @mgrund,
a shorter alternative with dplyr 1.0.0 is:
# Option A:
data %>% filter(across(everything(.)) != 0))

# Option B:
data %>% filter(across(everything(.), ~. == 0))

Explanation: 

across() checks for every tidy_select variable, which is everything() representing every column. In Option A, every column is checked if not zero, which adds up to a complete row of zeros in every column. In Option B, on every column, the formula (~) is applied which checks if the current column is zero.
EDIT:

As filter already checks by row, you don't need rowwise(). This is different for select or mutate.
IMPORTANT:

In Option A, it is crucial to write across(everything(.)) != 0, 

and NOT
across(everything(.) != 0))!
Reason: 

across requires a tidyselect variable (here everything()), not a boolean (which would be everything(.) != 0))
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一整个雨季        
                
              
                            
                2021-01-18 08:26
              
            
            
                                                                       
Here's a dplyr option:

library(dplyr)
filter_all(dat, any_vars(. != 0))

#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000


Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.

Regarding row.names:

library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
#         rowname      A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1  BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轮回少年        
                
              
                            
                2021-01-18 08:32
              
            
            
                                                                       
Here's another option using the row-wise operations of dplyr (with col1,col2,col3 defining three exemplary columns for which the rowwise sum is calculated):
library(tidyverse)

df <- df %>% 
    rowwise() %>% 
    filter(sum(c(col1,col2,col3)) != 0)

Alternatively, if you have tons of variables (columns) to select you can also use the tidyverse selection syntax via:
df <- df %>% 
    rowwise() %>% 
    filter(sum(c_across(col1:col3)) != 0)

For details see: https://dplyr.tidyverse.org/articles/rowwise.html
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2021-01-18 08:42
              
            
            
                                                                       
We could use reduce from purrr to get the sum of rows and filter the dataset based on the logical vector

library(tidyverse)
dat %>%
    reduce(`+`) %>%
    {. != 0} %>% 
   filter(dat, .)
#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000


NOTE: Within the %>%, the row.names gets stripped off.  It may be better to create a new column or assign row.names later



If we need the row names as well, then create a row names column early and then use that to change the row names at the end

dat %>%
  rownames_to_column('rn') %>%
  filter(rowSums(.[-1]) != 0) %>% 
  `row.names<-`(., .[['rn']]) %>% select(-rn)
#                   A-XXX  fBM-XXX    P-XXX  vBM-XXX
#BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复