Filter data.frame rows by a logical condition

前端未结

关注

 9  1387

I want to filter rows from a data.frame based on a logical condition. Let\'s suppose that I have data frame like

   expr_value     cell_type
1


                      
              相关标签:


      
      
        
          9条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  不要未来只要你来        
                
              
                            
                2020-11-21 05:48
              
            
            
                                                                       
To select rows according to one 'cell_type' (e.g. 'hesc'), use ==:

expr[expr$cell_type == "hesc", ]


To select rows according to two or more different 'cell_type', (e.g. either 'hesc' or 'bj fibroblast'), use %in%:

expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  不要未来只要你来        
                
              
                            
                2020-11-21 05:49
              
            
            
                                                                       
we can use data.table library

  library(data.table)
  expr <- data.table(expr)
  expr[cell_type == "hesc"]
  expr[cell_type %in% c("hesc","fibroblast")]


or filter using %like% operator for pattern matching

 expr[cell_type %like% "hesc"|cell_type %like% "fibroblast"]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2020-11-21 05:54
              
            
            
                                                                       
Use subset (for interactive use)

subset(expr, cell_type == "hesc")
subset(expr, cell_type %in% c("bj fibroblast", "hesc"))


or better dplyr::filter()

filter(expr, cell_type %in% c("bj fibroblast", "hesc"))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  庸人自扰        
                
              
                            
                2020-11-21 05:54
              
            
            
                                                                       
You could use the dplyr package:

library(dplyr)
filter(expr, cell_type == "hesc")
filter(expr, cell_type == "hesc" | cell_type == "bj fibroblast")

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2020-11-21 06:00
              
            
            
                                                                       
No one seems to have included the which function. It can also prove useful for filtering. 

expr[which(expr$cell == 'hesc'),]


This will also handle NAs and drop them from the resulting dataframe.

Running this on a 9840 by 24 dataframe 50000 times, it seems like the which method has a 60% faster run time than the %in% method. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  暗喜        
                
              
                            
                2020-11-21 06:01
              
            
            
                                                                       
Sometimes the column you want to filter may appear in a different position than column index 2 or have a variable name. 

In this case, you can simply refer the column name you want to filter as:

columnNameToFilter = "cell_type"
expr[expr[[columnNameToFilter]] == "hesc", ]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复