Faster %in% operator

前端未结


The fastmatch package implements a much faster version of match for repeated matches (e.g. in a loop):

set.seed(1)
library(fastmatch)
table <- 1L


                      
2 answers

悲&欢浪女
2021-02-07 01:35

Matching is almost always better done by putting both vectors in data frames and joining them (see the various joins in dplyr).

For example, something like this would give you all the info you need:

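library(dplyr)

# tibble() replaces the deprecated data_frame(); it only recycles
# length-1 vectors, so the alternating columns are built with rep()
data = tibble(data.ID = 1L:100000L,
              data.extra = rep(1:2, length.out = 100000))

sample =
  data %>%
  sample_n(10000, replace=TRUE) %>%
  mutate(sample.ID = 1:n(),
         sample.extra = rep(3:4, length.out = n()))

# join table not strictly necessary in this case
# but necessary in many-to-many matches
data__sample = inner_join(data, sample, by = c("data.ID", "data.extra"))

# check whether a data.ID made it into sample
data__sample %>% filter(data.ID == 1)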


Alternatively, use left_join, right_join, full_join, semi_join, or anti_join, depending on which information is most useful to you.
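
Of the joins listed, semi_join and anti_join are the closest analogues of a membership test. The following is a minimal sketch of that idea, not part of the original answer; the `lookup` and `queries` tables and their `id` column are hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data: `lookup` holds the known IDs, `queries` the IDs to test.
lookup  <- tibble(id = c(2L, 4L, 6L, 8L))
queries <- tibble(id = c(1L, 2L, 3L, 4L, 4L))

# semi_join keeps the rows of `queries` whose id occurs in `lookup`;
# duplicated rows in `queries` are kept, and no columns from `lookup` are added.
found <- semi_join(queries, lookup, by = "id")

# anti_join keeps the rows of `queries` that have no match in `lookup`.
not_found <- anti_join(queries, lookup, by = "id")

# Both results agree with the base-R membership test.
stopifnot(identical(found$id,     queries$id[queries$id %in% lookup$id]))
stopifnot(identical(not_found$id, queries$id[!queries$id %in% lookup$id]))
```

Whether this beats a plain `%in%` depends on the data; the joins mainly pay off when the extra columns of the matched rows are needed as well.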
                                                                        
                                                        
            
            
              
                
一个人的身影
2021-02-07 01:48

Look at the definition of %in%:

R> `%in%`
function (x, table) 
match(x, table, nomatch = 0L) > 0L
<bytecode: 0x1fab7a8>
<environment: namespace:base>
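
To make that definition concrete, here is a small base-R sketch that spells out the two steps by hand. The vectors `x` and `table` are made-up examples, not the ones from the question.

```r
# Made-up example vectors
x     <- c("a", "z", "c")
table <- c("a", "b", "c")

# match() returns the position of each element of x in table,
# or the nomatch value (0L here) when the element is absent.
pos <- match(x, table, nomatch = 0L)

# Any position greater than zero means "found".
by_hand <- pos > 0L

# This is the same result that %in% gives.
stopifnot(identical(by_hand, x %in% table))
```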


It's easy to write your own %fin% function:

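`%fin%` <- function(x, table) {
  # Stop early if fastmatch is not installed
  stopifnot(requireNamespace("fastmatch", quietly = TRUE))
  # Same logic as %in%, with fmatch() in place of match()
  fastmatch::fmatch(x, table, nomatch = 0L) > 0L
}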
system.time(for(i in 1:100) a <- x %in% table)
#    user  system elapsed 
#   1.780   0.000   1.782 
system.time(for(i in 1:100) b <- x %fin% table)
#    user  system elapsed 
#   0.052   0.000   0.054
identical(a, b)
# [1] TRUE
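
The timings above use the `x` and `table` objects from the question, which are cut off in this copy. As a rough, self-contained sketch, the comparison can be rerun on made-up data. The vector sizes, the loop count, and the `%fin2%` and `matcher` names below are assumptions chosen for illustration. As far as I know, the speed-up comes from fmatch caching a hash of `table` on first use, and recent fastmatch versions appear to ship their own `%fin%`, so check the package documentation before defining one yourself.

```r
set.seed(1)

# Made-up data; the sizes are arbitrary and not taken from the question.
table <- sample.int(1e6)
x     <- sample.int(2e6, 1e4)

# Use fastmatch::fmatch when the package is installed; otherwise fall back
# to base::match so the sketch still runs (without any speed-up).
matcher <- if (requireNamespace("fastmatch", quietly = TRUE)) {
  fastmatch::fmatch
} else {
  match
}

# Hypothetical operator name, chosen to avoid clashing with any existing %fin%.
`%fin2%` <- function(x, table) matcher(x, table, nomatch = 0L) > 0L

base_time <- system.time(for (i in 1:20) a <- x %in% table)
fast_time <- system.time(for (i in 1:20) b <- x %fin2% table)

print(base_time)
print(fast_time)

# Both operators must return the same logical vector.
stopifnot(identical(a, b))
```

Actual timings will vary with the machine, the data types, and how often the same `table` is reused.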

                                                                        
                                                        
            
            
              
                