Faster %in% operator

太阳男子 2021-02-07 00:55

The fastmatch package implements a much faster version of match for repeated matches (e.g. in a loop):

set.seed(1)
library(fastmatch)
table <- 1L

2 Answers
  • 2021-02-07 01:35

    match is almost always better done by putting both vectors in data frames and joining them (see the various joins in dplyr).

    For example, something like this would give you all the info you need:

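    library(dplyr)
    
    # Lookup table: one row per ID plus an example payload column.
    # tibble() only recycles length-1 values, so repeat 1:2 explicitly.
    data = tibble(data.ID = 1L:100000L,
                  data.extra = rep(1:2, length.out = 100000))
    
    # Draw 10,000 rows with replacement and tag each draw
    sample = 
      data %>% 
      sample_n(10000, replace = TRUE) %>%
      mutate(sample.ID = 1:n(),
             sample.extra = rep(3:4, length.out = n()))
    
    # join table not strictly necessary in this case
    # but necessary in many-to-many matches
    data__sample = inner_join(data, sample, by = c("data.ID", "data.extra"))
    
    # check whether a data.ID made it into sample
    data__sample %>% filter(data.ID == 1)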
    

    You could also use left_join, right_join, full_join, semi_join, or anti_join, depending on which information is most useful to you.
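
    When only membership matters, the filtering joins are the closest counterpart to %in%. A minimal sketch, assuming dplyr is installed; the column name id and both small tables are made up for illustration:

    ```r
    library(dplyr)

    # Hypothetical data: `id` is an illustrative column name
    lookup <- tibble(id = c(2L, 4L, 6L))
    values <- tibble(id = 1:6)

    # Rows of `values` whose id appears in `lookup`
    # (same rows as values[values$id %in% lookup$id, ])
    matched <- semi_join(values, lookup, by = "id")

    # Rows of `values` whose id does not appear in `lookup`
    unmatched <- anti_join(values, lookup, by = "id")
    ```

    semi_join keeps the rows of its first argument that have a match in the second without adding any columns; anti_join keeps the rows that have none.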

  • 2021-02-07 01:48

    Look at the definition of %in%:

    R> `%in%`
    function (x, table) 
    match(x, table, nomatch = 0L) > 0L
    <bytecode: 0x1fab7a8>
    <environment: namespace:base>
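
    To see why that one-liner works, here is a small base R sketch (the two vectors are made-up values): match() returns the position of each element of x in table, nomatch = 0L turns a miss into 0, and comparing with > 0L gives the logical result.

    ```r
    x <- c("a", "z", "c")
    table <- c("a", "b", "c")

    pos <- match(x, table, nomatch = 0L)  # positions in `table`, 0 where absent
    hit <- pos > 0L                       # TRUE where a match was found

    # Same answer as the built-in operator
    identical(hit, x %in% table)
    ```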
    

    It's easy to write your own %fin% function:

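    library(fastmatch)  # attach once instead of calling require() on every use

    # Same definition as %in%, with match() swapped for fmatch()
    `%fin%` <- function(x, table) {
      fmatch(x, table, nomatch = 0L) > 0L
    }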
    system.time(for(i in 1:100) a <- x %in% table)
    #    user  system elapsed 
    #   1.780   0.000   1.782 
    system.time(for(i in 1:100) b <- x %fin% table)
    #    user  system elapsed 
    #   0.052   0.000   0.054
    identical(a, b)
    # [1] TRUE
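
    To try the idea without the vectors from the question, here is a self-contained sketch. It falls back to base match() if fastmatch is not installed; the name %fin2% and the vector sizes are hypothetical and chosen only for illustration.

    ```r
    # Use fastmatch::fmatch() when available, otherwise base match().
    # `%fin2%` is a hypothetical name, picked to avoid clashing with other definitions.
    `%fin2%` <- if (requireNamespace("fastmatch", quietly = TRUE)) {
      function(x, table) fastmatch::fmatch(x, table, nomatch = 0L) > 0L
    } else {
      function(x, table) match(x, table, nomatch = 0L) > 0L
    }

    set.seed(1)
    table <- sample(5000L, 1000L)  # illustrative lookup vector
    x <- sample(5000L, 50L)        # values to test for membership

    res <- x %fin2% table

    # The result should agree with the base operator
    identical(res, x %in% table)
    ```

    fmatch() builds a hash of table on first use and keeps it attached to table, which is why the gain shows up on repeated lookups against the same table rather than on a single call.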
    