How do I do a negative / nomatch / inverse search in data.table?

囚心锁ツ 2020-12-29 06:00

What happens if I want to select all the rows in a data.table that do not contain a particular value in the key variable using binary search? By the way, what is the correct

2 Answers
  •  小蘑菇 (original poster)
    2020-12-29 06:29

    Andrie's answer is great, and is what I'd probably use. Interestingly, though, the following construct seems to be (just a bit) faster, especially as the size of the data.table increases.

    # by = "x" makes the de-duplication on the key column explicit
    DT[J(x = unique(DT, by = "x")[x!="a"][,x])]
    
    ##-------------------------------- Timings -----------------------------------##
    
    library(data.table)
    library(rbenchmark)
    
    DT = data.table(x=rep(c("a","b","c"),each=45e5), y=c(1,3,6), v=1:9, key="x")
    Josh <- function() DT[J(x = unique(DT, by = "x")[x!="a"][,x])]
    Andrie <- function() DT[-DT["a", which=TRUE]]
    
    ## Compare results
    identical(Josh(), setkey(Andrie(), "x"))  
    # [1] TRUE
    
    ## Compare timings
    benchmark(replications = 10, order="relative", Josh=Josh(), Andrie=Andrie())
    #     test replications elapsed relative user.self sys.self user.child sys.child
    # 1   Josh           10   17.50    1.000     14.78      3.6         NA        NA
    # 2 Andrie           10   18.75    1.071     16.52      3.2         NA        NA
    

    I'd be especially tempted to use this if DT[,x] could be made to return a data.table rather than a vector. The construct could then be simplified a bit to DT[unique(DT[,x])[x!="a"]], and it would also work when the key has multiple columns, which it currently does not.
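    As a rough sketch of the ideas above, the snippet below runs the join-based construct on a small made-up table and compares it with two alternatives. It assumes a data.table version that supports the not-join syntax (`!` in `i`) and the `.()` alias for returning a data.table from `j`. The table contents and the names `via_join`, `via_notjoin`, and `via_dt` are illustrative only and do not come from the benchmark.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the benchmark table)
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), v = 1:9, key = "x")

# Construct from the answer: join on the unique key values other than "a"
via_join <- DT[J(x = unique(DT, by = "x")[x != "a"][, x])]

# Not-join: keep every row whose key does NOT match "a"
via_notjoin <- DT[!J("a")]

# Simplified form: .(x) returns a one-column data.table instead of a vector,
# so the filtered unique keys can be used directly as the join table
via_dt <- DT[unique(DT[, .(x)])[x != "a"]]

# All three should select the same rows (those with x == "b" or x == "c")
identical(via_join$v, via_notjoin$v)
identical(via_join$v, via_dt$v)
```

    Whether the not-join is also the fastest of the three on large tables is not something the timings above cover, so it would need to be benchmarked separately.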
