How do I do a negative / nomatch / inverse search in data.table?

囚心锁ツ 2020-12-29 06:00

What happens if I want to select all the rows in a data.table that do not contain a particular value in the key variable using binary search? By the way, what is the correct

2 answers

小蘑菇 (original poster)

2020-12-29 06:29

Andrie's answer is great, and is what I'd probably use. Interestingly, though, the following construct seems to be (just a bit) faster, especially as the size of the data.table increases.

DT[J(x = unique(DT)[x!="a"][,x])]

##-------------------------------- Timings -----------------------------------##

library(data.table)
library(rbenchmark)

DT = data.table(x=rep(c("a","b","c"),each=45e5), y=c(1,3,6), v=1:9, key="x")
Josh <- function() DT[J(x = unique(DT)[x!="a"][,x])]
Andrie <- function() DT[-DT["a", which=TRUE]]

## Compare results
identical(Josh(), setkey(Andrie(), "x"))  
# [1] TRUE

## Compare timings
benchmark(replications = 10, order="relative", Josh=Josh(), Andrie=Andrie())
    test replications elapsed relative user.self sys.self user.child sys.child
1   Josh           10   17.50    1.000     14.78      3.6         NA        NA
2 Andrie           10   18.75    1.071     16.52      3.2         NA        NA

I'd be especially tempted to use this if DT[,x] could be made to return a data.table rather than a vector. Then the construct could be simplified a bit to DT[unique(DT[,x])[x!="a"]]. It would also then work when there are multiple columns in the key, which it currently does not.
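For comparison, here is a minimal sketch of the same inverse selection on a small table. It assumes a data.table version that supports the not-join syntax (`!` in `i`) and the `by` argument of `unique()`. As far as I know, recent versions of `unique()` deduplicate on all columns by default rather than on the key, so the key is passed explicitly here. The names `not_a`, `key_vals`, `via_join` and `via_which` are made up for illustration.

```r
library(data.table)

# Small version of the benchmark table, keyed on x
DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = c(1, 3, 6), v = 1:9, key = "x")

# Not-join: every row whose key value is not "a"
not_a <- DT[!J("a")]

# The construct from this answer, with unique() told explicitly
# to deduplicate on the key column(s)
key_vals <- unique(DT, by = key(DT))[x != "a", x]
via_join <- DT[J(key_vals)]

# Andrie's approach: find the matching row numbers, then drop them
via_which <- DT[-DT["a", which = TRUE]]
```

All three objects should hold the same six rows (those with x equal to "b" or "c"). Whether the result keeps its key may differ between the approaches.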
