What happens if I want to select all the rows in a data.table that do not contain a particular value in the key variable using binary search? By the way, what is the correct
Andrie's answer is great, and is what I'd probably use. Interestingly, though, the following construct seems to be (just a bit) faster, especially as the size of the data.table increases.
DT[J(x = unique(DT)[x!="a"][,x])]
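To make the construct easier to follow, here is a minimal sketch on a tiny table, split into two steps. The table and the name `keep` are made up for illustration. The explicit `by = key(DT)` is my addition: as far as I know, newer data.table releases have `unique()` consider all columns by default, so spelling out the key keeps the deduplication on the key column only.

```r
library(data.table)

# Small keyed table, for illustration only
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), v = 1:9, key = "x")

# Step 1: collect the distinct key values other than "a".
# by = key(DT) is an assumption added here so that only the key column
# is used for deduplication.
keep <- unique(DT, by = key(DT))[x != "a"][, x]

# Step 2: binary-search join on the remaining key values
res <- DT[J(x = keep)]
res
```

The join in step 2 still uses the key, so only the rows for the kept key values are looked up.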
##-------------------------------- Timings -----------------------------------##
library(data.table)
library(rbenchmark)
DT = data.table(x=rep(c("a","b","c"),each=45e5), y=c(1,3,6), v=1:9, key="x")
Josh <- function() DT[J(x = unique(DT)[x!="a"][,x])]
Andrie <- function() DT[-DT["a", which=TRUE]]
## Compare results
identical(Josh(), setkey(Andrie(), "x"))
# [1] TRUE
## Compare timings
benchmark(replications = 10, order="relative", Josh=Josh(), Andrie=Andrie())
#     test replications elapsed relative user.self sys.self user.child sys.child
# 1   Josh           10   17.50    1.000     14.78      3.6         NA        NA
# 2 Andrie           10   18.75    1.071     16.52      3.2         NA        NA
I'd be especially tempted to use this if DT[,x] could be made to return a data.table rather than a vector. Then the construct could be simplified a bit to DT[unique(DT[,x])[x!="a"]]. Also, it would then work even when there are multiple columns in the key, which it currently does not.
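For the multi-column key case, one possible workaround is a not-join on the first key column. This is a different technique from the construct above, not a fix for it, and it assumes a data.table version that supports the `!` prefix in `i`. The table `DT2` is hypothetical and only there for illustration.

```r
library(data.table)

# Hypothetical table keyed on two columns
DT2 <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  y = rep(1:3, times = 3),
                  v = 1:9,
                  key = c("x", "y"))

# Not-join: keep every row whose first key column does not match "a"
res2 <- DT2[!J("a")]
res2
```

I haven't timed this against the two functions above, so treat it as a readability option rather than a speed claim.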