Keyed lookup on data.table without 'with'

执笔经年 2020-12-02 00:35

I have a data.table structure like so (except mine is really huge):

dt <- data.table(x=1:5, y=3:7, key='x')

I want to loo

4 Answers
  • 2020-12-02 01:07

    Setting a key is not required, and this approach is faster than the keyed lookups benchmarked below:

    dt[eval(dt[, x %in% ..x])]
    
       x y
    1: 3 5
    2: 4 6
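
    The same selection can be spelled out in two steps, which may make it clearer why no key is needed. This is only a sketch: it assumes the lookup values are x <- 3:4 (inferred from the output above), and the helper name keep is hypothetical, not part of the answer.

```r
library(data.table)

# Same toy table as in the question, but deliberately left unkeyed.
dt <- data.table(x = 1:5, y = 3:7)

# Assumption: the lookup vector shares its name with the column.
x <- 3:4

# Build the row filter outside of dt[...], so the bare `x` here can only
# mean the outer vector; the column is pulled out explicitly by name.
keep <- dt[["x"]] %in% x

# `keep` is not a column of dt, so it is found in the calling scope and
# used as an ordinary logical row index.
res <- dt[keep]
print(res)
```

    Because the filter is a plain logical vector, this is a scan over the column rather than a keyed join, so it works on keyed and unkeyed tables alike.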
    

    Benchmark against the previously posted answers:

    microbenchmark(dt[eval(dt[, x %in% ..x])],
                   dt[J(get('x', parent.frame(3)))],
                   dt[eval(list(x))],
                   dt[eval(J(x))],
                   dt[eval(.(x))],
                   merge(dt, data.table(x)),
                   times = 100L)
    
    Unit: microseconds
                                      expr    min      lq     mean  median      uq    max neval
          dt[eval(dt[, x %in% ..x])]  486.1  500.60  518.529  503.70  512.65 1238.0   100
    dt[J(get("x", parent.frame(3)))]  837.3  853.25  891.424  860.00  868.30 1675.3   100
                   dt[eval(list(x))]  831.8  842.70  929.521  851.95  859.85 3878.3   100
                      dt[eval(J(x))]  833.8  845.50  948.535  856.00  870.00 4599.2   100
                      dt[eval(.(x))]  828.6  846.40  871.054  851.75  859.35 1985.6   100
            merge(dt, data.table(x)) 1766.0 1804.70 1907.617 1819.95 1870.95 3123.1   100
    
  • 2020-12-02 01:16

    Adding some benchmarking results, by request.

    dt is a 53080731 x 5 data.table object, keyed by a numeric column with around 100 unique values, fairly evenly distributed. x is a vector containing 5 of those values.

    library(microbenchmark)
    mb <- microbenchmark(
        dt[eval(J(x))],
        merge(dt, data.table(x)),
        times=10
    )
    mb
    Unit: milliseconds
                         expr      min       lq    median       uq      max neval
               dt[eval(J(x))]  127.324  127.549  133.5305  154.410  159.433    10
     merge(dt, data.table(x)) 5028.349 5083.792 5129.6590 5170.451 5250.255    10
    

    @Tyler, if you can assist me with how to use qdap::lookup() for this case with multiple columns, I can add that too.
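
    For anyone who wants to try a comparison of this shape without the original data, here is a scaled-down sketch. Everything about the synthetic table is an assumption (1e6 rows rather than about 53 million, two columns rather than five, uniformly sampled key values), so the timings will not match the figures above.

```r
library(data.table)

set.seed(42)

# Assumption: 1e6 rows and two columns, so the sketch runs quickly.
n <- 1e6L
key_values <- as.numeric(1:100)   # about 100 distinct numeric key values

dt <- data.table(
  x = sample(key_values, n, replace = TRUE),
  y = runif(n),
  key = "x"
)

# Five of the key values, stored under the same name as the key column.
x <- key_values[c(5, 25, 50, 75, 95)]

# Time a single run of the keyed join.
t0 <- proc.time()
by_join <- dt[eval(J(x))]
t_join <- proc.time() - t0

# Time a single run of the merge-based lookup.
t0 <- proc.time()
by_merge <- merge(dt, data.table(x))
t_merge <- proc.time() - t0

# Elapsed seconds for each approach; a benchmarking package would
# repeat the runs and report a distribution instead.
print(c(join = t_join[["elapsed"]], merge = t_merge[["elapsed"]]))
```

    Both calls should return the same set of rows; only the time taken to get there differs.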

  • 2020-12-02 01:27

    New answer, now that I think I understand what was requested:

    > X <- data.table(x=x)
    > merge(dt, X)
       x y
    1: 3 5
    2: 4 6
    
  • 2020-12-02 01:33

    There is an item in the NEWS for 1.8.2 suggesting that a ..() syntax will be added at some point, which would allow this:

    New DT[.(...)] syntax (in the style of package plyr) is identical to DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so that .() and ..() are analogous to the file system's ./ and ../; i.e., .() evaluates within the frame of DT and ..() in the parent scope.

    In the meantime, you can get x from the appropriate environment:

    dt[J(get('x', envir = parent.frame(3)))]
    ##    x y
    ## 1: 3 5
    ## 2: 4 6
    

    Or you could eval the whole call to list(x), J(x) or .(x):

    dt[eval(list(x))]
    dt[eval(J(x))]
    dt[eval(.(x))]
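
    To see the name clash and the effect of eval side by side, here is a minimal sketch. It assumes the lookup values are x <- 3:4, as the output above suggests; the names clash and wanted are hypothetical.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 3:7, key = "x")

# Assumption: the lookup values live in a variable that is also called x.
x <- 3:4

# Inside dt[...], a bare `x` resolves to the column first, so this joins
# the table to its own key and returns every row.
clash <- dt[J(x)]

# Wrapping the call in eval() has J(x) evaluated in the calling scope,
# where `x` is the outer vector 3:4.
wanted <- dt[eval(J(x))]

print(nrow(clash))
print(wanted)
```

    The first lookup silently returns the whole table, which is what makes the clash easy to miss on a large data set.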
    