Add a row by reference at the end of a data.table object

滥情空心 2020-11-27 17:10

In this question the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He a

1 Answer
  • 2020-11-27 17:38

    To answer your edit, just run a benchmark:

    library(data.table)
    library(microbenchmark)

    a = data.table(id=letters[1:2], var=1:2)
    b = copy(a)
    c = copy(b) # let's also just try modifying same value in place
                # to see how well changing existing values does
    microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
                   b <- rbindlist(list(b,  data.table(id="c", var=3))),
                   c[1, var := 3L],
                   set(c, 1L, 2L, 3L))
    #Unit: microseconds
    #                                                  expr     min        lq    median        uq      max neval
    #          a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
    #b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
    #                                   c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
    #                                    set(c, 1L, 2L, 3L)   2.339    5.677    7.5140    9.5170   19.033   100
    

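    rbindlist is clearly faster than rbind here: the median time is about 445 microseconds versus about 1357. After Matthew Dowle pointed out the problems with using [ in a loop, I also added a benchmark with set, which is roughly 100 times faster than := inside [ for a single update.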

    Given these results, your best options are to use rbindlist, or to size the data.table up front and then just populate the values. If you don't know the size of the data in advance, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and delete the extra rows once you're done filling it in.
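
    The doubling strategy could look roughly like the sketch below. It is only an illustration, not a tested recipe: the starting capacity of 2, the loop over five rows, and the variable names capacity and n are assumptions made up for the example. Growth goes through rbindlist (which copies), while each row is filled in by reference with set.

```r
library(data.table)

capacity <- 2L   # assumed initial guess at the number of rows
dt <- data.table(id  = rep(NA_character_, capacity),
                 var = rep(NA_integer_,  capacity))
n <- 0L          # number of rows actually filled so far

for (i in 1:5) {
  if (n == capacity) {
    # Out of space: double the capacity with a single rbindlist call.
    # This step copies the table, but it happens only O(log n) times.
    padding <- data.table(id  = rep(NA_character_, capacity),
                          var = rep(NA_integer_,  capacity))
    dt <- rbindlist(list(dt, padding))
    capacity <- 2L * capacity
  }
  n <- n + 1L
  # Fill the next free row by reference; set() avoids the overhead of `[`.
  set(dt, i = n, j = "id",  value = letters[i])
  set(dt, i = n, j = "var", value = i)
}

# Drop the unused trailing rows (this subset makes one final copy).
dt <- dt[seq_len(n)]
```

    The final subset is not done by reference, so the extra rows cost one copy at the end; the gain is that the per-row work inside the loop stays cheap.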
