What is the fastest way to update a data set in R?


I have a 20000 * 5 data set. Currently it is being processed in an iterative manner and the data set gets updated continuously on every iteration.

The cells in the


2 Answers

我寻月下人不归

2021-01-07 03:32
You can start with the manual page for the set function (?set). Its examples include code that you can use as a benchmark. I just re-ran it and got the following timings.
```
library(data.table)
m = matrix(1, nrow = 2e6L, ncol = 100L)
DF = as.data.frame(m)
DT = as.data.table(m)    

system.time(for (i in 1:1000) DF[i, 1] = i)
#   user  system elapsed 
#  3.048   1.512  24.854
system.time(for (i in 1:1000) DT[i, V1 := i])
#   user  system elapsed 
#  0.232   0.000   0.259 
system.time(for (i in 1:1000) set(DT, i, 1L, i))
#   user  system elapsed 
#  0.000   0.000   0.002
```
Ideally, you should benchmark your own update scenario on your own data, at your own scale, to properly measure which approach is the "fastest". Also be sure to check memory usage: using [<- on a matrix seems to use more memory than the data.table way, and if you end up swapping, it will be far slower.
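To make that advice concrete, here is a minimal sketch of the same benchmark scaled to a 20000 x 5 data set like the one in the question. The column names A to E, the zero-filled data, and the 1000 single-cell updates are assumptions for illustration only; substitute your real data and update pattern before drawing conclusions.

```r
library(data.table)

# Hypothetical stand-in for the asker's 20000 x 5 data set (all numeric).
n  <- 20000L
DF <- data.frame(A = numeric(n), B = numeric(n), C = numeric(n),
                 D = numeric(n), E = numeric(n))
DT <- as.data.table(DF)   # data.table copy of the same data
M  <- as.matrix(DF)       # plain numeric matrix, keeps the column names

n_updates <- 1000L        # assumed number of single-cell updates

# Base data.frame assignment.
t_df <- system.time(for (i in seq_len(n_updates)) DF[i, "C"] <- i)

# data.table set(): updates the cell by reference.
t_dt <- system.time(
  for (i in seq_len(n_updates)) set(DT, i = i, j = "C", value = as.numeric(i))
)

# Base matrix assignment.
t_m <- system.time(for (i in seq_len(n_updates)) M[i, "C"] <- i)

# Elapsed seconds for each approach; the numbers depend on your machine.
print(c(data.frame = t_df[["elapsed"]],
        set        = t_dt[["elapsed"]],
        matrix     = t_m[["elapsed"]]))
```

All three loops write the same values, so the timings are directly comparable. At this size the gaps may be much smaller than in the 2e6 x 100 example above, which is why measuring at your own scale matters.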

执念已碎

2021-01-07 03:40

Interestingly enough, for a single-cell update a data.table doesn't seem to be faster at first glance. It may well become faster when the assignment is done inside a loop.

```
library(data.table)
library(microbenchmark)

# `test` is the data frame from the question (not defined in this answer)
dt <- data.table(test)

# Accessing the entry
dt[765, "C", with = FALSE]

# Replacing the value with the new one
# Basic data.table syntax
dt[i = 765, C := C + 25]

# Replacing the value with the new one
# using set() from data.table
set(dt, i = 765L, j = "C", value = dt[765L, C] + 25)

# Compare set(), := and base data.frame assignment
microbenchmark(
      a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25)
    , b = dt[i = 765, C := C + 25]
    , c = test[765, "C"] <- test[765, "C"] + 25
    , times = 1000
  )
```

The results from microbenchmark:

```
 expr                                                         min      lq     mean  median       uq      max neval
 a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25) 236.357  46.621 266.4188 250.847 260.2050  572.630  1000
 b = dt[i = 765, `:=`(C, C + 25)]                         333.556 345.329 375.8690 351.668 362.6860 1603.482  1000
 c = test[765, "C"] <- test[765, "C"] + 25                 73.051  81.805 129.1665  84.220  87.6915 1749.281  1000
```
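One thing to keep in mind when reading these numbers: expression a calls set() but also reads the old value through dt[765L, C], so the cost of that [.data.table call is included in its timing. The sketch below reads the old value straight from the column vector instead. The random 20000 x 5 data frame is a hypothetical stand-in for test, and whether this variant is actually faster on your data is something to measure rather than assume.

```r
library(data.table)

# Hypothetical stand-in for the asker's `test` data frame.
set.seed(1)
test <- data.frame(A = runif(20000), B = runif(20000), C = runif(20000),
                   D = runif(20000), E = runif(20000))
dt <- as.data.table(test)

# Read the old value from the column vector, without going through `[.data.table`.
old <- dt[["C"]][765L]

# Update the cell by reference.
set(dt, i = 765L, j = "C", value = old + 25)
new <- dt[["C"]][765L]

# Optional comparison, run only if microbenchmark is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    set_with_lookup = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25),
    set_direct_read = set(dt, i = 765L, j = "C", value = dt[["C"]][765L] + 25),
    times = 1000
  ))
}
```

In a real loop the same idea applies: keep the lookup of the old value as cheap as possible, so that the timing reflects the update itself.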
