I have a 20000 * 5 data set. Currently it is processed iteratively, and the data set gets updated on every iteration.
The cells in the
You can start with the manual page for the ?set function. In its examples you will find code that you can use as a benchmark. I just re-ran it and got the following timings.
library(data.table)
m = matrix(1, nrow = 2e6L, ncol = 100L)
DF = as.data.frame(m)
DT = as.data.table(m)
system.time(for (i in 1:1000) DF[i, 1] = i)
# user system elapsed
# 3.048 1.512 24.854
system.time(for (i in 1:1000) DT[i, V1 := i])
# user system elapsed
# 0.232 0.000 0.259
system.time(for (i in 1:1000) set(DT, i, 1L, i))
# user system elapsed
# 0.000 0.000 0.002
Ideally you should benchmark your own update scenario, on your own data and at your own scale, to properly measure which approach is the "fastest"; set() wins here largely because it skips the overhead of the [.data.table method. Also be sure to check memory usage: using [<- on a matrix seems to use more memory than the data.table way, and if you end up swapping it will be far slower.
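If you want to compare memory as well as speed, here is a minimal sketch of one way to do it, assuming the bench package is available (bench::mark() reports allocated memory per expression alongside timings); the object names are just illustrative.
# Minimal sketch: compare memory allocation of "[<-" on a matrix vs set()
library(data.table)
library(bench)

m  <- matrix(1, nrow = 2e6L, ncol = 100L)
DT <- as.data.table(m)

bench::mark(
  matrix_assign = { m[1L, 1L] <- 2 },     # replacement via "[<-" on the matrix
  dt_set        = { set(DT, 1L, 1L, 2) }, # in-place update with set()
  check = FALSE  # the expressions return different values, so skip checking
)
# Compare the mem_alloc column of the result.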
Interestingly enough, using a data.table doesn't seem to be faster at first glance. Perhaps the advantage only shows up when the assignment is done inside a loop.
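The code below uses the test object from the question; since its structure isn't shown, here is a hypothetical stand-in (a 20000 * 5 numeric data frame with columns A to E, matching the dimensions mentioned in the question):
# Hypothetical stand-in for the question's data set (structure is assumed)
set.seed(1)
test <- as.data.frame(matrix(rnorm(20000 * 5), ncol = 5,
                             dimnames = list(NULL, LETTERS[1:5])))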
library(data.table)
library(microbenchmark)
dt <- data.table(test)
# Accessing the entry
dt[765, "C", with = FALSE]
# Replacing the value with the new one
# Basic data.table syntax
dt[i = 765, C := C + 25]
# Replacing the value with the new one
# using set() from data.table
set(dt, i = 765L, j = "C", value = dt[765L, C] + 25)
microbenchmark(
  a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25)
  , b = dt[i = 765, C := C + 25]
  , c = test[765, "C"] <- test[765, "C"] + 25
  , times = 1000
)
The results from microbenchmark:
expr min lq mean median uq max neval
a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25) 236.357 46.621 266.4188 250.847 260.2050 572.630 1000
b = dt[i = 765, `:=`(C, C + 25)] 333.556 345.329 375.8690 351.668 362.6860 1603.482 1000
c = test[765, "C"] <- test[765, "C"] + 25 73.051 81.805 129.1665 84.220 87.6915 1749.281 1000