What is the fastest way to update a data set in R?

谎友^ 2021-01-07 03:08

I have a 20000 × 5 data set. It is currently processed iteratively, and the data set is updated on every iteration.

The cells in the

2 answers
  • 2021-01-07 03:32

    A good starting point is the manual page for the set function (?set). Its examples include code you can use as a benchmark. I just re-ran it and got the following timings.

    library(data.table)
    m = matrix(1, nrow = 2e6L, ncol = 100L)
    DF = as.data.frame(m)
    DT = as.data.table(m)    
    
    system.time(for (i in 1:1000) DF[i, 1] = i)
    #   user  system elapsed 
    #  3.048   1.512  24.854
    system.time(for (i in 1:1000) DT[i, V1 := i])
    #   user  system elapsed 
    #  0.232   0.000   0.259 
    system.time(for (i in 1:1000) set(DT, i, 1L, i))
    #   user  system elapsed 
    #  0.000   0.000   0.002
    

    Ideally, you should test your own update scenario on your own data, at your own scale, to measure which approach is actually the "fastest". Also check memory usage: using [<- on a matrix seems to use more memory than the data.table approach, and if you end up swapping, it will be much slower.
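
    To make the "measure it at your own scale" advice concrete, here is a minimal sketch that repeats the same three-way comparison on a 20000 x 5 table, the size mentioned in the question. The all-zero data, the default column names V1..V5, and the 1000-row update pattern are assumptions for illustration, not the asker's real workload.

    ```r
    library(data.table)

    # Assumed shape from the question: 20000 rows x 5 numeric columns.
    # The contents and the update pattern below are illustrative only.
    n  <- 20000L
    m  <- matrix(0, nrow = n, ncol = 5L)
    DF <- as.data.frame(m)
    DT <- as.data.table(m)

    rows <- seq_len(1000L)

    # Base R: assignment into a data.frame
    t_df  <- system.time(for (i in rows) DF[i, 1L] <- as.numeric(i))

    # data.table: := inside [ ]
    t_dt  <- system.time(for (i in rows) DT[i, V1 := as.numeric(i)])

    # data.table: set(), which skips the overhead of [.data.table
    t_set <- system.time(for (i in rows) set(DT, i, 1L, as.numeric(i)))

    # Elapsed seconds per approach; the numbers depend on machine and data
    elapsed <- c(df = t_df[["elapsed"]], dt = t_dt[["elapsed"]], set = t_set[["elapsed"]])
    print(elapsed)
    ```

    Swapping in the real data and the real update pattern (which rows, which columns, how many iterations) should give a more trustworthy answer than any generic benchmark.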

  • 2021-01-07 03:40

    Interestingly, at first glance a data.table does not seem to be faster for a single update. It may only become faster when the assignment is done inside a loop.

    library(data.table)
    library(microbenchmark)
    # `test` is the data set from the question (not shown); it needs a numeric column C
    dt <- data.table(test)
    
    # Accessing the entry
    dt[765, "C", with = FALSE]
    
    # Replacing the value with the new one
    # Basic data.table syntax
    dt[i = 765, C := C + 25]
    
    # Replacing the value with the new one
    # using set() from data.table
    set(dt, i = 765L, j = "C", value = dt[765L, C] + 25)
    
    microbenchmark(
      a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25),
      b = dt[i = 765, C := C + 25],
      c = test[765, "C"] <- test[765, "C"] + 25,
      times = 1000
    )
    

    The results from microbenchmark:

                                                       expr     min      lq     mean  median       uq      max neval
     a = set(dt, i = 765L, j = "C", value = dt[765L, C] + 25) 236.357 46.621 266.4188 250.847 260.2050  572.630  1000
     b = dt[i = 765, `:=`(C, C + 25)]                         333.556 345.329 375.8690 351.668 362.6860 1603.482  1000
     c = test[765, "C"] <- test[765, "C"] + 25                73.051  81.805 129.1665  84.220  87.6915 1749.281  1000
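
    One possible reason set() does not stand out here is that expression `a` also calls dt[765L, C], so its timing includes the overhead of [.data.table. The sketch below is a hedged variant that reads the current value with dt[["C"]] instead and runs the update in a loop. The `test` data frame is a hypothetical stand-in (random numbers, columns A..E), since the original data is not shown.

    ```r
    library(data.table)

    # Hypothetical stand-in for the asker's `test` data: 20000 x 5, numeric.
    set.seed(1)
    test <- as.data.frame(matrix(runif(20000 * 5), ncol = 5,
                                 dimnames = list(NULL, LETTERS[1:5])))
    dt <- as.data.table(test)

    before <- dt[["C"]][765L]

    # Read the current value with [[ ]] (a plain column extraction) rather
    # than dt[765L, C], so the timed work is mostly set() itself.
    t_loop <- system.time(
      for (k in 1:1000) {
        set(dt, i = 765L, j = "C", value = dt[["C"]][765L] + 25)
      }
    )

    after <- dt[["C"]][765L]

    print(t_loop)          # timing depends on machine and data
    print(after - before)  # total amount added to the cell by the loop
    ```

    Whether this changes the ranking on real data is something only a benchmark on that data can show.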
    