How can I efficiently save and load a big list?

我在风中等你 2021-01-29 12:59

Disclaimer: Many of you pointed to a duplicate post. I was aware of it, but I believe it's not a fair duplicate, as some ways of saving/loading might be diff…

1 Answer
  •  深忆病人
    2021-01-29 13:15

    After some research, it appears there is no real alternative to base R's saveRDS() function, and few packages deal with large lists.

    Saving a list as a column of a data.table/data.frame doesn't work with the fst and feather packages; it does work with data.table. However, when it is read back, the column comes in as a character vector, which forces you to split it with strsplit or its faster alternative, stringr's str_split.
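    A small sketch of that data.table round trip, using a made-up three-element list (an assumption for illustration, not the benchmark data). fwrite joins the items of a list cell with "|" by default, and the fread documentation notes that its sep2 argument is not yet implemented, so the column has to be split by hand; here base strsplit is used, followed by as.numeric to get numbers back instead of strings:

```r
library(data.table)

# Hypothetical small list standing in for the real 10-million-element one
l <- list(c(1.5, 2), 3, c(4, 5, 6))
dt_l <- data.table(id = seq_along(l), l = l)   # l is stored as a list column

f <- tempfile(fileext = ".csv")
fwrite(dt_l, f)                 # list cells are written as "1.5|2", "3", "4|5|6"
dt_load <- fread(f, sep = ",")

# The list column comes back as plain strings ...
print(class(dt_load$l))

# ... so it must be split on "|" and converted back to numeric by hand
l_back <- lapply(strsplit(dt_load$l, "|", fixed = TRUE), as.numeric)
print(identical(l_back, l))
```

    The values were chosen so their decimal text form is exact; with arbitrary doubles such as rnorm() output, a text round trip may not reproduce every bit.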

    The only package directly focused on lists that I could find was rlist; however, it does not speed up reading or writing a list from/to a file compared with the base functions saveRDS and readRDS.
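    For comparison, a minimal sketch of the base saveRDS/readRDS round trip on a hypothetical 1,000-element list. Because it is a binary serialization, the list comes back identical, with no splitting or type conversion. The near-identical rlist timing is presumably because list.save() writes .rds files through saveRDS() anyway. saveRDS also has a real compress argument; turning it off was not part of the benchmark below, so whether it helps is untested here:

```r
# Hypothetical small list standing in for the real one
l <- lapply(1:1000, function(x) rnorm(sample(1:5, size = 1)))

f <- tempfile(fileext = ".rds")
saveRDS(l, f)          # default: gzip-compressed binary serialization
l_back <- readRDS(f)

# Unlike the CSV route, types and structure survive unchanged
print(identical(l_back, l))

# Untested variant: skip compression for a larger file that may be
# faster to write/read on fast storage
# saveRDS(l, f, compress = FALSE)
```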

    Benchmarks:

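    library(data.table)   # fwrite(), fread()
    library(stringr)      # str_split()
    library(rlist)        # list.save(), list.load()
    
    # 10 million numeric vectors, each of random length 1 to 5
    l <- lapply(1:10000000, function(x) rnorm(sample(1:5, size = 1)))
    dt_l <- data.table(l = l)   # store the whole list as one list column
    
    microbenchmark::microbenchmark(times = 5L,
      "data.table"     =  { fwrite(dt_l, "dt_l.csv")   # list items are joined with "|"
                            # new name, so dt_l is not overwritten between iterations
                            dt_load <- fread("dt_l.csv", sep = ",")
                            # the column is read back as character: split it manually
                            # (the resulting elements are still strings, not numbers)
                            l_load  <- str_split(dt_load$l, "\\|")
                          },
    
      "RDS_list.save"  =  { list.save(l, "l_rlist.rds")
                            l_load <- list.load("l_rlist.rds")
                          },
    
      "RDS_base"       =  { saveRDS(l, "l_base.rds")
                            l_load <- readRDS("l_base.rds")
                          }
    )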
    
    Unit: seconds
              expr      min       lq     mean   median       uq      max neval
        data.table 18.30548 18.67964 18.98801 19.17744 19.19791 19.57956     5
     RDS_list.save 16.80936 16.81615 16.86114 16.84012 16.91770 16.92236     5
          RDS_base 16.90403 17.23784 18.62475 19.48391 19.60365 19.89431     5
    
