Question:
Disclaimer:
Many of you pointed to a duplicate post. I was aware of it, but I believe it is not a fair duplicate, since the ways of saving/loading may differ between data frames and lists. For instance, the packages fst and feather work on data frames but not on lists. My question is specific to lists.
I have a ~50M element list and I'd like to save it to a file in order to share it among different R sessions.
I know the native ways of saving in R (save, save.image, saveRDS). My question is: would you still use these functions on data at this scale?
What is the fastest way to save it and read it back? (Any R-readable format would be fine.)
Answer 1:
After some research, it appears that there is no real alternative to the base saveRDS function, and few packages deal with large lists.
Saving a list as a column of a data.table/data.frame does not work with the packages fst and feather, but it does work with the package data.table. However, when the column is read back it becomes a character vector, which forces the use of strsplit or its faster alternative str_split.
The only package directly focused on lists that I could find was rlist; however, it does not speed up reading or writing a list from/to a file compared to the base functions saveRDS and readRDS.
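One knob worth trying before switching formats: saveRDS compresses with gzip by default, and setting compress = FALSE often trades a larger file for faster writes and reads. The sketch below is illustrative (the file name "l_fast.rds" and the small list size are assumptions, not from the benchmark above); whether the speedup matters depends on your disk and data, so benchmark it yourself.

```r
# Sketch: uncompressed RDS round trip for a list.
# compress = FALSE skips gzip; files get bigger but I/O is usually faster.
l <- lapply(1:1000, function(x) rnorm(sample(1:5, size = 1)))

saveRDS(l, "l_fast.rds", compress = FALSE)  # write without compression
l_load <- readRDS("l_fast.rds")             # read it back

stopifnot(identical(l, l_load))             # round trip is lossless
```

Since RDS is a binary serialization of the R object itself, the round trip preserves the list exactly, unlike the CSV/strsplit route, which loses numeric precision and type information.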
Benchmarks:
library(data.table)
library(stringr)
library(rlist)

l <- lapply(1:10000000, function(x) rnorm(sample(1:5, size = 1, replace = TRUE)))
dt_l <- data.table(l = as.list(l))

microbenchmark::microbenchmark(times = 5L,
  "data.table" = {
    fwrite(dt_l, "dt_l.csv")
    dt_l <- fread("dt_l.csv", sep = ",", sep2 = "\\|")
    l_load <- str_split(dt_l$l, "\\|")
  },
  "rlist" = {
    list.save(l, "l.rds")
    l_load <- list.load("l.rds")
  },
  "RDS_base" = {
    saveRDS(l, "l.rds")
    l_load <- readRDS("l.rds")
  }
)
Unit: seconds
expr min lq mean median uq max neval
data.table 18.30548 18.67964 18.98801 19.17744 19.19791 19.57956 5
RDS_list.save 16.80936 16.81615 16.86114 16.84012 16.91770 16.92236 5
RDS_base 16.90403 17.23784 18.62475 19.48391 19.60365 19.89431 5
Source: https://stackoverflow.com/questions/51619320/how-can-i-efficiently-save-and-load-a-big-list