How can I efficiently save and load a big list


我在风中等你 2021-01-29 12:59

Disclaimer: Many of you pointed to a duplicate post. I was aware of it, but I believe it's not a fair duplicate as some way of saving/loading might be diff

1 answer

深忆病人 (original poster)

2021-01-29 13:15

After some research it appears that there is no real alternative to the base saveRDS function and not many packages dealing with large lists.
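For reference, a minimal base R sketch of the saveRDS/readRDS round trip on a small stand-in list. The `compress = FALSE` variant is an assumption worth timing on your own data rather than a measured result from this answer: it skips the default gzip compression, which usually means a larger file in exchange for less CPU work. The temporary file paths are placeholders.

```r
# Small stand-in for the large list of numeric vectors discussed above.
set.seed(1)
l <- lapply(1:1000, function(i) rnorm(sample(1:5, size = 1)))

path_default <- tempfile(fileext = ".rds")
path_raw <- tempfile(fileext = ".rds")

# Default behaviour: gzip-compressed serialization.
saveRDS(l, path_default)

# Assumption to verify on real data: no compression, larger file, less CPU work.
saveRDS(l, path_raw, compress = FALSE)

# readRDS() detects whether the file is compressed.
l_default <- readRDS(path_default)
l_raw <- readRDS(path_raw)

# Both variants restore the list exactly.
stopifnot(identical(l, l_default), identical(l, l_raw))

# Compare the on-disk sizes in bytes.
file.size(c(path_default, path_raw))
```

Wrapping each variant in `system.time()` on the real list shows whether the trade-off pays off on a given disk.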

Saving a list as a column of a data.table/data.frame doesn't work with the packages fst and feather, but it does work with the package data.table. However, when the file is read back the column comes back as a character vector, which forces a call to strsplit or its faster alternative, stringr's str_split.
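To make the character round trip concrete, here is a base R sketch that mimics it without any package. `encode_list` and `decode_list` are hypothetical helpers written for this illustration, not part of data.table or stringr. The `"|"` separator matches the default that fwrite uses inside list columns, and `fixed = TRUE` avoids having to escape the pipe as a regular expression.

```r
# Hypothetical helper: collapse each numeric vector into one "|"-separated string,
# similar to how a list column ends up in a CSV cell.
encode_list <- function(x, sep2 = "|") {
  # "%.17g" prints enough digits to restore a double exactly
  vapply(x, function(v) paste(sprintf("%.17g", v), collapse = sep2), character(1))
}

# Hypothetical helper: split the strings again and convert back to numeric.
decode_list <- function(s, sep2 = "|") {
  lapply(strsplit(s, sep2, fixed = TRUE), as.numeric)
}

l <- list(c(1.5, 2.25), 3, c(0.5, 0.25, 0.125))

col <- encode_list(l)
col
# "1.5|2.25" "3" "0.5|0.25|0.125"

l_back <- decode_list(col)
stopifnot(identical(l, l_back))
```

Note that splitting alone returns character vectors; the `as.numeric` step is what restores the original numeric type.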

The only package directly focused on lists that I could find was rlist; however, it does not speed up reading or writing a list from/to a file compared with the base functions saveRDS and readRDS.

Benchmarks:

    library(data.table)
    library(stringr)
    library(rlist)

    # 10 million numeric vectors, each of random length 1 to 5
    l <- lapply(1:10000000, function(x) rnorm(sample(1:5, size = 1, replace = TRUE)))
    dt_l <- data.table(l = as.list(l))

    microbenchmark::microbenchmark(
      times = 5L,
      "data.table" = {
        fwrite(dt_l, "dt_l.csv")
        # read into a separate object so dt_l keeps its list column across runs
        dt_read <- fread("dt_l.csv", sep = ",", sep2 = "\\|")
        # the list column comes back as character, so split it again
        l_load <- str_split(dt_read$l, "\\|")
      },
      "RDS_list.save" = {
        list.save(l, "l.rds")
        l_load <- list.load("l.rds")
      },
      "RDS_base" = {
        saveRDS(l, "l.rds")
        l_load <- readRDS("l.rds")
      }
    )

    Unit: seconds
              expr      min       lq     mean   median       uq      max neval
        data.table 18.30548 18.67964 18.98801 19.17744 19.19791 19.57956     5
     RDS_list.save 16.80936 16.81615 16.86114 16.84012 16.91770 16.92236     5
          RDS_base 16.90403 17.23784 18.62475 19.48391 19.60365 19.89431     5
