Fast concatenation of thousands of files by columns

Submitted by 折月煮酒 on 2019-12-02 02:16:16

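For fast reading of files, you can use fread from data.table and then row-bind the resulting list of data.tables with rbindlist, specifying idcol=TRUE to add a grouping column (named .id, holding each file's position in lfiles) that identifies which dataset every row came from. Note that this stacks the files by rows; the approaches below concatenate by columns.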

library(data.table)
# lfiles: character vector of paths to the files to combine
DT <- rbindlist(lapply(lfiles, fread), idcol = TRUE)
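A minimal self-contained sketch of the same pattern, assuming a few small CSV files (the directory, file names and contents are made up for illustration). Here the list returned by lapply() is named with basename(), so the id column records the source file name rather than an integer index, and passing a string to idcol chooses the name of that column.

```r
library(data.table)

# Hypothetical input: three small CSV files in a temporary directory
dir <- tempfile("csvs")
dir.create(dir)
for (i in 1:3) {
  write.csv(data.frame(x = 1:2, y = i),
            file.path(dir, sprintf("part%d.csv", i)),
            row.names = FALSE)
}

lfiles <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Name the list by file so the id column records where each row came from
dts <- setNames(lapply(lfiles, fread), basename(lfiles))
DT <- rbindlist(dts, idcol = "file")
print(DT)
```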

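If all of your data are numeric, you can convert each piece to a matrix first; binding matrices is much faster than binding data frames (roughly ten times faster at the median in the benchmark below). The benchmark uses the built-in sleep data set for illustration. Because its group and ID columns are factors, as.matrix(sleep) actually returns a character matrix, but the timing gap still illustrates the extra cost of column-binding data frames: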

library(microbenchmark)
microbenchmark(
  do.call(cbind, rep(list(sleep), 1000)),
  do.call(cbind, rep(list(as.matrix(sleep)), 1000))
)
Unit: microseconds
                                              expr      min       lq       mean   median        uq       max neval
            do.call(cbind, rep(list(sleep), 1000)) 6978.635 7496.690 8038.21531 7864.180 8397.8595 12213.473   100
 do.call(cbind, rep(list(as.matrix(sleep)), 1000))  636.282  722.814  862.01125  744.647  793.0695  7416.430   100
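A sketch of the matrix route on data that really is all numeric, assuming every per-file table has the same number of rows. The list of 1000 random data frames is a hypothetical stand-in for the real files.

```r
# Hypothetical stand-in for the per-file tables: 1000 all-numeric data frames,
# each with 20 rows and 3 columns
set.seed(1)
pieces <- replicate(1000,
                    data.frame(a = rnorm(20), b = runif(20), c = 1:20),
                    simplify = FALSE)

# Convert each piece to a numeric matrix, then bind them all by columns
mats <- lapply(pieces, as.matrix)
wide <- do.call(cbind, mats)

# Convert back to a data frame only once, at the end, if one is needed
wide_df <- as.data.frame(wide)
dim(wide_df)
```

The design choice is to pay the data-frame construction cost a single time instead of once per piece, which is where cbind on data frames spends its time.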

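Alternatively, if you need a data frame, you can bypass cbind.data.frame() entirely: unlist(..., recursive = FALSE) flattens the list of data frames into one list of columns, and setting the class and the row.names attribute (which unlist() drops) turns that list into a data frame:

df <- unlist(rep(list(sleep), 1000), recursive = FALSE)  # one flat list of 3000 columns
class(df) <- "data.frame"
attr(df, "row.names") <- seq_len(nrow(sleep))  # without this, df reports 0 rows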
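The unlist() trick can be wrapped in a small helper. This is a sketch under stated assumptions: cbind_fast is a hypothetical name, and it assumes every input data frame has the same number of rows, which it checks up front. It also de-duplicates the repeated column names with make.unique(), since a data frame built this way would otherwise carry 1000 columns each named extra, group and ID.

```r
# cbind_fast: hypothetical helper that column-binds a list of data frames
# without going through cbind.data.frame()
cbind_fast <- function(dfs) {
  n <- vapply(dfs, nrow, integer(1))
  stopifnot(length(dfs) > 0, all(n == n[1]))  # all pieces must have equal row counts

  out <- unlist(dfs, recursive = FALSE)       # flat list of all columns
  names(out) <- make.unique(names(out))       # extra, group, ID, extra.1, ...
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_len(n[1])     # unlist() keeps only the names
  out
}

res <- cbind_fast(rep(list(sleep), 1000))
dim(res)
```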