I am using R to cbind about 11,000 files using:
dat <- do.call('bind_cols', lapply(lfiles, read.delim))
which is unbelievably slow.
If your data are all numeric, you can convert each piece to a matrix first; cbind on matrices can be quite a bit faster than on data frames:
> library(microbenchmark)
> microbenchmark(
    do.call(cbind, rep(list(sleep), 1000)),
    do.call(cbind, rep(list(as.matrix(sleep)), 1000))
  )
Unit: microseconds
                                              expr      min       lq       mean   median        uq       max neval
            do.call(cbind, rep(list(sleep), 1000)) 6978.635 7496.690 8038.21531 7864.180 8397.8595 12213.473   100
 do.call(cbind, rep(list(as.matrix(sleep)), 1000))  636.282  722.814  862.01125  744.647  793.0695  7416.430   100
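Applied to the question's setup, the same trick would look like this; a minimal sketch, assuming every file in lfiles is purely numeric with the same number of rows:
mats <- lapply(lfiles, function(f) as.matrix(read.delim(f)))  # read, then convert each piece to a matrix
dat  <- do.call(cbind, mats)   # column-binding matrices is much cheaper than binding data frames
# dat <- as.data.frame(dat)    # only convert back if you really need a data frame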
Alternatively, if you want a data frame, you can cheat by using unlist and then setting the class manually:
df <- unlist(rep(list(sleep), 1000), recursive=FALSE)
class(df) <- 'data.frame'
attr(df, 'row.names') <- .set_row_names(nrow(sleep))  # unlist drops the row.names attribute a data frame needs
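As a quick sanity check (using the built-in sleep data, 20 rows by 3 columns, so 1000 copies give 3000 columns):
dim(df)                            # 20 3000
identical(df$extra, sleep$extra)   # TRUE: the columns are carried over intact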
For fast reading of the files we can use fread from data.table, then rbind the resulting list of data.tables with rbindlist, specifying idcol=TRUE to add a grouping variable (a .id column) identifying which dataset each row came from:
library(data.table)
DT <- rbindlist(lapply(lfiles, fread), idcol=TRUE)
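A small variation in case the file of origin matters: if the list elements are named, rbindlist uses those names in the id column (the column name "file" below is just an example):
library(data.table)
DT <- rbindlist(setNames(lapply(lfiles, fread), basename(lfiles)), idcol = 'file')
# the 'file' column now holds each source file's name instead of an integer index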