Fast concatenation of thousands of files by columns

遥遥无期 2021-01-23 03:56

I am using R to cbind roughly 11,000 files using:

dat <- do.call('bind_cols', lapply(lfiles, read.delim))

which is unbelievably slow.

2 Answers
  • 2021-01-23 04:31

    If all of your data is numeric, you can convert each data frame to a matrix first; binding matrices can be quite a bit faster than binding data frames:

    > library(microbenchmark)
    > microbenchmark(
    do.call(cbind, rep(list(sleep), 1000)),
    do.call(cbind, rep(list(as.matrix(sleep)), 1000))
    )
    Unit: microseconds
                                                  expr      min       lq       mean   median        uq       max neval
                do.call(cbind, rep(list(sleep), 1000)) 6978.635 7496.690 8038.21531 7864.180 8397.8595 12213.473   100
     do.call(cbind, rep(list(as.matrix(sleep)), 1000))  636.282  722.814  862.01125  744.647  793.0695  7416.430   100
    

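    Alternatively, if you want a data frame, you can cheat by using unlist to flatten the list of data frames into one list of columns, and then setting the row names and class manually. Note that the column names are not made unique this way:

    df <- unlist(rep(list(sleep), 1000), recursive=FALSE)
    # a data frame needs a row.names attribute, otherwise nrow(df) is 0
    attr(df, 'row.names') <- seq_len(nrow(sleep))
    class(df) <- 'data.frame'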
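
    To connect the matrix idea back to the question, here is a minimal sketch that reads each file, converts it to a matrix, and binds everything in a single cbind call. The temporary files, their names, and the columns a and b are made up for illustration; the sketch assumes every file is numeric and has the same number of rows.

```r
# Hypothetical setup: write three small tab-delimited numeric files to a temp dir.
tmp <- tempfile("cbind_demo_")
dir.create(tmp)
for (i in 1:3) {
  write.table(data.frame(a = (1:5) * i, b = rnorm(5)),
              file.path(tmp, sprintf("file%02d.txt", i)),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
lfiles <- list.files(tmp, full.names = TRUE)

# Read each file and convert it to a numeric matrix right away.
mats <- lapply(lfiles, function(f) as.matrix(read.delim(f)))

# Bind all matrices by columns in one call instead of growing the result step by step.
m <- do.call(cbind, mats)

# Make column names unique by prefixing each with its file name (extension removed).
prefix <- rep(tools::file_path_sans_ext(basename(lfiles)),
              times = vapply(mats, ncol, integer(1)))
colnames(m) <- paste(prefix, colnames(m), sep = ".")

dim(m)
```

    If a data frame is needed at the end, as.data.frame(m) converts the result once, after all the binding is done.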
    
  • 2021-01-23 04:48

    For fast reading of files, we can use fread from data.table and then combine the resulting list of data.tables with rbindlist, specifying idcol=TRUE to add a grouping variable that identifies each dataset. Note that this stacks the files by rows rather than binding them by columns:

    library(data.table)
    DT <- rbindlist(lapply(lfiles, fread), idcol=TRUE)
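
    Since the question asks for binding by columns, the following sketch shows the row-wise call from this answer next to one possible column-wise variant. The variant is not part of the original answer: it flattens the list of tables into a list of columns and converts that list with setDT. The temporary files and the columns x and y are hypothetical, and the sketch assumes all files have the same number of rows.

```r
library(data.table)

# Hypothetical setup: three small tab-delimited files in a temp dir.
tmp <- tempfile("fread_demo_")
dir.create(tmp)
for (i in 1:3) {
  fwrite(data.table(x = 1:4, y = (1:4) * i),
         file.path(tmp, sprintf("part%02d.tsv", i)), sep = "\t")
}
lfiles <- list.files(tmp, full.names = TRUE)

# Row-wise stacking as in the answer; idcol = TRUE adds an ".id" column.
DT_rows <- rbindlist(lapply(lfiles, fread), idcol = TRUE)

# Column-wise variant (an assumption about what the asker wants):
# flatten the list of tables into one list of columns ...
cols <- unlist(lapply(lfiles, fread), recursive = FALSE)
# ... then turn that list into a data.table by reference.
# check.names = TRUE makes the repeated column names unique.
DT_cols <- setDT(cols, check.names = TRUE)

dim(DT_rows)
dim(DT_cols)
```

    The row-wise result keeps one set of columns plus the .id marker, while the column-wise result has one block of columns per file.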
    