Combine a list of data frames into one data frame

半阙折子戏 2020-11-21 11:25

At one point my code ends up with a list of data frames, which I want to convert into a single large data frame.

I got some pointers from an earlier ques
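A minimal base-R sketch of the situation described above, assuming every data frame in the list has the same columns. The list `listOfDataFrames` and its columns `id` and `x` are made-up placeholders:

```r
# Hypothetical stand-in for the list of data frames: three small data
# frames with identical columns (id is recycled to the length of x)
listOfDataFrames <- lapply(1:3, function(i) {
  data.frame(id = i, x = c(i * 10, i * 10 + 1))
})

# do.call() passes every list element as a separate argument, so this is
# equivalent to rbind(listOfDataFrames[[1]], listOfDataFrames[[2]], ...)
bigDataFrame <- do.call(rbind, listOfDataFrames)

print(bigDataFrame)
```

The result has one row per row of the inputs (six here), stacked in list order.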

9 answers
  • 2020-11-21 12:10

    An updated benchmark plot for anyone who wants to compare some of the more recent answers (I wanted to compare the purrr solution with the dplyr one). It essentially combines the answers from @TheVTM and @rmf.

    Code:

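    library(microbenchmark)
    library(data.table)
    library(tidyverse)
    
    # Build a list of 100 data frames with 260 rows each
    # (rep(LETTERS, 10) has 26 * 10 = 260 elements)
    dflist <- vector(mode = "list", length = 100)
    for (i in seq_along(dflist)) {
      dflist[[i]] <- data.frame(a = runif(n = 260), b = runif(n = 260),
                                c = rep(LETTERS, 10), d = rep(LETTERS, 10))
    }
    
    # Time each approach 500 times
    mb <- microbenchmark(
      dplyr::bind_rows(dflist),
      data.table::rbindlist(dflist),
      purrr::map_df(dflist, bind_rows),
      do.call("rbind", dflist),
      times = 500)
    
    # Plot the timing distributions (autoplot method provided by microbenchmark)
    ggplot2::autoplot(mb)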
    

    Session Info:

    sessionInfo()
    R version 3.4.1 (2017-06-30)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1
    

    Package Versions:

    > packageVersion("tidyverse")
    [1] ‘1.1.1’
    > packageVersion("data.table")
    [1] ‘1.10.0’
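    Timings only matter if the approaches return the same data. A small check along these lines can confirm that; it is a sketch with a hypothetical, smaller list (`small_list`) using numeric columns only, since factor and character handling differs between R versions and packages. It compares values and ignores attributes such as row names and class:

    ```r
    # Hypothetical smaller input: 5 data frames of 260 rows, numeric columns only
    set.seed(1)
    small_list <- lapply(1:5, function(i) {
      data.frame(a = runif(260), b = runif(260))
    })
    
    base_result  <- do.call("rbind", small_list)
    dplyr_result <- dplyr::bind_rows(small_list)
    # rbindlist() returns a data.table; convert it to a plain data.frame
    dt_result    <- as.data.frame(data.table::rbindlist(small_list))
    
    # Compare values only (check.attributes = FALSE ignores row names and class)
    same_dplyr <- isTRUE(all.equal(base_result, dplyr_result, check.attributes = FALSE))
    same_dt    <- isTRUE(all.equal(base_result, dt_result, check.attributes = FALSE))
    
    cat("rows:", nrow(base_result),
        "| bind_rows matches:", same_dplyr,
        "| rbindlist matches:", same_dt, "\n")
    ```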
    
  • 2020-11-21 12:11

    For the sake of completeness, I thought the answers to this question needed an update. An earlier answer stated: "My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find..." That was probably true in May 2010 and for some time afterwards, but around September 2011 a new function, rbindlist, was introduced in version 1.8.2 of the data.table package, with the remark that "This does the same as do.call("rbind",l), but much faster". How much faster?

    library(rbenchmark)
    
    # listOfDataFrames is the list of data frames from the question
    benchmark(
      do.call = do.call("rbind", listOfDataFrames),
      plyr_rbind.fill = plyr::rbind.fill(listOfDataFrames),
      plyr_ldply = plyr::ldply(listOfDataFrames, data.frame),
      # rbindlist returns a data.table, so convert it back to a plain data.frame
      data.table_rbindlist = as.data.frame(data.table::rbindlist(listOfDataFrames)),
      replications = 100, order = "relative",
      columns = c("test", "replications", "elapsed", "relative")
    )
    

                      test replications elapsed relative
    4 data.table_rbindlist          100    0.11    1.000
    1              do.call          100    9.39   85.364
    2      plyr_rbind.fill          100   12.08  109.818
    3           plyr_ldply          100   15.14  137.636
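    Besides speed, rbindlist() has options that plain do.call("rbind", ...) does not offer: base rbind() stops with an error when the data frames' column names differ, whereas `fill = TRUE` pads missing columns with NA (similar in spirit to plyr::rbind.fill), and `idcol` records which list element each row came from. A minimal sketch with made-up inputs (`first`, `second`, columns `a` and `b` are placeholders):

    ```r
    library(data.table)
    
    # Hypothetical inputs: the second data frame lacks column "b"
    listOfDataFrames <- list(
      first  = data.frame(a = 1:2, b = c(0.5, 1.5)),
      second = data.frame(a = 3L)
    )
    
    # fill = TRUE matches columns by name and fills missing ones with NA;
    # idcol = "source" adds a leading column holding the list names
    combined <- rbindlist(listOfDataFrames, fill = TRUE, idcol = "source")
    
    # rbindlist returns a data.table; convert back if a data.frame is needed
    combined <- as.data.frame(combined)
    print(combined)
    ```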
    
  • 2020-11-21 12:12

    How it should be done in the tidyverse:

    library(tidyverse)  # attaches purrr, dplyr and the %>% pipe
    
    df.dplyr.purrr <- listOfDataFrames %>% map_df(bind_rows)
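    dplyr's bind_rows() also accepts a list of data frames directly, so the map_df() step is optional. A small sketch with a made-up named list (the names `jan`/`feb` and column `sales` are placeholders) that also shows the `.id` argument, which adds a column recording the list element each row came from:

    ```r
    library(dplyr)
    
    # Hypothetical named list standing in for listOfDataFrames
    listOfDataFrames <- list(
      jan = data.frame(sales = c(10, 20)),
      feb = data.frame(sales = 30)
    )
    
    # .id = "month" creates a leading column filled with the list names
    combined <- bind_rows(listOfDataFrames, .id = "month")
    print(combined)
    ```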
    