How to convert a list of lists to a dataframe - non-identical lists

闹比i 2020-12-17 05:59

I have a list where each element is a named list, but the elements are not the same everywhere. I have read solutions on how to convert lists of lists to dataframes here and

3 Answers
  • 2020-12-17 06:21

    Any approach that calls data.frame(.) on each element of the list before binding is very inefficient (and unnecessary). Another way is data.table's rbindlist with fill=TRUE (available from v1.9.3).

    require(data.table) ## 1.9.3
    rbindlist(lisnotOK, fill=TRUE)
    #     a b     c    d
    # 1:  1 2    hi   NA
    # 2: NA 2 hello nope
    

    It works on list-of-lists (as in this question), data.frames and data.tables.
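
To make the fill=TRUE behaviour concrete, here is a minimal base R sketch of the same idea: take the union of all field names and use NA wherever an element lacks a field. The input `lisnotOK` is an assumption reconstructed from the output printed above (the question's own data is not shown here), and `fill_bind` is a hypothetical helper name. It is not how data.table implements rbindlist, and it assumes every field holds a single value.

```r
# Assumed input, reconstructed from the output printed above
lisnotOK <- list(
  list(a = 1, b = 2, c = "hi"),
  list(b = 2, c = "hello", d = "nope")
)

# Hypothetical helper (not part of data.table): build one column per distinct
# field name, using NA where an element lacks that field.
fill_bind <- function(inList) {
  all_names <- unique(unlist(lapply(inList, names)))
  cols <- lapply(all_names, function(nm) {
    unlist(lapply(inList, function(x) if (is.null(x[[nm]])) NA else x[[nm]]))
  })
  names(cols) <- all_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

res <- fill_bind(lisnotOK)
str(res)
```

Building whole columns rather than one data.frame per element avoids the per-row data.frame(.) cost described above, though it is only meant as an illustration.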

    Otherwise, I'd go with Ananda's list2mat function (provided all your values are of the same type).


    Benchmarks on Ananda's L2 data:

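    library(plyr)            # ldply
    library(dplyr)           # rbind_all (later deprecated in favour of bind_rows)
    library(microbenchmark)

    fun1 <- function(inList) ldply(inList, as.data.frame)
    fun2 <- function(inList) list2mat(inList)
    fun3 <- function(inList) rbindlist(inList, fill=TRUE)
    fun4 <- function(inList) rbind_all(lapply(inList, as.data.frame))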
    
    microbenchmark(fun1(L2), fun2(L2), fun3(L2), fun4(L2), times = 10)
    # Unit: milliseconds
    #      expr         min          lq      median          uq         max neval
    #  fun1(L2) 1927.857847 2161.432665 2221.999940 2276.241366 2366.649614    10
    #  fun2(L2)   12.039652   12.167613   12.361629   12.483751   16.040885    10
    #  fun3(L2)    1.225929    1.374395    1.473621    1.510876    1.858597    10
    #  fun4(L2) 1435.153576 1457.053482 1492.334965 1548.547706 1630.443430    10
    

    Note: I've used as.data.frame(.) instead of data.frame(.) because the former is slightly faster.

  • 2020-12-17 06:44

    If you are OK with the result being a matrix of a single type (say, character), you can write your own function, like this:

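    list2mat <- function(inList) {
      # Flatten all values into one named vector
      # (assumes the outer list is unnamed and each field has length 1)
      UL <- unlist(inList)
      # Column names: every distinct field name, in order of first appearance
      Nam <- unique(names(UL))
      # Pre-allocate the character result, filled with NA
      M <- matrix(NA_character_, 
                  nrow = length(inList), ncol = length(Nam), 
                  dimnames = list(NULL, Nam))
      # Row and column index of each flattened value
      Row <- rep(seq_along(inList), vapply(inList, length, integer(1)))
      Col <- match(names(UL), Nam)
      # Fill all cells at once with a two-column (row, col) index matrix
      M[cbind(Row, Col)] <- UL
      M
    }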
    

    Usage would be:

    list2mat(lisnotOK)
    #      a   b   c       d     
    # [1,] "1" "2" "hi"    NA    
    # [2,] NA  "2" "hello" "nope"
    

    This should be pretty fast since everything is pre-allocated and you are making use of matrix indexing.
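
As a small illustration of the matrix indexing mentioned above: when a matrix is indexed by a two-column matrix, each row of the index is read as a (row, column) pair, so many cells can be filled in one vectorised assignment. The values and dimensions below are made up for the example.

```r
# Pre-allocate a 2 x 3 character matrix filled with NA
M <- matrix(NA_character_, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Each row of idx is one (row, column) target cell
idx <- cbind(c(1, 2, 2), c(1, 2, 3))

# One assignment fills cells [1,1], [2,2] and [2,3]
M[idx] <- c("x", "y", "z")
print(M)
```

Cells not named in the index matrix keep their pre-allocated NA, which is how list2mat ends up with NA for fields that are missing from an element.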


    Update: Benchmarks (since you said efficiency was a concern)

    library(plyr)  # provides ldply
    fun1 <- function(inList) ldply(inList, data.frame)
    fun2 <- function(inList) list2mat(inList)
    
    library(microbenchmark)
    microbenchmark(fun1(lisnotOK), fun2(lisnotOK))
    # Unit: microseconds
    #            expr      min        lq    median       uq      max neval
    #  fun1(lisnotOK) 4193.808 4340.0585 4523.3000 4912.233 7600.341   100
    #  fun2(lisnotOK)  163.784  182.3865  211.2515  236.910  363.489   100
    
    L2 <- unlist(replicate(1000, lisnotOK, simplify=FALSE), recursive=FALSE)
    microbenchmark(fun1(L2), fun2(L2), times = 10)
    # Unit: milliseconds
    #      expr        min         lq     median         uq        max neval
    #  fun1(L2) 3032.71572 3106.79006 3196.17178 3306.11756 3609.67445    10
    #  fun2(L2)   24.16817   24.86991   25.65569   27.44128   29.41908    10
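
For readers unsure how the larger test list is built: replicate(..., simplify=FALSE) returns a list of 1000 copies of the two-element list, and unlist(..., recursive=FALSE) then flattens exactly one level, leaving 2000 named lists. A minimal sketch, with `lisnotOK` again being an assumption reconstructed from the printed output and `nested` a name introduced only for this example:

```r
# Assumed input, reconstructed from the output shown in this thread
lisnotOK <- list(
  list(a = 1, b = 2, c = "hi"),
  list(b = 2, c = "hello", d = "nope")
)

# A list of 1000 elements, each a copy of the two-element list
nested <- replicate(1000, lisnotOK, simplify = FALSE)

# Flatten one level only, so the inner named lists stay intact
L2 <- unlist(nested, recursive = FALSE)

length(nested)
length(L2)
```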
    
  • 2020-12-17 06:47

    Use lapply to convert each list element to a data.frame, then bind the results with dplyr's rbind_all:

    library(dplyr)  # rbind_all was later deprecated in favour of bind_rows()
    rbind_all(lapply(lisnotOK, data.frame))
       a b     c    d
    1  1 2    hi <NA>
    2 NA 2 hello nope
    Warning message:
    In rbind_all(lapply(lisnotOK, data.frame)) :
      Unequal factor levels: coercing to character
    

    Or use ldply from plyr with data.frame:

    ldply(lisnotOK,data.frame)
       a b     c    d
    1  1 2    hi <NA>
    2 NA 2 hello nope
    