How to convert a list of lists to a dataframe - non-identical lists

前端未结


I have a list where each element is a named list, but the elements are not the same everywhere. I have read solutions on how to convert lists of lists to dataframes here and


3 Answers

时光说笑

2020-12-17 06:21

Calling data.frame(.) on each element of the list before binding is very inefficient, and also unnecessary. Another way is data.table's rbindlist with fill=TRUE (available from v1.9.3):

library(data.table)  # fill = TRUE needs v1.9.3 or later
rbindlist(lisnotOK, fill=TRUE)
#     a b     c    d
# 1:  1 2    hi   NA
# 2: NA 2 hello nope

It works on list-of-lists (as in this question), data.frames and data.tables.
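For anyone who wants to run this end to end, here is a minimal sketch. The definition of lisnotOK is not shown on this page, so the version below is an assumption reconstructed from the printed result; the variable names res and df are just illustrative.

```r
library(data.table)  # assumes data.table >= 1.9.3 is installed

# Assumed input, reconstructed from the printed output above
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# Match columns by name and fill the missing ones with NA
res <- rbindlist(lisnotOK, fill = TRUE)

# Each column keeps its own type (numeric a and b, character c and d)
sapply(res, class)

# Convert to a plain data.frame if downstream code expects one
df <- as.data.frame(res)
```

The result is a data.table, which is also a data.frame, so the last step is only needed when other code relies on plain data.frame behaviour.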

Otherwise, I'd go with Ananda's list2mat function (provided all your values can share a single type).

Benchmarks on Ananda's L2 data:

library(plyr)            # ldply
library(dplyr)           # rbind_all
library(microbenchmark)

fun1 <- function(inList) ldply(inList, as.data.frame)
fun2 <- function(inList) list2mat(inList)
fun3 <- function(inList) rbindlist(inList, fill=TRUE)
fun4 <- function(inList) rbind_all(lapply(inList, as.data.frame))

microbenchmark(fun1(L2), fun2(L2), fun3(L2), fun4(L2), times = 10)
# Unit: milliseconds
#      expr         min          lq      median          uq         max neval
#  fun1(L2) 1927.857847 2161.432665 2221.999940 2276.241366 2366.649614    10
#  fun2(L2)   12.039652   12.167613   12.361629   12.483751   16.040885    10
#  fun3(L2)    1.225929    1.374395    1.473621    1.510876    1.858597    10
#  fun4(L2) 1435.153576 1457.053482 1492.334965 1548.547706 1630.443430    10

Note: I've used as.data.frame(.) instead of data.frame(.) because the former is slightly faster.


悲哀的现实

2020-12-17 06:44

If you are OK with the result being a matrix in which every value has the same type (say, character), you can write your own function, like this:

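list2mat <- function(inList) {
  # Flatten everything into one named vector
  UL <- unlist(inList)
  # Column names, in order of first appearance
  Nam <- unique(names(UL))
  # Pre-allocate the result, filled with NA
  M <- matrix(NA_character_,
              nrow = length(inList), ncol = length(Nam),
              dimnames = list(NULL, Nam))
  # Row index: element i of inList contributes length(inList[[i]]) values
  Row <- rep(seq_along(inList), lengths(inList))
  # Column index: position of each value's name among the columns
  Col <- match(names(UL), Nam)
  # Fill all cells at once using a two-column (row, col) index matrix
  M[cbind(Row, Col)] <- UL
  M
}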

Usage would be:

list2mat(lisnotOK)
#      a   b   c       d     
# [1,] "1" "2" "hi"    NA    
# [2,] NA  "2" "hello" "nope"

This should be pretty fast since everything is pre-allocated and you are making use of matrix indexing.
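To make the matrix-indexing step concrete, here is a small hand-built sketch of the same idea, followed by one possible way to get from the character matrix back to a data.frame. The values are typed in by hand to mirror the output above; cols, idx and df are illustrative names, and the type.convert step is an assumption about what you might want, not part of list2mat.

```r
# Column names found across all records, and an empty character matrix
# with one row per record (everything pre-allocated as NA)
cols <- c("a", "b", "c", "d")
M <- matrix(NA_character_, nrow = 2, ncol = length(cols),
            dimnames = list(NULL, cols))

# Each row of idx is a (row, column) pair; the i-th value goes to the i-th pair
idx <- cbind(c(1, 1, 1, 2, 2, 2),
             match(c("a", "b", "c", "b", "c", "d"), cols))
M[idx] <- c("1", "2", "hi", "2", "hello", "nope")

# Optional: turn the character matrix into a data.frame and let
# type.convert() recover numeric columns where possible
df <- as.data.frame(M, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Cells that never appear in idx simply stay NA, which is how the missing a and d entries are handled without any per-row work.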

Update: Benchmarks (since you said efficiency was a concern)

library(plyr)            # ldply
library(microbenchmark)

fun1 <- function(inList) ldply(inList, data.frame)
fun2 <- function(inList) list2mat(inList)

microbenchmark(fun1(lisnotOK), fun2(lisnotOK))
# Unit: microseconds
#            expr      min        lq    median       uq      max neval
#  fun1(lisnotOK) 4193.808 4340.0585 4523.3000 4912.233 7600.341   100
#  fun2(lisnotOK)  163.784  182.3865  211.2515  236.910  363.489   100

L2 <- unlist(replicate(1000, lisnotOK, simplify=FALSE), recursive=FALSE)
microbenchmark(fun1(L2), fun2(L2), times = 10)
# Unit: milliseconds
#      expr        min         lq     median         uq        max neval
#  fun1(L2) 3032.71572 3106.79006 3196.17178 3306.11756 3609.67445    10
#  fun2(L2)   24.16817   24.86991   25.65569   27.44128   29.41908    10
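The L2 line in the benchmark builds the larger test list by repeating lisnotOK and then flattening exactly one level. A scaled-down sketch of that step, with lisnotOK assumed from the printed output (its definition is not shown here) and nested and flat as illustrative names:

```r
# Assumed input, reconstructed from the printed output above
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# replicate(..., simplify = FALSE) gives a list of 3 copies,
# each copy being the original list of 2 records
nested <- replicate(3, lisnotOK, simplify = FALSE)

# recursive = FALSE removes only the outer level, so the result is a
# flat list of 6 records rather than one long atomic vector
flat <- unlist(nested, recursive = FALSE)
length(flat)
```

With 1000 copies, as in the benchmark, the same call yields a list of 2000 records.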


醉酒成梦

2020-12-17 06:47

Use lapply to convert each list element to a data.frame, then bind the results with dplyr's rbind_all (since deprecated in favor of bind_rows):

library(dplyr)
rbind_all(lapply(lisnotOK, data.frame))
   a b     c    d
1  1 2    hi <NA>
2 NA 2 hello nope
Warning message:
In rbind_all(lapply(lisnotOK, data.frame)) :
  Unequal factor levels: coercing to character

Or use ldply from plyr with data.frame:

library(plyr)
ldply(lisnotOK, data.frame)
   a b     c    d
1  1 2    hi <NA>
2 NA 2 hello nope
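On a current dplyr, the same approach can be sketched with bind_rows. This assumes an installed dplyr version that provides bind_rows, and again uses a lisnotOK reconstructed from the printed output; dfs and res are illustrative names. Passing stringsAsFactors = FALSE keeps the text columns as character, so there are no factor levels to reconcile.

```r
library(dplyr)  # assumes a dplyr version that provides bind_rows()

# Assumed input, reconstructed from the printed output above
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# One single-row data.frame per record, with strings kept as character
dfs <- lapply(lisnotOK, as.data.frame, stringsAsFactors = FALSE)

# Bind by column name; columns missing from a record are filled with NA
res <- bind_rows(dfs)
res
```

This still builds one data.frame per element, so the efficiency concern raised in the other answers applies to it as well.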
