I have a list where each element is a named list, but the elements are not the same everywhere. I have read solutions on how to convert lists of lists to dataframes here and
Any function that calls data.frame(.) on each element of the list before binding would be terribly inefficient (not to mention unnecessary). Here's another way, using data.table's rbindlist (from v1.9.3), which you can get here.
require(data.table) ## 1.9.3
rbindlist(lisnotOK, fill=TRUE)
# a b c d
# 1: 1 2 hi NA
# 2: NA 2 hello nope
It works on list-of-lists (as in this question), data.frames and data.tables.
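As a minimal sketch of that claim, the snippet below feeds rbindlist both a list of lists and a list of data.frames. The definition of lisnotOK is an assumption, reconstructed from the printed output above because the question's input is not shown here; dfs, res_list and res_df are names made up for illustration.

```r
library(data.table)

# Assumed reconstruction of the question's input (inferred from the output above)
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# List of lists: missing names are filled with NA
res_list <- rbindlist(lisnotOK, fill = TRUE)

# The same rows as a list of data.frames (hypothetical example input)
dfs <- list(data.frame(a = 1, b = 2, c = "hi", stringsAsFactors = FALSE),
            data.frame(b = 2, c = "hello", d = "nope", stringsAsFactors = FALSE))
res_df <- rbindlist(dfs, fill = TRUE)
```

Both calls should give a two-row data.table with columns a, b, c and d.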
If not this, then I'd go with Ananda's list2mat function (if your types are all identical).
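To illustrate why the types matter, here is a small sketch comparing the column classes that rbindlist keeps with what happens when the same data is flattened into a single vector, as a matrix-based approach must do. Again, lisnotOK is an assumed reconstruction of the question's input, and the other variable names are made up for the example.

```r
library(data.table)

# Assumed reconstruction of the question's input
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# rbindlist keeps one type per column
dt <- rbindlist(lisnotOK, fill = TRUE)
col_classes <- sapply(dt, class)

# Flattening everything into one vector coerces numbers to character
flat <- unlist(lisnotOK)
flat_class <- class(flat)
```

Here col_classes should report numeric for a and b and character for c and d, while flat_class is character throughout.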
Benchmarks on Ananda's L2 data:
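library(plyr)            # ldply
library(dplyr)           # rbind_all (deprecated in later dplyr releases in favour of bind_rows)
library(data.table)      # rbindlist
library(microbenchmark)
fun1 <- function(inList) ldply(inList, as.data.frame)
fun2 <- function(inList) list2mat(inList)
fun3 <- function(inList) rbindlist(inList, fill=TRUE)
fun4 <- function(inList) rbind_all(lapply(inList, as.data.frame))
microbenchmark(fun1(L2), fun2(L2), fun3(L2), fun4(L2), times = 10)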
# Unit: milliseconds
# expr min lq median uq max neval
# fun1(L2) 1927.857847 2161.432665 2221.999940 2276.241366 2366.649614 10
# fun2(L2) 12.039652 12.167613 12.361629 12.483751 16.040885 10
# fun3(L2) 1.225929 1.374395 1.473621 1.510876 1.858597 10
# fun4(L2) 1435.153576 1457.053482 1492.334965 1548.547706 1630.443430 10
Note: I've used as.data.frame(.) instead of data.frame(.) (the former is slightly faster).
Provided you are OK with the resulting matrix being all of the same type (say, character), you can try to write your own function, like this:
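list2mat <- function(inList) {
  # Flatten to one named vector; mixed types are coerced to a common type
  UL <- unlist(inList)
  # Column names: every distinct element name, in order of first appearance
  Nam <- unique(names(UL))
  # Pre-allocate the result, filled with NA
  M <- matrix(NA_character_,
              nrow = length(inList), ncol = length(Nam),
              dimnames = list(NULL, Nam))
  # Row index of each flattened value: i repeated length(inList[[i]]) times
  Row <- rep(seq_along(inList), sapply(inList, length))
  # Column index of each flattened value
  Col <- match(names(UL), Nam)
  # Fill all cells at once using a two-column (row, col) index matrix
  M[cbind(Row, Col)] <- UL
  M
}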
Usage would be:
list2mat(lisnotOK)
# a b c d
# [1,] "1" "2" "hi" NA
# [2,] NA "2" "hello" "nope"
This should be pretty fast since everything is pre-allocated and you are making use of matrix indexing.
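For readers unfamiliar with matrix indexing, here is a minimal base-R sketch of the idea, separate from list2mat itself: a two-column matrix of (row, column) pairs selects individual cells, so several cells can be assigned in one vectorised step. The names M, idx and the values are made up for illustration.

```r
# Pre-allocate a 2 x 3 character matrix filled with NA
M <- matrix(NA_character_, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Each row of idx is one (row, column) pair
idx <- cbind(c(1, 2, 2),   # row numbers
             c(1, 2, 3))   # column numbers

# Assign to cells (1,1), (2,2) and (2,3) in a single step
M[idx] <- c("x", "y", "z")
```

Cells not named in idx, such as row 1 of column "b", stay NA.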
library(plyr)            # ldply
library(microbenchmark)
fun1 <- function(inList) ldply(inList, data.frame)
fun2 <- function(inList) list2mat(inList)
microbenchmark(fun1(lisnotOK), fun2(lisnotOK))
# Unit: microseconds
# expr min lq median uq max neval
# fun1(lisnotOK) 4193.808 4340.0585 4523.3000 4912.233 7600.341 100
# fun2(lisnotOK) 163.784 182.3865 211.2515 236.910 363.489 100
L2 <- unlist(replicate(1000, lisnotOK, simplify=FALSE), recursive=FALSE)
microbenchmark(fun1(L2), fun2(L2), times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1(L2) 3032.71572 3106.79006 3196.17178 3306.11756 3609.67445 10
# fun2(L2) 24.16817 24.86991 25.65569 27.44128 29.41908 10
Use lapply to convert your list elements to data.frames, then bind the result with rbind_all (from dplyr):
rbind_all(lapply(lisnotOK,data.frame))
a b c d
1 1 2 hi <NA>
2 NA 2 hello nope
Warning message:
In rbind_all(lapply(lisnotOK, data.frame)) :
Unequal factor levels: coercing to character
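The warning above arises because data.frame() turned the strings into factors with differing levels. A possible variant, sketched here as an alternative and not part of the original answer, keeps the strings as character and uses dplyr's bind_rows, the function that replaced rbind_all in later dplyr releases. The definition of lisnotOK is an assumed reconstruction of the question's input.

```r
library(dplyr)

# Assumed reconstruction of the question's input
lisnotOK <- list(list(a = 1, b = 2, c = "hi"),
                 list(b = 2, c = "hello", d = "nope"))

# Keep strings as character so there are no factor levels to reconcile
dfs <- lapply(lisnotOK, as.data.frame, stringsAsFactors = FALSE)

# bind_rows fills columns missing from an element with NA
res <- bind_rows(dfs)
```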
Or, from plyr, use ldply with data.frame:
ldply(lisnotOK,data.frame)
a b c d
1 1 2 hi <NA>
2 NA 2 hello nope