Very often I want to convert a list wherein each index has identical element types to a data frame. For example, I may have a list:
> my.list
[[1]]
[[1]]
I can't tell you this is the "most efficient" in terms of memory or speed, but it's pretty efficient in terms of coding:
my.df <- do.call("rbind", lapply(my.list, data.frame))
The lapply() step with data.frame() turns each list item into a single-row data frame, which then plays nicely with rbind().
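To make the two steps concrete, here is a minimal sketch on a made-up two-element list. The names example.list, rows and example.df are illustrative and not from the question, and the sketch assumes every element carries the same names.

```r
# Hypothetical input: a list whose elements share the same names and types.
example.list <- list(
  list(id = 1, tok = "hello"),
  list(id = 2, tok = "world")
)

# Step 1: turn each element into a one-row data frame.
# stringsAsFactors = FALSE only matters on R < 4.0, where strings
# would otherwise become factors.
rows <- lapply(example.list, data.frame, stringsAsFactors = FALSE)

# Step 2: do.call() hands all the one-row data frames to a single rbind() call.
example.df <- do.call(rbind, rows)

str(example.df)
```

Because rbind() matches columns by name, this relies on the elements having identical names; lists with missing fields need one of the other approaches.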
Although this question has long since been answered, it's worth pointing out that the data.table package has rbindlist, which accomplishes this task very quickly:
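library(microbenchmark)
library(data.table)
l <- replicate(1E4, list(a=runif(1), b=runif(1), c=runif(1)), simplify=FALSE)
# f() is the column extractor from the Map()/sapply() solution; it must exist before the benchmark runs
f <- function(x) function(i) sapply(x, `[[`, i)
microbenchmark(times=5,
  R=as.data.frame(Map(f(l), names(l[[1]]))),
  dt=data.frame(rbindlist(l))
)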
gives me
Unit: milliseconds
 expr       min        lq    median        uq       max neval
    R 31.060119 31.403943 32.278537 32.370004 33.932700     5
   dt  2.271059  2.273157  2.600976  2.635001  2.729421     5
I'm not sure where they rank for efficiency, but depending on the structure of your lists there are some tidyverse options. A bonus is that they work nicely with lists of unequal length:
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
, b = list(var.1 = 4, var.2 = 5)
, c = list(var.1 = 7, var.3 = 9)
, d = list(var.1 = 10, var.2 = 11, var.3 = NA))
df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)
# all create the same data frame:
# A tibble: 4 x 3
  var.1 var.2 var.3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5    NA
3     7    NA     9
4    10    11    NA
And you can also mix vectors and data frames:
library(dplyr)
bind_rows(
list(a = 1, b = 2),
tibble(a = 3:4, b = 5:6),  # tibble() replaces the deprecated data_frame()
c(a = 7)
)
# A tibble: 4 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     5
3     4     6
4     7    NA
The dplyr package's bind_rows is efficient.
one <- mtcars[1:4, ]
two <- mtcars[11:14, ]
system.time(dplyr::bind_rows(one, two))
user system elapsed
0.001 0.000 0.001
I think you want:
> do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE))
  global_stdev_ppb      range   tok global_freq_ppb
1         24267673 0.03114799 hello        211592.6
2         11561448 0.08870838 world       1002043.0
> str(do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE)))
'data.frame': 2 obs. of 4 variables:
$ global_stdev_ppb: num 24267673 11561448
$ range : num 0.0311 0.0887
$ tok : chr "hello" "world"
$ global_freq_ppb : num 211593 1002043
Another option is:
data.frame(t(sapply(mylist, `[`)))
but this simple manipulation results in a data frame of lists:
> str(data.frame(t(sapply(mylist, `[`))))
'data.frame': 2 obs. of 3 variables:
$ a:List of 2
..$ : num 1
..$ : num 2
$ b:List of 2
..$ : num 2
..$ : num 3
$ c:List of 2
..$ : chr "a"
..$ : chr "b"
An alternative along the same lines, which gives the same result as the other solutions (a data frame of vectors), is:
data.frame(lapply(data.frame(t(sapply(mylist, `[`))), unlist))
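The difference between the two sapply() variants is easy to check directly. The sketch below reuses the small mylist from the timings; list.cols and vec.cols are names introduced here for illustration only.

```r
mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))

# sapply() with `[` returns a list-mode matrix (rows a, b, c); t() puts
# one record per row, and data.frame() keeps each column as a list.
list.cols <- data.frame(t(sapply(mylist, `[`)))

# unlist() each column to get ordinary atomic vectors instead.
vec.cols <- data.frame(lapply(list.cols, unlist))

sapply(list.cols, is.list)  # TRUE for every column
sapply(vec.cols, is.list)   # FALSE for every column
```

List columns are cheaper to build, but most downstream code (arithmetic, modelling functions, write.csv) expects atomic columns, which is why the second form matches the other solutions.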
[Edit: included timings of @Martin Morgan's two solutions, which have the edge over the other solutions that return a data frame of vectors.] Some representative timings on a very simple problem:
mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))
> ## @Joshua Ulrich's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame,
+ stringsAsFactors=FALSE))))
user system elapsed
1.740 0.001 1.750
> ## @JD Long's solution:
> system.time(replicate(1000, do.call(rbind, lapply(mylist, data.frame))))
user system elapsed
2.308 0.002 2.339
> ## my sapply solution No.1:
> system.time(replicate(1000, data.frame(t(sapply(mylist, `[`)))))
user system elapsed
0.296 0.000 0.301
> ## my sapply solution No.2:
> system.time(replicate(1000, data.frame(lapply(data.frame(t(sapply(mylist, `[`))),
+ unlist))))
user system elapsed
1.067 0.001 1.091
> ## @Martin Morgan's Map() sapply() solution:
> f = function(x) function(i) sapply(x, `[[`, i)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
user system elapsed
0.775 0.000 0.778
> ## @Martin Morgan's Map() lapply() unlist() solution:
> f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
> system.time(replicate(1000, as.data.frame(Map(f(mylist), names(mylist[[1]])))))
user system elapsed
0.653 0.000 0.658
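The two Map() timings above build the data frame column by column rather than row by row. As a rough sketch of what that closure is doing, unrolled into separate steps (cols and df are names introduced here for illustration, and the stringsAsFactors argument is added only so the result is the same on R < 4.0):

```r
mylist <- list(list(a = 1, b = 2, c = "a"), list(a = 2, b = 3, c = "b"))

# f(x) returns a function that pulls field i out of every element of x.
f <- function(x) function(i) unlist(lapply(x, `[[`, i), use.names = FALSE)

# Map() calls that extractor once per field name, giving a named list
# with one atomic vector per column.
cols <- Map(f(mylist), names(mylist[[1]]))

# as.data.frame() assembles the columns in one go, with no per-row rbind().
df <- as.data.frame(cols, stringsAsFactors = FALSE)

str(df)
```

Like the rbind() approaches, this assumes every element has the fields named in the first element; a missing field would yield a shorter column and an error from as.data.frame().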