Question
I have over 1000 objects (z) in R, each containing three dataframes (df1, df2, df3) with different structures.
z1$df1 … z1000$df1
z1$df2 … z1000$df2
z1$df3 … z1000$df3
I created a list of these objects (list1 thus contains z1 through z1000) and tried to use lapply to extract one type of dataframe (df2) for all objects, and then merge them into one single dataframe.
Extraction:
For a single object it would look like this:
df15 <- z15$df2  # the index of z is carried over to the name of the extracted df
I tried some code with lapply, ignoring the transfer of the index (I can create another list for that). However, I don't know what function I should use.
list2 <- lapply(list1, function(x))  # incomplete: this function body is what I am missing
I am trying to avoid a loop because there are so many objects and vectorization is much quicker. I suspect I'm looking at this from the wrong angle.
Subsequent merging can be done as follows:
merged <- do.call(rbind, list2)
Thanks for any suggestions.
Answer 1:
One option is to use lapply to extract each data.frame and then combine them with bind_rows from dplyr.
library(dplyr)  # provides bind_rows

## The data
df1 <- data.frame(id = c(1:10), name = c(LETTERS[1:10]), stringsAsFactors = FALSE)
df2 <- data.frame(id = 11:20, name = LETTERS[11:20], stringsAsFactors = FALSE)
df3 <- data.frame(id = 21:30, name = LETTERS[15:24], stringsAsFactors = FALSE)
df4 <- data.frame(id = 121:130, name = LETTERS[15:24], stringsAsFactors = FALSE)
z1 <- list(df1 = df1, df2 = df2, df3 = df3)
z2 <- list(df1 = df1, df2 = df2, df3 = df3)
z3 <- list(df1 = df1, df2 = df2, df3 = df3)
z4 <- list(df1 = df1, df2 = df2, df3 = df4) #DFs can contain different data
# z <- list(z1, z2, z3, z4)
# Dynamically populate list z with many list objects
z <- mget(paste0("z", 1:4))  # mget already returns a named list
df1_all <- bind_rows(lapply(z, function(x) x$df1))
df2_all <- bind_rows(lapply(z, function(x) x$df2))
df3_all <- bind_rows(lapply(z, function(x) x$df3))
## Result for df3_all
> tail(df3_all)
## id name
## 35 125 S
## 36 126 T
## 37 127 U
## 38 128 V
## 39 129 W
## 40 130 X
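The question also asks about keeping track of which object each extracted data frame came from. A minimal sketch of one way to do that, assuming the list is named (as it is when built with mget): bind_rows accepts an .id argument that stores the list names in a new column. The tiny z1 and z2 objects and the column name "source" are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for the z objects: two lists that each hold a df2
z1 <- list(df2 = data.frame(id = 1:2, name = c("A", "B"), stringsAsFactors = FALSE))
z2 <- list(df2 = data.frame(id = 3:4, name = c("C", "D"), stringsAsFactors = FALSE))

# mget() returns a list named after the objects it collects ("z1", "z2")
z <- mget(paste0("z", 1:2))

# .id adds a column (here called "source", an arbitrary name) filled with the list names
df2_all <- bind_rows(lapply(z, function(x) x$df2), .id = "source")
```

With an unnamed list, .id would fall back to the position of each element instead of its name.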
Answer 2:
It sounds like you want to pull out all the df1s and rbind them together, then do the same for the other dataframes. You can use purrr::map_dfr to extract the named element from each element of the list and row-bind the results.
library('tidyverse')
dummy_df <- list(
df1 = iris,
df2 = cars,
df3 = CO2)
list1 <- list(
z1 = dummy_df,
z2 = dummy_df,
z3 = dummy_df)
df1 <- map_dfr(list1, 'df1')
df2 <- map_dfr(list1, 'df2')
df3 <- map_dfr(list1, 'df3')
If you want to do it in base R, you can use lapply.
df1 <- lapply(list1, function(x) x$df1)
df1_merged <- do.call(rbind, df1)
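As a side note on the map_dfr calls above: in purrr 1.0.0 and later, map_dfr is marked as superseded in favour of map() followed by list_rbind(). A rough equivalent, assuming purrr 1.0.0 or newer is installed, might look like this. The column name "source" is an arbitrary choice.

```r
library(purrr)  # list_rbind() requires purrr >= 1.0.0

# Same dummy input as in the answer
dummy_df <- list(df1 = iris, df2 = cars, df3 = CO2)
list1 <- list(z1 = dummy_df, z2 = dummy_df, z3 = dummy_df)

# map() pulls the element named "df2" out of every z object;
# list_rbind() row-binds the results, and names_to records the list names in a column
df2_all <- list_rbind(map(list1, "df2"), names_to = "source")
```

The names_to argument is optional; leaving it out gives the plain row-bound data frame.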
Answer 3:
Try this:
lapply(list1, "[[", "df2")
or if you want to rbind them together:
do.call("rbind", lapply(list1, "[[", "df2"))
The row names in the resulting data frame will identify the origin of each row.
No packages are used.
Note
We can use this input to test the code above. BOD is a built-in data frame:
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)
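To illustrate the remark about row names, here is a small sketch using the test input from the note. It assumes list1 is named and that the data frames have default (automatic) row names, in which case rbind builds row names of the form "z1.1", "z1.2", and so on. Turning them into a column called "source" is an optional extra step, not part of the answer.

```r
# Test input from the note
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)

res <- do.call("rbind", lapply(list1, "[[", "df2"))

# Strip the trailing ".<row number>" to recover the name of the originating object
res$source <- sub("\\.[^.]*$", "", rownames(res))
rownames(res) <- NULL
```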
Answer 4:
There's also data.table::rbindlist, which is likely faster than do.call(rbind, lapply(...)) or dplyr::bind_rows:
library(data.table)
rbindlist(lapply(list1, "[[", "df2"))
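If the origin of each row matters, rbindlist has an idcol argument that can record it. The sketch below assumes list1 is a named list and reuses the BOD-based test input from Answer 3. The column name "source" is an arbitrary choice.

```r
library(data.table)

# Test input: two objects, each holding three copies of the built-in BOD data frame
z <- list(df1 = BOD, df2 = BOD, df3 = BOD)
list1 <- list(z1 = z, z2 = z)

# idcol stores the name of the list element each row came from
df2_all <- rbindlist(lapply(list1, "[[", "df2"), idcol = "source")
```

Note that the result is a data.table rather than a plain data.frame.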
Source: https://stackoverflow.com/questions/48238039/extracting-a-dataframe-from-a-list-over-many-objects