Question
I've got a nested list of lists that I'd like to flatten into a dataframe with id variables so I know which list elements (and sub-list elements) each came from.
> str(gc_all)
List of 3
$ 1: num [1:102, 1:2] -74 -73.5 -73 -72.5 -71.9 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "lon" "lat"
$ 2: num [1:102, 1:2] -74 -73.3 -72.5 -71.8 -71 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "lon" "lat"
$ 3:List of 2
..$ : num [1:37, 1:2] -74 -74.4 -74.8 -75.3 -75.8 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "lon" "lat"
..$ : num [1:65, 1:2] 180 169 163 158 154 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "lon" "lat"
I've used plyr::ldply(mylist, rbind) for flattening lists before, but I seem to be encountering trouble due to variable list lengths: some list elements contain only one matrix, whilst others contain a list of two matrices.
I've found a clunky solution using two lapply calls and an if/else, like so:
# sample latitude-longitude data
df <- data.frame(source_lat = rep(40.7128, 3),
                 source_lon = rep(-74.0059, 3),
                 dest_lat = c(55.7982, 41.0082, -7.2575),
                 dest_lon = c(37.968, 28.9784, 112.7521),
                 id = 1:3)
# split into list
gc_list <- split(df, df$id)
# get great circles between lat-lon for each id; multiple list elements are outputted when the great circle crosses the dateline
gc_all <- lapply(gc_list, function(x) {
  geosphere::gcIntermediate(x[, c("source_lon", "source_lat")],
                            x[, c("dest_lon", "dest_lat")],
                            n = 100, addStartEnd = TRUE, breakAtDateLine = TRUE)
})
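library(magrittr)  # provides the %>% pipe used below
gc_fortified <- lapply(seq_along(gc_all), function(i) {
  # is.list() avoids comparing a length-2 class vector (c("matrix", "array")) in if()
  if (is.list(gc_all[[i]])) {
    # the great circle was split at the dateline: one section per piece
    lapply(seq_along(gc_all[[i]]), function(j) {
      data.frame(gc_all[[i]][[j]], id = i, section = j)
    }) %>%
      plyr::rbind.fill()
  } else {
    data.frame(gc_all[[i]], id = i, section = 1)
  }
}) %>%
  plyr::rbind.fill()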
But I feel like there must be a more elegant solution that works as a one-liner, e.g. dput, data.table?
Here's what I expect the output to look like:
> gc_fortified %>%
    group_by(id, section) %>%
    slice(1)
       lon      lat    id section
     <dbl>    <dbl> <int>   <dbl>
1 -74.0059 40.71280     1       1
2 -74.0059 40.71280     2       1
3 -74.0059 40.71280     3       1
4 180.0000 79.70115     3       2
Answer 1:
First the structure of the list needs to be reworked so it becomes a regular list of lists; then we apply map_dfr twice, using the .id parameter.
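library(purrr)  # map_dfr() also needs dplyr installed
# wrap bare matrices in a list so every element is a list, then convert each matrix to a data frame
gc_all_df <- map(map_if(gc_all, is.matrix, list), ~ map(.x, as.data.frame))
# the inner call records the section index as id2, the outer call records the list element as id1
map_dfr(gc_all_df, ~ map_dfr(.x, identity, .id = "id2"), .id = "id1")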
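The same two steps (regularise the list, then bind twice while recording positions) can be sketched in base R. This is only an illustration: `toy`, `m`, `regular`, `inner` and `out` are hypothetical names, and `toy` is a small stand-in for gc_all rather than real great-circle output.

```r
# Hypothetical stand-in for gc_all: two bare matrices and one list of two matrices
m <- function(n, start) cbind(lon = start + seq_len(n), lat = seq_len(n))
toy <- list(`1` = m(2, 0), `2` = m(3, 10), `3` = list(m(1, 20), m(2, 30)))

# Step 1: wrap bare matrices so that every element is a list of matrices
regular <- lapply(toy, function(x) if (is.matrix(x)) list(x) else x)

# Step 2a: within each element, bind the pieces and record their position as id2
inner <- lapply(regular, function(parts) {
  do.call(rbind, Map(function(p, j) data.frame(id2 = j, p), parts, seq_along(parts)))
})

# Step 2b: bind across elements and record the element name as id1
out <- do.call(rbind, Map(function(d, i) data.frame(id1 = i, d),
                          unname(inner), names(inner)))
rownames(out) <- NULL
```

Here `out` has one row per coordinate pair, with id1 identifying the top-level element and id2 the piece within it.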
Answer 2:
I think I prefer the recursive solution already shown, but this is one statement of the form do.call("rbind", ...) as requested, if you substitute L and add_n_s into the last line. I have kept them separate here only for clarity.
I have left the result as a matrix since the result is entirely numeric, and I suspect that it is not that you prefer data frames but that rbind.fill works on them and that was what you were using. Replace cbind in the add_n_s function with data.frame if you prefer a data frame result.
No packages are used and the solution does not use any indexing.
Here gc_all is transformed to L, which is the same except that it is a list of lists and not a list of a mix of matrices and lists. add_n_s takes an element of L and adds n and s columns to it. Finally we Map add_n_s across L and flatten.
Note that if the input had been a list of lists in the first place, then L would equal gc_all and the first line would not have been needed.
L <- lapply(gc_all, function(x) if (is.list(x)) x else list(x))
add_n_s <- function(x, n) Map(cbind, x, n = n, s = seq_along(x))
do.call("rbind", do.call("c", Map(add_n_s, L, seq_along(gc_all))))
Update fixed.
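To see what the n and s columns end up holding, the three lines can be run on a small hand-made list. This is a sketch on assumed data: `toy` and `res` are hypothetical names, and the coordinates are made up rather than produced by gcIntermediate.

```r
# Hypothetical stand-in for gc_all: one bare matrix, then a list of two matrices
toy <- list(cbind(lon = c(-74, -73), lat = c(40, 41)),
            list(cbind(lon = -75, lat = 42),
                 cbind(lon = c(180, 170), lat = c(79, 78))))

# Same three steps as above, applied to toy instead of gc_all
L <- lapply(toy, function(x) if (is.list(x)) x else list(x))
add_n_s <- function(x, n) Map(cbind, x, n = n, s = seq_along(x))
res <- do.call("rbind", do.call("c", Map(add_n_s, L, seq_along(toy))))

# res is a 5 x 4 numeric matrix with columns lon, lat, n, s:
# n is the position in the outer list, s the position within that element
```

If a data frame is wanted afterwards, as.data.frame(res) converts the matrix without touching add_n_s.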
Answer 3:
I can't offer a one-liner, but you could consider recursion here too:
flat <- function(l, s = NULL) {
  lapply(1:length(l), function(i) {
    if (is.list(l[[i]])) {
      do.call(rbind, flat(l[[i]], i))
    } else {
      cbind(l[[i]], id = if (is.null(s)) i else s, section = if (is.null(s)) 1 else i)
    }
  })
}
a <- do.call(rbind, flat(gc_all))
all.equal(data.frame(a), gc_fortified)
[1] TRUE
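For a quick check that does not need geosphere, the recursive function can be tried on a small hand-made list. This is a sketch on assumed data: `toy` and `a_toy` are hypothetical names, the coordinates are made up, and `flat` is repeated (with seq_along in place of 1:length) only so the snippet runs on its own.

```r
# Same recursive helper as above, repeated so this snippet is self-contained
flat <- function(l, s = NULL) {
  lapply(seq_along(l), function(i) {
    if (is.list(l[[i]])) {
      # descend one level, passing the current index down as the id
      do.call(rbind, flat(l[[i]], i))
    } else {
      cbind(l[[i]], id = if (is.null(s)) i else s, section = if (is.null(s)) 1 else i)
    }
  })
}

# Hypothetical stand-in for gc_all: one bare matrix, then a list of two matrices
toy <- list(cbind(lon = c(-74, -73), lat = c(40, 41)),
            list(cbind(lon = -75, lat = 42),
                 cbind(lon = c(180, 170), lat = c(79, 78))))

a_toy <- do.call(rbind, flat(toy))
# a_toy is a 5 x 4 matrix with columns lon, lat, id, section
```

Because only the immediate parent index is passed down as s, this labels two levels of nesting, which matches the structure in the question; a list nested more deeply would lose its top-level index.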
Source: https://stackoverflow.com/questions/48542874/flatten-nested-list-of-lists-with-variable-numbers-of-elements-to-a-data-frame