Question
I have created a doubly nested structure for some data. How can I access the data on the second level (or, for that matter, the nth level)?
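library(gapminder)
library(dplyr)   # group_by() and mutate() come from dplyr
library(purrr)
library(tidyr)
gapminder
# nest(.key = ...) is the tidyr interface of the time; tidyr 1.0.0 deprecated .key
nest_data <- gapminder %>% group_by(continent) %>% nest(.key = by_continent)
nest_2 <- nest_data %>%
  mutate(by_continent = map(by_continent,
                            ~ .x %>% group_by(country) %>% nest(.key = by_country)))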
How can I now get the data for China into a dataframe or tibble from nest_2?
I can get the data for all of Asia, but I'm unable to isolate China.
a <- nest_2[nest_2$continent == "Asia", ]$by_continent  # Any better way of isolating Asia from nest_2?
I thought I could then do:
b <- a[a$country == "China", ]$by_country
But I get the following error:
Error in a[a$country == "China", ] : incorrect number of dimensions
> glimpse(a)
List of 1
$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 33 obs. of 2 variables:
..$ country : Factor w/ 142 levels "Afghanistan",..: 1 8 9 19 25 56 59 60 61 62 ...
..$ by_country:List of 33
So my main error was not recognizing that the result was a list, which can be remedied by appending [[1]]. However, I very much liked the solution by @Floo0. I took the liberty of writing a version that takes the variable names, in case the columns are in a different order from the one that solution assumes.
select_unnest <- function(df, listcol, var, var_val) {
  # listcol, var and var_val must be supplied as quoted strings
  df[[listcol]][df[[var]] == var_val][[1]]
}
nest_2 %>% select_unnest(listcol = "by_continent", var = "continent", var_val = "Asia") %>%
select_unnest(listcol = "by_country", var = "country", var_val = "China")
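The question also asks about the nth level. As a sketch that is not from the thread, the name-based select_unnest() idea can be folded over a list of steps with base R's Reduce(), so any depth works with one call. The helper name select_path() and the toy tables are hypothetical: they only mimic the shape of nest_2 using base R data frames with list columns. The two China rows are copied from the output shown in this thread; the Japan and France values are dummies.

```r
# Hypothetical helper (not from the thread): drill down through any number of
# nesting levels. Each step is a named character vector holding the list-column
# name, the key column name and the key value, like select_unnest()'s arguments.
select_path <- function(df, steps) {
  Reduce(function(acc, s) {
    # Keep the list-column entries whose key matches, then unwrap the first one
    acc[[s[["listcol"]]]][acc[[s[["var"]]]] == s[["var_val"]]][[1]]
  }, steps, init = df)
}

# Toy data shaped like nest_2 (hypothetical; Japan/France values are dummies)
china  <- data.frame(year = c(1952L, 1957L), lifeExp = c(44.00000, 50.54896))
japan  <- data.frame(year = c(1952L, 1957L), lifeExp = c(0, 0))
france <- data.frame(year = c(1952L, 1957L), lifeExp = c(0, 0))

asia <- data.frame(country = c("China", "Japan"))
asia$by_country <- list(china, japan)      # list column of data frames
europe <- data.frame(country = "France")
europe$by_country <- list(france)

nest_toy <- data.frame(continent = c("Asia", "Europe"))
nest_toy$by_continent <- list(asia, europe)

# One step per nesting level, outermost first
steps <- list(
  c(listcol = "by_continent", var = "continent", var_val = "Asia"),
  c(listcol = "by_country",   var = "country",   var_val = "China")
)
res <- select_path(nest_toy, steps)
print(res)
```

Going one level deeper only means appending another c(listcol = ..., var = ..., var_val = ...) entry. Like select_unnest(), this sketch silently takes the first match if a key value is duplicated.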
Answer 1:
This is a pipe-able (%>%) base R approach. It assumes the key is in the first column and the nested list column is in the second:
select_unnest <- function(x, select_val){
x[[2]][x[[1]]==select_val][[1]]
}
nest_2 %>% select_unnest("Asia") %>% select_unnest("China")
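One caveat with the [...][[1]] idiom: it quietly returns the first match when a key occurs more than once, and fails with a generic "subscript out of bounds" error when the key is absent. A hypothetical stricter variant, sketched below on made-up toy data (the name select_unnest_strict() is not from the thread), checks for exactly one match before unwrapping, keeping the same column-1/column-2 assumption.

```r
# Hypothetical stricter variant of the positional select_unnest() (not from
# the thread): column 1 holds the keys, column 2 the nested list column.
select_unnest_strict <- function(x, select_val) {
  hit <- which(x[[1]] == select_val)
  if (length(hit) != 1L) {
    stop("expected exactly one match for '", select_val, "', found ", length(hit))
  }
  x[[2]][[hit]]  # [[ ]] returns the nested data frame itself, not a list
}

# Made-up toy data: one key column plus a list column of data frames
toy <- data.frame(key = c("a", "b"))
toy$nested <- list(data.frame(v = 1:3), data.frame(v = 4:5))

inner <- select_unnest_strict(toy, "b")
print(inner)

# A missing key now fails with an explicit message
msg <- tryCatch(select_unnest_strict(toy, "z"), error = conditionMessage)
print(msg)
```

It still pipes the same way, e.g. nest_2 %>% select_unnest_strict("Asia") %>% select_unnest_strict("China"), at the cost of one extra which() per level.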
Comparing the timings:
Unit: microseconds
       expr      min        lq      mean   median        uq       max neval
   aosmith1 3202.105 3354.0055 4045.9602 3612.126 4179.9610 17119.495   100
   aosmith2 5797.744 6191.9380 7327.6619 6716.445 7662.6415 24245.779   100
      Floo0  227.169  303.3280  414.3779  346.135  400.6735  4804.500   100
 Ben Bolker  622.267  720.6015  852.9727  775.172  875.5985  1942.495   100
Code:
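microbenchmark::microbenchmark(
  # Answer 2, first approach: flatten the list, then filter and unnest
  aosmith1 = {
    a <- nest_2[nest_2$continent == "Asia", ]$by_continent
    flatten_df(a) %>%
      filter(country == "China") %>%
      unnest
  },
  # Answer 2, second approach: unnest Asia to a tibble, then filter and unnest
  aosmith2 = {
    nest_2 %>%
      filter(continent == "Asia") %>%
      select(by_continent) %>%
      unnest %>%
      filter(country == "China") %>%
      unnest
  },
  # Answer 1: positional select_unnest()
  Floo0 = {nest_2 %>% select_unnest("Asia") %>% select_unnest("China")},
  # Answer 3: base R indexing plus filter()
  `Ben Bolker` = {
    n1 <- nest_2$by_continent[nest_2$continent == "Asia"][[1]]
    n2 <- n1 %>% filter(country == "China")
    n2$by_country[[1]]
  }
)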
Answer 2:
Your a is still a list, which would need to be flattened before you could do more. You could use flatten_df, dplyr::filter, and unnest:
library(dplyr)
flatten_df(a) %>%
filter(country == "China") %>%
unnest
# A tibble: 12 x 5
country year lifeExp pop gdpPercap
<fctr> <int> <dbl> <int> <dbl>
1 China 1952 44.00000 556263527 400.4486
2 China 1957 50.54896 637408000 575.9870
3 China 1962 44.50136 665770000 487.6740
4 China 1967 58.38112 754550000 612.7057
5 China 1972 63.11888 862030000 676.9001
6 China 1977 63.96736 943455000 741.2375
7 China 1982 65.52500 1000281000 962.4214
8 China 1987 67.27400 1084035000 1378.9040
9 China 1992 68.69000 1164970000 1655.7842
10 China 1997 70.42600 1230075000 2289.2341
11 China 2002 72.02800 1280400000 3119.2809
12 China 2007 72.96100 1318683096 4959.1149
An alternative is to pull out Asia in a way that leaves you with a tibble rather than a list, which avoids the need to flatten later:
asia = nest_2 %>%
filter(continent == "Asia") %>%
select(by_continent) %>%
unnest
Answer 3:
I don't use purrr, so I don't quite understand how you ended up with something this weird/deeply nested (it seems you're following a similar approach to this question; the comments on that question suggest some alternative approaches). I can extract the tibble for China this way, but there must be a better way to do what you're trying to do ...
n1 <- nest_2$by_continent[nest_2$continent=="Asia"][[1]]
n2 <- n1 %>% filter(country=="China")
n2$by_country[[1]]
Answer 4:
A data.table solution:
library(data.table)
library(gapminder)
DT <- as.data.table(gapminder)
#nest data (starting smallest and working up):
nest_DT <- DT[, list(by_country = list(.SD)), by = .(continent, country)]
nest_2 <- nest_DT[, list(by_continent = list(.SD)), by = .(continent)]
We can now chain together calls of the form [filter, column][[1]] to get at the nested values:
nest_2[continent == "Asia", by_continent][[1]]
country by_country
1: Afghanistan <data.table>
2: Bahrain <data.table>
3: Bangladesh <data.table>
4: Cambodia <data.table>
5: China <data.table>
6: Hong Kong, China <data.table>
7: India <data.table>
8: Indonesia <data.table>
9: Iran <data.table>
10: Iraq <data.table>
11: Israel <data.table>
12: Japan <data.table>
... ... ...
nest_2[continent == "Asia", by_continent][[1]][country == "China", by_country][[1]]
year lifeExp pop gdpPercap
1: 1952 44.00000 556263527 400.4486
2: 1957 50.54896 637408000 575.9870
3: 1962 44.50136 665770000 487.6740
4: 1967 58.38112 754550000 612.7057
5: 1972 63.11888 862030000 676.9001
6: 1977 63.96736 943455000 741.2375
7: 1982 65.52500 1000281000 962.4214
8: 1987 67.27400 1084035000 1378.9040
9: 1992 68.69000 1164970000 1655.7842
10: 1997 70.42600 1230075000 2289.2341
11: 2002 72.02800 1280400000 3119.2809
12: 2007 72.96100 1318683096 4959.1149
Answer 5:
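What you probably need is the [[ ]] operator instead of the single [ ]. On a list, [ ] returns a sub-list (here, a list of length 1), whereas [[ ]] returns the element itself.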
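A minimal base R illustration of the difference, using a hypothetical toy list shaped like a in the question (a list of length 1 holding a data frame). Single brackets keep the list wrapper, so a$country is NULL and two-index subsetting fails with the same error the asker saw; double brackets extract the data frame, which can then be filtered by row.

```r
# Toy stand-in for `a` from the question: a list of length 1 holding a data frame
df <- data.frame(country = c("China", "Japan"), n = 1:2)
a <- list(df)

class(a[1])    # single brackets: still a list
class(a[[1]])  # double brackets: the data frame itself

# Two-index subsetting of the list reproduces the asker's error
err <- tryCatch(a[a$country == "China", ], error = conditionMessage)
print(err)

# Extract with [[ ]] first, then filter rows as usual
china <- a[[1]][a[[1]]$country == "China", ]
print(china)
```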
Source: https://stackoverflow.com/questions/39291360/accessing-nested-lists-in-r