问题
I'm stuck trying to subset some panel data, i.e. ids within group, using dplyr
.
I want to exact all id
s, within each group, grp
that has a NUM
series with a minimum smaller than 2 and a maximum greater than 2. I've constructed a minimal working example below that should illustrate the issue.
I have been working with filter()
, row_number() == c(1,n())
, and tried to separate it out and merge, i.e. different types of _join
, it back together, but I am stuck and I am now turning to the SO community for help.
What I got
A tibble
like this,
df <- tibble(id = rep(0:1, c(8, 13)), grp = rep(c("01", "02"), c(13, 8)),
NUM = c(-4, -3, -2, -1, 1, 2, 3, 4, -3, -2, -1,
1, 2, -3, -2, -1, 1, 2, 3, 4, 5)) %>% group_by(id, grp)
df %>% print(n=21)
#> # A tibble: 21 x 3
#> # Groups: id, grp [3]
#> id grp NUM
#> <int> <chr> <dbl>
#> 1 0 01 -4
#> 2 0 01 -3
#> 3 0 01 -2
#> 4 0 01 -1
#> 5 0 01 1
#> 6 0 01 2
#> 7 0 01 3
#> 8 0 01 4
#> 9 1 01 -3
#> 10 1 01 -2
#> 11 1 01 -1
#> 12 1 01 1
#> 13 1 01 2
#> 14 1 02 -3
#> 15 1 02 -2
#> 16 1 02 -1
#> 17 1 02 1
#> 18 1 02 2
#> 19 1 02 3
#> 20 1 02 4
#> 21 1 02 5
What I am trying to get / desired outcome
df_out <- tibble(id = rep(0:1, c(9, 8)),
grp = rep(c("01", "02"), c(9, 8)),
NUM = c(-4, -3, -2, -1, 1, 2, 3,
4, 5, -3, -2, -1, 1, 2, 3, 4, 5)) %>% group_by(id, grp)
df_out
#> # A tibble: 17 x 3
#> # Groups: id, grp [3]
#> id grp NUM
#> <int> <chr> <dbl>
#> 1 0 01 -4
#> 2 0 01 -3
#> 3 0 01 -2
#> 4 0 01 -1
#> 5 0 01 1
#> 6 0 01 2
#> 7 0 01 3
#> 8 0 01 4
#> 9 1 02 -3
#> 10 1 02 -2
#> 11 1 02 -1
#> 12 1 02 1
#> 13 1 02 2
#> 14 1 02 3
#> 15 1 02 4
#> 16 1 02 5
回答1:
Like so?
library(dplyr)
filter(df, any(NUM > 2) & any(NUM < -2))
# A tibble: 16 x 3
# Groups: id, grp [2]
id grp NUM
<int> <chr> <dbl>
1 0 01 -4.00
2 0 01 -3.00
3 0 01 -2.00
4 0 01 -1.00
5 0 01 1.00
6 0 01 2.00
7 0 01 3.00
8 0 01 4.00
9 1 02 -3.00
10 1 02 -2.00
11 1 02 -1.00
12 1 02 1.00
13 1 02 2.00
14 1 02 3.00
15 1 02 4.00
16 1 02 5.00
In addition, if one whichs to subset on an exact values, say the first NUM
is -3
and last NUM
is 5
, i.e. row 9-16 in the original data, tThis can be done like this,
df %>%
group_by(id, grp) %>%
mutate(first = first(NUM)
,last = last(NUM)) %>%
filter(-3 == first & 5 == last) %>%
select(-first, -last)
#> # A tibble: 8 x 3
#> # Groups: id, grp [1]
#> id grp NUM
#> <int> <chr> <dbl>
#> 1 1 02 -3
#> 2 1 02 -2
#> 3 1 02 -1
#> 4 1 02 1
#> 5 1 02 2
#> 6 1 02 3
#> 7 1 02 4
#> 8 1 02 5
The above is inspired by this SO answer.
来源:https://stackoverflow.com/questions/48448835/subsetting-panel-data-conditional-on-consecutive-strings-of-length