Question
I have a question regarding filtering using the dplyr package in R.
My current data frame looks like this:
url season salary
<fct> <fct> <dbl>
1 /players/a/abrinal01.html 2016-17 5725000
2 /players/a/ackeral01.html 2008-09 711517
3 /players/a/acyqu01.html 2012-13 788872
4 /players/a/acyqu01.html 2013-14 915243
5 /players/a/acyqu01.html 2014-15 981348
6 /players/a/acyqu01.html 2015-16 1914544
7 /players/a/acyqu01.html 2016-17 1709538
8 /players/a/adamsjo01.html 2014-15 1404600
9 /players/a/adamsst01.html 2014-15 3140517
10 /players/a/adamsst01.html 2016-17 22471910
11 /players/a/adamsst01.html 2017-18 2571910
I would like to group by url and keep only the players (URLs) that have rows for all three seasons 2012-13, 2013-14 and 2014-15, and for those players keep only the rows from these three seasons.
I have tried this, but it gives an error:
p_filter <- p_g_stagger %>%
  dplyr::group_by(url) %>%
  dplyr::filter(season == c('2012-13', '2013-14', '2014-15'))
Error in filter_impl(.data, quo) : Result must have length 1, not 3
My desired output is this:
url season salary
<fct> <fct> <dbl>
1 /players/a/acyqu01.html 2012-13 788872
2 /players/a/acyqu01.html 2013-14 915243
3 /players/a/acyqu01.html 2014-15 981348
Answer 1:
We need two conditions in filter:
1) Keep only the groups (url) that contain all of season_needed.
2) From the groups selected by condition 1, keep only the rows whose season is in season_needed.
season_needed <- c('2012-13', '2013-14', '2014-15')
library(dplyr)
df %>%
  group_by(url) %>%
  filter(all(season_needed %in% season) & season %in% season_needed)
# url season salary
# <fct> <fct> <int>
#1 /players/a/acyqu01.html 2012-13 788872
#2 /players/a/acyqu01.html 2013-14 915243
#3 /players/a/acyqu01.html 2014-15 981348
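The error in the question most likely comes from `==`, which compares element by element and recycles the shorter vector, while `%in%` tests membership. In a group with a single row, `season == c(...)` returns three values where `filter` expects one. Here is a minimal base-R sketch of the difference, using a made-up `season` vector (an assumption for illustration, not the original data):

```r
# Hypothetical seasons for one player, deliberately not in sorted order
season <- c("2014-15", "2012-13", "2013-14", "2015-16")
season_needed <- c("2012-13", "2013-14", "2014-15")

# %in% asks, for each element of season, whether it occurs in season_needed
in_needed <- season %in% season_needed

# all() reduces the group-level check to a single TRUE/FALSE
has_all <- all(season_needed %in% season)

# == compares position by position and recycles season_needed;
# lengths 4 and 3 do not match, so R also emits a warning (suppressed here)
eq_result <- suppressWarnings(season == season_needed)

print(in_needed)
print(has_all)
print(eq_result)
```

With this ordering `==` matches nothing even though three of the four seasons are wanted, which is why the answer combines `all(...)` with `%in%`.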
Answer 2:
Another approach, using add_count.
seasons_in <- c('2012-13', '2013-14', '2014-15')
p_g_stagger %>%
  filter(season %in% seasons_in) %>%
  add_count(url, name = "nb_seasons") %>%
  filter(nb_seasons == length(seasons_in)) %>%
  select(-nb_seasons)
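Note that `add_count` counts rows, so the approach above assumes each url has at most one row per season. If duplicate rows are possible (an assumption, since the question's data shows none), counting distinct seasons per group may be safer. A self-contained sketch, rebuilding a subset of the question's data under the name `p_g_stagger`:

```r
library(dplyr)

# Subset of the rows shown in the question, rebuilt for a reproducible example
p_g_stagger <- data.frame(
  url = c("/players/a/abrinal01.html",
          "/players/a/acyqu01.html", "/players/a/acyqu01.html",
          "/players/a/acyqu01.html", "/players/a/acyqu01.html",
          "/players/a/adamsst01.html"),
  season = c("2016-17", "2012-13", "2013-14", "2014-15", "2015-16", "2014-15"),
  salary = c(5725000, 788872, 915243, 981348, 1914544, 3140517),
  stringsAsFactors = FALSE
)

seasons_in <- c("2012-13", "2013-14", "2014-15")

result <- p_g_stagger %>%
  filter(season %in% seasons_in) %>%                        # keep wanted seasons only
  group_by(url) %>%
  filter(n_distinct(season) == length(seasons_in)) %>%      # url must cover all of them
  ungroup()

print(result)
```

Only the three acyqu01 rows for the wanted seasons should remain; adamsst01 is dropped because it covers just one of them.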
Source: https://stackoverflow.com/questions/55037179/filtering-with-a-set-of-values-in-r-dplyr