Filtering with a set of values in R (dplyr)

风流意气都作罢 submitted on 2021-02-08 08:16:25

Question


I have a question regarding filtering using the dplyr package in R.

I have the following data frame:

  url                       season    salary
   <fct>                     <fct>      <dbl>
 1 /players/a/abrinal01.html 2016-17  5725000
 2 /players/a/ackeral01.html 2008-09   711517
 3 /players/a/acyqu01.html   2012-13   788872
 4 /players/a/acyqu01.html   2013-14   915243
 5 /players/a/acyqu01.html   2014-15   981348
 6 /players/a/acyqu01.html   2015-16  1914544
 7 /players/a/acyqu01.html   2016-17  1709538
 8 /players/a/adamsjo01.html 2014-15  1404600
 9 /players/a/adamsst01.html 2014-15  3140517
10 /players/a/adamsst01.html 2016-17 22471910
11 /players/a/adamsst01.html 2017-18  2571910

I would like to group by url and keep only the URLs that played in all three seasons 2012-13, 2013-14 and 2014-15, and, for those URLs, only the rows from those three seasons.

I have tried the following, but it gives an error:

p_filter <- p_g_stagger %>%
  dplyr::group_by(url) %>%
  dplyr::filter(season == c('2012-13', '2013-14', '2014-15'))

Error in filter_impl(.data, quo) : Result must have length 1, not 3
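The error comes from using `==` where a set-membership test is needed. `==` compares element by element and recycles the shorter vector, so inside `filter()` a group with a single row (several players in this data have only one) yields three logical values instead of one, which is what the message reports. `%in%` instead asks whether each value occurs anywhere in the set. A minimal base-R illustration; the `seasons` and `one_row` vectors are made up for the example:

```r
# Made-up seasons for one hypothetical player, in a deliberately shuffled order
seasons <- c("2014-15", "2012-13", "2013-14", "2015-16")
season_needed <- c("2012-13", "2013-14", "2014-15")

# `==` pairs elements by position and recycles season_needed (4 vs 3 elements);
# R would warn about the length mismatch, and the result says nothing about membership
pairwise <- suppressWarnings(seasons == season_needed)
print(pairwise)    # FALSE FALSE FALSE FALSE

# `%in%` asks, for each element of `seasons`, whether it occurs in season_needed
membership <- seasons %in% season_needed
print(membership)  # TRUE TRUE TRUE FALSE

# With a one-row group, `==` returns one value per element of season_needed,
# i.e. length 3 instead of 1
one_row <- "2014-15"
n_values <- length(one_row == season_needed)
print(n_values)    # 3
```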

My desired output is this:

  url                       season    salary
   <fct>                     <fct>      <dbl>
 1 /players/a/acyqu01.html   2012-13   788872
 2 /players/a/acyqu01.html   2013-14   915243
 3 /players/a/acyqu01.html   2014-15   981348

Answer 1:


We need two conditions in filter():

1) Keep only the groups (url) that contain every season in season_needed.

2) From those groups, keep only the rows whose season is in season_needed.

library(dplyr)

season_needed <- c('2012-13', '2013-14', '2014-15')

df %>%
  group_by(url) %>%
  # condition 1: the group has all needed seasons; condition 2: the row is one of them
  filter(all(season_needed %in% season) & season %in% season_needed)

#  url                     season  salary
#  <fct>                   <fct>    <int>
#1 /players/a/acyqu01.html 2012-13 788872
#2 /players/a/acyqu01.html 2013-14 915243
#3 /players/a/acyqu01.html 2014-15 981348
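For readers who want to see the two conditions without dplyr, here is a base-R sketch of the same logic. It is not the answerer's code; the data frame is typed in from the question's table and named `df` for illustration, with `tapply()` standing in for `group_by()`:

```r
# Data typed in from the question's table (url and season as character)
df <- data.frame(
  url = c("/players/a/abrinal01.html", "/players/a/ackeral01.html",
          rep("/players/a/acyqu01.html", 5), "/players/a/adamsjo01.html",
          rep("/players/a/adamsst01.html", 3)),
  season = c("2016-17", "2008-09", "2012-13", "2013-14", "2014-15", "2015-16",
             "2016-17", "2014-15", "2014-15", "2016-17", "2017-18"),
  salary = c(5725000, 711517, 788872, 915243, 981348, 1914544, 1709538,
             1404600, 3140517, 22471910, 2571910),
  stringsAsFactors = FALSE
)
season_needed <- c("2012-13", "2013-14", "2014-15")

# Condition 1: for each url, does its group contain every needed season?
has_all <- tapply(df$season, df$url, function(s) all(season_needed %in% s))
complete_urls <- names(has_all)[has_all]

# Condition 2: among those urls, keep only the rows from the needed seasons
result <- df[df$url %in% complete_urls & df$season %in% season_needed, ]
print(result)
```

Only `/players/a/acyqu01.html` passes condition 1, since `adamsst01` and `adamsjo01` each have just 2014-15 among the needed seasons.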



Answer 2:


Another approach, using add_count():

seasons_in <- c('2012-13', '2013-14', '2014-15')

p_g_stagger %>% 
  filter(season %in% seasons_in) %>% 
  add_count(url, name = "nb_seasons") %>% 
  filter(nb_seasons == length(seasons_in)) %>% 
  select(-nb_seasons)
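The counting idea behind this pipeline can be sketched in base R, one step per dplyr verb. This is an illustration rather than the answerer's code; `df` is the question's table typed in by hand, and `ave(..., FUN = length)` plays the role of `add_count(url)`:

```r
# Data typed in from the question's table (url and season as character)
df <- data.frame(
  url = c("/players/a/abrinal01.html", "/players/a/ackeral01.html",
          rep("/players/a/acyqu01.html", 5), "/players/a/adamsjo01.html",
          rep("/players/a/adamsst01.html", 3)),
  season = c("2016-17", "2008-09", "2012-13", "2013-14", "2014-15", "2015-16",
             "2016-17", "2014-15", "2014-15", "2016-17", "2017-18"),
  salary = c(5725000, 711517, 788872, 915243, 981348, 1914544, 1709538,
             1404600, 3140517, 22471910, 2571910),
  stringsAsFactors = FALSE
)
seasons_in <- c("2012-13", "2013-14", "2014-15")

# filter(season %in% seasons_in): keep only rows from the seasons of interest
kept <- df[df$season %in% seasons_in, ]

# add_count(url): for each kept row, how many kept rows share its url
nb_seasons <- ave(seq_len(nrow(kept)), kept$url, FUN = length)

# filter(nb_seasons == length(seasons_in)); the count is a separate vector,
# so there is no column to drop afterwards
result <- kept[nb_seasons == length(seasons_in), ]
print(result)
```

Because this approach counts rows rather than distinct seasons, it assumes each url/season pair appears at most once; with duplicated rows, a player could reach a count of 3 without having all three seasons. Grouping by url and comparing `n_distinct(season)` to `length(seasons_in)` would avoid that assumption.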


Source: https://stackoverflow.com/questions/55037179/filtering-with-a-set-of-values-in-r-dplyr
