Using dplyr to group_by and conditionally mutate only with if (without else) statement

廉价感情. Submitted on 2020-08-04 16:48:53

Question


I have a data frame that I need to group by a combination of column entries in order to conditionally mutate several columns using only an if statement (without an else condition).

More specifically, I want to replace the values of a column within a group by their group sum if that sum crosses a pre-defined threshold; otherwise the values should remain unchanged.

I have tried doing this with both if_else and case_when, but if_else requires a "false" argument, and case_when by default sets values that match no condition to NA:

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length)),
                Sepal.Width=if_else(sum(Sepal.Width)>170, sum(Sepal.Width)),
                Petal.Length=if_else(sum(Petal.Length)>70, sum(Petal.Length)),
                Petal.Width=if_else(sum(Petal.Width)>15, sum(Petal.Width)))

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=case_when(sum(Sepal.Length)>250 ~ sum(Sepal.Length)),
                Sepal.Width=case_when(sum(Sepal.Width)>170 ~ sum(Sepal.Width)),
                Petal.Length=case_when(sum(Petal.Length)>70 ~ sum(Petal.Length)),
                Petal.Width=case_when(sum(Petal.Width)>15 ~ sum(Petal.Width)))
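The two behaviours described above can be reproduced on a small vector. This is only a minimal sketch: the vector x and the names missing_false and unmatched are made up for illustration and are not part of my real data.

```r
library(dplyr)

x <- c(1, 5, 10)  # toy vector, hypothetical, not from the iris example

# if_else() has no default for `false`, so leaving it out raises an error;
# try() captures the error instead of stopping the script
missing_false <- try(if_else(x > 4, 0), silent = TRUE)
inherits(missing_false, "try-error")

# case_when() returns NA for every element that matches no condition
unmatched <- case_when(x > 4 ~ 0)
unmatched
```

Here the first element of x does not satisfy x > 4, so case_when leaves it as NA rather than keeping the original value.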

Any ideas how to do this instead?

Edit:

Here is an example of the expected output. The sum of the petal width over all entries of each species is 12.3 for setosa, 66.3 for versicolor and 101.3 for virginica. If I require this sum to exceed 15 for the values to be replaced by it (otherwise the original values should be kept), then I expect the following output (only showing the columns "Petal.Width" and "Species"):

    Petal.Width    Species
1           0.2     setosa
2           0.2     setosa
3           0.2     setosa
4           0.2     setosa
5           0.2     setosa
6           0.4     setosa
7           0.3     setosa
8           0.2     setosa
9           0.2     setosa
10          0.1     setosa
#...#
50          0.2     setosa
51         66.3 versicolor
52         66.3 versicolor
53         66.3 versicolor
#...#
100        66.3 versicolor
101       101.3  virginica
102       101.3  virginica
103       101.3  virginica
#...#
150       101.3  virginica
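The group sums quoted above can be checked with base R alone. A small sketch, where petal_width_sums and exceeds_cutoff are just illustrative names:

```r
# Base R only; the iris data set ships with R
petal_width_sums <- tapply(iris$Petal.Width, iris$Species, sum)
print(petal_width_sums)

# Which species have a sum above the cutoff of 15 used in the question
exceeds_cutoff <- petal_width_sums > 15
print(exceeds_cutoff)
```

Only setosa stays below the cutoff, which is why its rows keep their original values in the expected output.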

Answer 1:


I think this is what you are after, using Johnny's method. You shouldn't hit an error if you give case_when the original column as the fallback for the case when the sum is not greater than the cutoff:

library(dplyr)

# For each species, replace a column with its group sum when that sum exceeds
# the cutoff; the TRUE branch keeps the original values otherwise.
iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
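
If repeating the same pattern for every column gets tedious, one possible variation is to keep the cutoffs in a named vector and apply a plain if/else through across(). This is only a sketch and not part of the approach above: it assumes dplyr 1.0.0 or later (for across() and cur_column()), and the names cutoffs and iris_across are hypothetical.

```r
library(dplyr)

# Hypothetical lookup table holding the cutoffs used in the question
cutoffs <- c(Sepal.Length = 250, Sepal.Width = 170,
             Petal.Length = 70, Petal.Width = 15)

iris_across <- iris %>%
  group_by(Species) %>%
  # Within each group: return the sum (recycled to the group size) if it
  # exceeds the column's cutoff, otherwise return the column unchanged
  mutate(across(all_of(names(cutoffs)),
                ~ if (sum(.x) > cutoffs[[cur_column()]]) sum(.x) else .x)) %>%
  ungroup()
```

A scalar if/else is enough here because the condition is a single value per group, and mutate recycles the length-one sum across the rows of the group. For Petal.Width this should leave the setosa rows unchanged and set the versicolor and virginica rows to 66.3 and 101.3, as in the expected output from the question.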


Source: https://stackoverflow.com/questions/54404043/using-dplyr-to-group-by-and-conditionally-mutate-only-with-if-without-else-sta
