dplyr override all but the first occurrences of a value within a group

问题

I have a grouped data_frame with a "tag" column taking on values "0" and "1". In each group, I need to find the first occurrence of "1" and change all the remaining occurrences to "0". Is there a way to achieve it in dplyr?

For example, let's take "iris" data and let's add the extra "tag" column:

data(iris)
set.seed(1)
iris$tag <- sample( c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))
giris <- iris %>% group_by(Species)

In "giris", in the "setosa" group I need to keep only the first occurrence of "1" (i.e. in 4th row) and set the remaining ones to "0". This seems a bit like applying a mask or something...

Is there a way to do it? I have been experimenting with "which" and "duplicated" but I did not succeed. I have been thinking about filtering the "1"s only, keeping them, then joining with the remaining set, but this seems awkward, especially for a 12GB data set.

Thanks in advance!

回答1:

A dplyr option:

mutate(giris, newcol = as.integer(tag & cumsum(tag) == 1))

mutate(giris, newcol = as.integer(tag & !duplicated(tag)))

Or using data.table, same approach, but modify by reference:

library(data.table)
setDT(giris)
giris[, newcol := as.integer(tag & cumsum(tag) == 1), by = Species]

回答2:

We can try

res <- giris %>%
         group_by(Species) %>% 
         mutate(tag1 = ifelse(cumsum(c(TRUE,diff(tag)<0))!=1, 0, tag))

table(res[c("Species", "tag1")])
#            tag1
#Species      0  1
# setosa     49  1
# versicolor 49  1
# virginica  49  1

来源：https://stackoverflow.com/questions/36079363/dplyr-override-all-but-the-first-occurrences-of-a-value-within-a-group

标签

dplyr

which

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!