Let's say I have a dataset containing visits to a hospital. My goal is to generate a variable that counts the number of unique patients the visitor has seen up to the date of each visit.
You can do:
with(df, ave(patient, visitor, FUN = function(x) cumsum(!duplicated(x))))
[1] 1 1 1 2 2 2 2 2 3 3
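Essentially, it is a cumulative sum of non-duplicated values per group: !duplicated(x) is TRUE only the first time a patient appears, so the running total is the number of distinct patients seen so far. This assumes the rows are already sorted by date within each visitor.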
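To see the mechanics on something concrete, here is a minimal sketch on a small made-up data frame. The data are hypothetical, not the asker's, and the rows are assumed to be in date order within each visitor:

```r
# Hypothetical example data, assumed sorted by date within each visitor
df <- data.frame(
  visitor = c("a", "a", "a", "a", "b", "b", "b"),
  patient = c(1, 1, 2, 1, 3, 4, 3)
)

# First occurrences of each patient, computed over the whole column
# (shown only to illustrate what duplicated() does)
!duplicated(df$patient)

# Running count of distinct patients, restarted for each visitor
df$res <- with(df, ave(patient, visitor,
                       FUN = function(x) cumsum(!duplicated(x))))
df$res
# [1] 1 1 2 2 1 2 2
```

Visitor "a" sees patient 1 twice before meeting patient 2, so the count stays at 1 and then moves to 2; the count starts again at 1 for visitor "b".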
And you can also do the same with dplyr:
library(dplyr)
df %>% group_by(visitor) %>% mutate(res = cumsum(!duplicated(patient)))