R: Cumulatively count number of times column value appears in other column

Posted by 北慕城南 on 2019-12-06 14:16:11

There may be more elegant ways to do this, but the following gets the job done. The key is the replacement function `split<-`, which writes each group's result back into the rows it came from.
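For readers who have not used the replacement form of split, here is a minimal sketch on a plain vector. The names x and g are illustrative only and do not come from the question.

```r
# Illustrative data (not from the question): values and their group labels
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "b", "a", "b", "a", "b")

# split(x, g) cuts x into one piece per group; assigning to split(x, g)
# puts each transformed piece back at the positions it came from
split(x, g) <- lapply(split(x, g), cumsum)

print(x)
# 1 2 4 6 9 12
```

Group "a" holds 1, 3, 5 and group "b" holds 2, 4, 6, so each group gets its own running sum while the original order is preserved. The data frame version follows the same pattern.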

df$count <- NA # This column must exist before calling `split<-`,
               # otherwise there is nothing to assign the values to
split(df, df$var) <- lapply(split(df, df$var), function(x){
    # TRUE where this row's id2 has occurred in id1 up to and including this row
    hit <- sapply(seq_len(nrow(x)), function(i) x$id2[i] %in% x$id1[seq_len(i)])
    x$count <- cumsum(hit)
    x
})

The result is the following. There are some discrepancies, so either you made some errors in your manual construction of the desired results or I have misunderstood the question.

  id1 id2 var count
1   1   2   a     0
2   2   3   b     0
3   2   1   a     1
4   3   2   a     2
5   2   3   a     3
6   4   2   a     4
7   3   1   b     0

Update:

Just to make this answer complete and working, this is my take on your solution. Essentially the same, but I think it's nicer and more readable to have the ave inside the lapply.

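df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
    # TRUE where this row's id2 has occurred in id1 up to and including this row
    hit <- sapply(seq_len(nrow(x)), function(i) x$id2[i] %in% x$id1[seq_len(i)])
    # Running count of hits, kept separately for each id2 value
    x$count <- ave(hit, x$id2, FUN=cumsum)
    x
})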
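No output is shown for the updated version, so here is a self-contained sketch of what it produces. The input data frame is an assumption, reconstructed from the id1, id2 and var columns of the result table earlier in this answer.

```r
# Assumed input, reconstructed from the earlier result table
df <- data.frame(
  id1 = c(1, 2, 2, 3, 2, 4, 3),
  id2 = c(2, 3, 1, 2, 3, 2, 1),
  var = c("a", "b", "a", "a", "a", "a", "b")
)

df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x) {
  hit <- sapply(seq_len(nrow(x)), function(i) x$id2[i] %in% x$id1[seq_len(i)])
  # ave() applies cumsum separately to the hits of each distinct id2 value
  x$count <- ave(hit, x$id2, FUN = cumsum)
  x
})

print(df$count)
# 0 0 1 1 1 2 0
```

Compared with the first version, the count no longer accumulates across different id2 values: in group a, the rows with id2 equal to 2 get 0, 1, 2, while the rows with id2 equal to 1 and 3 each get 1.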

I used and adapted Backlin's answer to get what I want. The code is as follows:

df$count <- NA

split(df, df$var) <- lapply(split(df, df$var), function(x){
    # Number of times this row's id2 has appeared in id1 up to and including this row
    x$count <- sapply(seq_len(nrow(x)), function(i) sum(x$id2[i] == x$id1[seq_len(i)]))
    x
})

There is probably a more elegant way of doing it, but I think this works.
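One possible variant, offered only as a sketch, computes the same count with ave() over row indices, so no placeholder column and no `split<-` on the data frame are needed. The helper name count_prior is hypothetical, and the input is an assumption reconstructed from the result table in the first answer.

```r
# Assumed input, reconstructed from the result table in the first answer
df <- data.frame(
  id1 = c(1, 2, 2, 3, 2, 4, 3),
  id2 = c(2, 3, 1, 2, 3, 2, 1),
  var = c("a", "b", "a", "a", "a", "a", "b")
)

# Hypothetical helper: idx holds the row numbers of one var group. For each of
# those rows, count how often its id2 appears in id1 among the group's rows
# up to and including that row.
count_prior <- function(idx) {
  vapply(seq_along(idx), function(i) {
    sum(df$id2[idx[i]] == df$id1[idx[seq_len(i)]])
  }, integer(1))
}

# ave() calls count_prior once per var group and returns values in row order
df$count <- ave(seq_len(nrow(df)), df$var, FUN = count_prior)

print(df$count)
# 0 0 1 1 1 2 0
```

The helper reads df from the enclosing environment, which keeps the sketch short but ties it to that one data frame.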
