It is probably easier to describe what I want to do using an example... Say I have the following dataframe:
id1 id2 var
1 2 a
2 3 b
2 1 a
3 2 a
2 3 a
4 2 a
3 1 b
You can generate it as follows:
df <- data.frame(id1 = c(1,2,2,3,2,4,3),
                 id2 = c(2,3,1,2,3,2,1),
                 var = c('a','b','a','a','a','a','b'))
For each row, I want a cumulative count of the number of times that row's id2 value has appeared in id1 in earlier rows with the same var, so I would end up with:
id1 id2 var count
1 2 a 0
2 3 b 0
2 1 a 1
3 2 a 1
2 3 a 1
4 2 a 2
3 1 b 0
So the count in row 3 is 1, since we see id1 = 1 and var = 'a' once before row 3 (in row 1). In row 4 the count is also 1, since we see id1 = 2 and var = 'a' in row 3; we only check rows before row 4, so we don't count the one we see in row 5.
If I were checking the number of times id1 had appeared in id1, I would do something like:
with(df, ave(id1 == id1, paste(id1, var), FUN = cumsum))
Is there a quick and easy way of doing this for id2?
Thanks in advance
There might be more elegant ways to do it, but this gets the job done. The key here is the `split<-` function.
df$count <- NA  # This column must be added prior to calling `split<-`
                # because otherwise we can't assign values to it
split(df, df$var) <- lapply(split(df, df$var), function(x){
    x$count <- cumsum(sapply(1:nrow(x), function(i) x$id2[i] %in% x$id1[1:i]))
    x
})
The result is the following. There are some discrepancies, so either you made some errors in your manual construction of the desired results or I have misunderstood the question.
id1 id2 var count
1 1 2 a 0
2 2 3 b 0
3 2 1 a 1
4 3 2 a 2
5 2 3 a 3
6 4 2 a 4
7 3 1 b 0
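To see what `split<-` is doing in the code above, here is a minimal sketch on a toy data frame. The names `toy`, `g`, `x` and `rank_in_group` are made up for illustration and are not part of the question's data. Each group is modified on its own, and the assignment form writes the pieces back into their original row positions.

```r
# Toy data, hypothetical and unrelated to the question's df
toy <- data.frame(g = c("a", "b", "a", "b", "a"),
                  x = c(10, 20, 30, 40, 50))
toy$rank_in_group <- NA  # placeholder so every piece has the column to fill

# split() cuts toy into one data frame per level of g; `split<-` puts the
# modified pieces back into the rows they came from.
split(toy, toy$g) <- lapply(split(toy, toy$g), function(piece) {
  piece$rank_in_group <- seq_len(nrow(piece))  # running index within the group
  piece
})

toy$rank_in_group
```

The row order of `toy` is unchanged; only the placeholder column is filled in, group by group.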
Update:
Just to make this answer complete and working, this is my take on your solution. Essentially the same, but I think it's nicer and more readable to have the `ave` inside the `lapply`.
df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
    hit <- sapply(1:nrow(x), function(i) x$id2[i] %in% x$id1[1:i])
    x$count <- ave(hit, x$id2, FUN=cumsum)
    x
})
I have used and edited Backlin's answer to get what I want; the code is as follows:
df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
    x$count <- sapply(1:nrow(x), function(i) sum(x$id2[i] == x$id1[1:i]))
    x
})
There is probably a more elegant way of doing it but I think this works...
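As a cross-check, the same counts can be computed without `split<-`. This is only a sketch, not the method used in the thread: `count_prior` is a hypothetical helper name, and it looks at rows 1 to i-1 rather than 1 to i, on the assumption that a row should never count itself (which would matter only if some row had id1 equal to id2).

```r
df <- data.frame(id1 = c(1, 2, 2, 3, 2, 4, 3),
                 id2 = c(2, 3, 1, 2, 3, 2, 1),
                 var = c('a', 'b', 'a', 'a', 'a', 'a', 'b'))

# Hypothetical helper: for row i, count the earlier rows with the same var
# whose id1 equals this row's id2.
count_prior <- function(id1, id2, var) {
  vapply(seq_along(id2), function(i) {
    prior <- seq_len(i - 1)  # empty for i = 1, so the first row gets 0
    sum(id1[prior] == id2[i] & var[prior] == var[i])
  }, integer(1))
}

df$count <- count_prior(df$id1, df$id2, df$var)
df$count
```

On the example data this gives 0 0 1 1 1 2 0, matching the desired count column in the question. Like the `sapply` versions above, it compares every row with all rows before it, so it may be slow on large data frames.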
Source: https://stackoverflow.com/questions/19491258/r-cumulatively-count-number-of-times-column-value-appears-in-other-column