Question
It is probably easiest to describe what I want to do with an example. Say I have the following data frame:
id1 id2 var
1 2 a
2 3 b
2 1 a
3 2 a
2 3 a
4 2 a
3 1 b
Which you can generate as follows:
df <- data.frame(id1 = c(1,2,2,3,2,4,3),
                 id2 = c(2,3,1,2,3,2,1),
                 var = c('a','b','a','a','a','a','b'))
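I want a cumulative count of the number of times id2 has appeared in id1 with the same var, that is, for each row, the number of earlier rows whose id1 equals this row's id2 and whose var matches. So I would end up with: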
id1 id2 var count
1 2 a 0
2 3 b 0
2 1 a 1
3 2 a 1
2 3 a 1
4 2 a 2
3 1 b 0
So the count in row 3 is 1, since id1 = 1 with var = 'a' appears once before row 3 (in row 1). In row 4 the count is also 1, since id1 = 2 with var = 'a' appears in row 3 (we only look at rows before row 4, so the one in row 5 is not counted).
If I were counting the number of times id1 had appeared in id1, I would do something like:
with(df, ave(id1 == id1, paste(id1, var), FUN = cumsum))
Is there a quick and easy way of doing this for id2?
Thanks in advance
Answer 1:
There might be more elegant ways to do it, but this gets the job done. The key here is the `split<-` function.
# This column must be added prior to calling `split<-`
# because otherwise we can't assign values to it
df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
  x$count <- cumsum(sapply(1:nrow(x), function(i) x$id2[i] %in% x$id1[1:i]))
  x
})
The result is the following. There are some discrepancies, so either you made some errors in your manual construction of the desired results or I have misunderstood the question.
  id1 id2 var count
1   1   2   a     0
2   2   3   b     0
3   2   1   a     1
4   3   2   a     2
5   2   3   a     3
6   4   2   a     4
7   3   1   b     0
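One way to see where the discrepancies come from is to look at the intermediate values for the var == 'a' rows only. The sketch below rebuilds the question's data and recomputes the same hit vector as the code above; the names a and hit are just illustrative. Because cumsum runs over every row in the group, each hit raises the total for all later rows, regardless of which id2 they carry.

```r
# Example data from the question
df <- data.frame(id1 = c(1, 2, 2, 3, 2, 4, 3),
                 id2 = c(2, 3, 1, 2, 3, 2, 1),
                 var = c('a', 'b', 'a', 'a', 'a', 'a', 'b'))

# Rows with var == 'a' (original rows 1, 3, 4, 5, 6)
a <- df[df$var == 'a', ]

# TRUE where this row's id2 has been seen in id1 up to and including row i
hit <- sapply(seq_len(nrow(a)), function(i) a$id2[i] %in% a$id1[1:i])
hit          # FALSE TRUE TRUE TRUE TRUE

# Running total over the whole group, not per id2 value
cumsum(hit)  # 0 1 2 3 4
```

So the 2, 3, 4 in rows 4 to 6 are a running total of hits across the group rather than a per-id2 count.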
Update:
Just to make this answer complete and working, this is my take on your solution. Essentially the same, but I think it's nicer and more readable to have the ave inside the lapply.
df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
  hit <- sapply(1:nrow(x), function(i) x$id2[i] %in% x$id1[1:i])
  x$count <- ave(hit, x$id2, FUN=cumsum)
  x
})
Answer 2:
I used and edited Backlin's answer to get what I want; the code is as follows:
df$count <- NA
split(df, df$var) <- lapply(split(df, df$var), function(x){
  x$count <- sapply(1:nrow(x), function(i) sum(x$id2[i] == x$id1[1:i]))
  x
})
There is probably a more elegant way of doing it but I think this works...
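Note that x$id1[1:i] includes row i itself, so a row where id1 equals id2 would count itself. That never happens in the example data, but if only strictly earlier rows should count (as the question describes), a minimal sketch along the following lines might be safer. The helper name count_prior is made up for illustration and is not part of either answer.

```r
# Example data from the question
df <- data.frame(id1 = c(1, 2, 2, 3, 2, 4, 3),
                 id2 = c(2, 3, 1, 2, 3, 2, 1),
                 var = c('a', 'b', 'a', 'a', 'a', 'a', 'b'))

# count_prior is a hypothetical helper name.
# For row i, count the earlier rows (1 .. i-1) that have the same var
# and whose id1 equals this row's id2.
count_prior <- function(id1, id2, var) {
  vapply(seq_along(id2), function(i) {
    prev <- seq_len(i - 1)  # empty for i = 1, so the first row gets 0
    sum(id1[prev] == id2[i] & var[prev] == var[i])
  }, integer(1))
}

df$count <- count_prior(df$id1, df$id2, df$var)
df$count  # 0 0 1 1 1 2 0
```

This compares every row with all rows before it, so it is quadratic in the number of rows; for a small data frame like this one that should not matter.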
Source: https://stackoverflow.com/questions/19491258/r-cumulatively-count-number-of-times-column-value-appears-in-other-column