Question
I have a large dataset of matched pairs (id1 and id2) and would like to create an index variable to enable me to merge these pairs into rows.
As such, the first row would be index 1 and from then on the index will increase by 1, unless either id1 or id2 match any of the values in previous rows. Where this is the case, the previously attributed index should be applied.
I have looked for weeks and most solutions seem to fall short of what I need.
Here's some data to replicate what I have:
id1 <- c(1,2,2,4,6,7,9,11)
id2 <- c(2,3,4,5,7,8,10,2)
df <- cbind(id1,id2)
df <- as.data.frame(df)
df
  id1 id2
1   1   2
2   2   3
3   2   4
4   4   5
5   6   7
6   7   8
7   9  10
8  11   2
And here's what I hope to achieve:
#wanted result
index <- c(1,1,1,1,2,2,3,1)
df_indexed <- cbind(df,index)
df_indexed
  id1 id2 index
1   1   2     1
2   2   3     1
3   2   4     1
4   4   5     1
5   6   7     2
6   7   8     2
7   9  10     3
8  11   2     1
Answer 1:
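This is a connected-components problem: if each pair is treated as an edge between two ids, rows that share an id, directly or through a chain of other rows, end up in the same component. It may be easier to do with igraph: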
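library(igraph)
# Treat each (id1, id2) pair as an undirected edge between two ids.
g <- graph_from_data_frame(df, directed = FALSE)
# components() labels each connected component; look up every row's label via its id1.
df$index <- unname(components(g)$membership[as.character(df$id1)])
df$index
#[1] 1 1 1 1 2 2 3 1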
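If adding igraph as a dependency is not an option, the same grouping can be sketched in base R with a small union-find. This is only an illustration: `pair_index` is a hypothetical helper name, not a function from any package, and the loop is written for clarity rather than speed, so it may be slow on a very large dataset. Groups are numbered in order of first appearance, which matches the wanted result in the question.

```r
# Example data from the question.
df <- data.frame(id1 = c(1, 2, 2, 4, 6, 7, 9, 11),
                 id2 = c(2, 3, 4, 5, 7, 8, 10, 2))

# pair_index() is a hypothetical helper, not part of any package.
pair_index <- function(id1, id2) {
  ids <- unique(c(id1, id2))
  a <- match(id1, ids)        # position of each id1 among the distinct ids
  b <- match(id2, ids)        # position of each id2 among the distinct ids
  parent <- seq_along(ids)    # every id starts as the root of its own group

  # Follow parent links until reaching an id that is its own parent.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Each row joins the groups of its two ids.
  for (k in seq_along(a)) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # Relabel the roots 1, 2, 3, ... in order of first appearance.
  roots <- vapply(a, find_root, integer(1))
  match(roots, unique(roots))
}

df$index <- pair_index(df$id1, df$id2)
df$index
#[1] 1 1 1 1 2 2 3 1
```

Note that a later row can join two groups that looked separate earlier. For example, `pair_index(c(1, 3, 2), c(2, 4, 3))` puts all three rows in group 1, whereas a strict top-to-bottom reading of the rule in the question would have given the second row its own index before the third row linked it to the first.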
Source: https://stackoverflow.com/questions/56739733/how-to-create-a-column-index-based-on-either-of-two-conditions-being-met-to-ena