How to create a column/index based on either of two conditions being met (to enable clustering of matched pairs within same dataframe)?

问题

I have a large dataset of matched pairs (id1 and id2) and would like to create an index variable to enable me to merge these pairs into rows.

As such, the first row would be index 1 and from then on the index will increase by 1, unless either id1 or id2 match any of the values in previous rows. Where this is the case, the previously attributed index should be applied.

I have looked for weeks and most solutions seem to fall short of what I need.

Here's some data to replicate what I have:

id1 <- c(1,2,2,4,6,7,9,11)
id2 <- c(2,3,4,5,7,8,10,2)
df <- cbind(id1,id2)
df <- as.data.frame(df)

df
  id1 id2
1   1   2
2   2   3
3   2   4
4   4   5
5   6   7
6   7   8
7   9  10
8  11   2

And here's what hope to achieve:

#wanted result
index <- c(1,1,1,1,2,2,3,1)
df_indexed <- cbind(df,index)

df_indexed
  id1 id2 index
1   1   2     1
2   2   3     1
3   2   4     1
4   4   5     1
5   6   7     2
6   7   8     2
7   9  10     3
8  11   2     1

回答1:

It may be easier to do in igraph

library(igraph)
g <- graph.data.frame(df)
df$index <- clusters(g)$membership[as.character(df$id1)]
df$index
#[1] 1 1 1 1 2 2 3 1

来源：https://stackoverflow.com/questions/56739733/how-to-create-a-column-index-based-on-either-of-two-conditions-being-met-to-ena

标签

indexing

data-manipulation

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!