Count occurrences of values for every possible pair


I have a list of ids and the places where these ids have been. Now I want to find the pairs of ids that have the most places in common.

My data frame looks like this:


3 Answers

小鲜肉

2021-01-22 05:06

Or...

library(dplyr)

df1 %>%
  left_join(df1, by = "place") %>%
  filter(id.x < id.y) %>%
  group_by(id.x, id.y) %>%
  summarise(count = n())

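EDIT: If the IDs are factors, the < operator won't work, because it is not meaningful for unordered factors and returns NA. Converting the IDs to character first adds another line to the solution (credits to Steven Beaupré):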

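df1 %>%
  left_join(df1, by = "place") %>%
  # convert factor ids to character so that < compares them as strings
  # (across() replaces the deprecated mutate_each(funs(...)) idiom)
  mutate(across(c(id.x, id.y), as.character)) %>%
  filter(id.x < id.y) %>%
  group_by(id.x, id.y) %>%
  summarise(count = n())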
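
To make the self-join idea concrete, here is a minimal sketch on made-up data. The data frame below is hypothetical (the asker's data is not shown), and the names visits, joined and pair_counts are only for illustration. It uses inner_join instead of left_join, which returns the same rows when a table is joined to itself on the same key, and it de-duplicates (id, place) rows first on the assumption that a repeat visit should count once. Recent dplyr versions may warn about a many-to-many join here; that is expected for this kind of self-join.

```r
library(dplyr)

# Hypothetical sample data; the asker's actual data frame is not shown.
df1 <- data.frame(
  id    = c("Dave", "Dave", "Dave", "Joe", "Joe",
            "Stuart", "Stuart", "Stuart", "Stuart"),
  place = c("A", "B", "C", "A", "D", "A", "B", "C", "D"),
  stringsAsFactors = FALSE   # character ids, so < compares strings
)

# Assumption: a repeated (id, place) row should only count once.
visits <- distinct(df1, id, place)

# Self-join: one row per ordered pair of ids seen at the same place,
# including each id paired with itself.
joined <- inner_join(visits, visits, by = "place")

# id.x < id.y keeps each unordered pair once and drops the self-pairs;
# sort = TRUE puts the pairs with the most shared places first.
pair_counts <- joined %>%
  filter(id.x < id.y) %>%
  count(id.x, id.y, name = "count", sort = TRUE)

print(pair_counts)
```

Sorting by count answers the original question directly: the first row is the pair with the most places in common.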


时光说笑

2021-01-22 05:23

For a dplyr-esque solution, you could do:

left_join(df, df, by = "place") %>%
  rename(pair1 = id.x, pair2 = id.y) %>%
  filter(pair1 != pair2, !duplicated(t(apply(., 1, sort)))) %>%
  count(pair1, pair2)
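
One caveat, as far as I can tell: the row-wise sort with duplicated() keeps whichever orientation of a pair shows up first at each place, so if the rows are not ordered by id, the same pair can end up as (Joe, Dave) at one place and (Dave, Joe) at another, and its count gets split. A possible alternative is to put each pair into a fixed order with pmin() and pmax() before counting. The sketch below assumes character ids and uses hypothetical, deliberately shuffled data.

```r
library(dplyr)

# Hypothetical data, deliberately not sorted by id.
df <- data.frame(
  id    = c("Stuart", "Dave", "Joe", "Dave", "Stuart",
            "Joe", "Stuart", "Dave", "Stuart"),
  place = c("A", "A", "A", "B", "B", "D", "D", "C", "C"),
  stringsAsFactors = FALSE
)

pair_counts <- inner_join(df, df, by = "place") %>%
  filter(id.x != id.y) %>%                      # drop self-pairs
  mutate(pair1 = pmin(id.x, id.y),              # alphabetically smaller id
         pair2 = pmax(id.x, id.y)) %>%          # alphabetically larger id
  distinct(place, pair1, pair2) %>%             # one row per pair and place
  count(pair1, pair2, name = "count")

print(pair_counts)
```

Because every pair is written in the same order regardless of row order, the counts do not depend on how the input happens to be sorted.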


误落风尘

2021-01-22 05:30

You may try

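library(reshape2)
# table() builds a place-by-id incidence table; crossprod() turns it
# into an id-by-id matrix whose entries count the shared places
tbl <- crossprod(table(df1[2:1]))
# zero out the diagonal and upper triangle so each pair appears once
tbl[upper.tri(tbl, diag=TRUE)] <- 0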
res <- subset(melt(tbl), value!=0)
colnames(res) <- c(paste0('pair',1:2), 'count')
row.names(res) <- NULL
res
#   pair1 pair2 count
#1    Joe  Dave     1
#2 Stuart  Dave     3
#3 Stuart   Joe     2
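
For anyone wondering why crossprod works here, the sketch below unpacks it in base R on hypothetical data (the asker's data frame is not shown). It assumes each (id, place) combination occurs at most once; with repeated rows the entries would be products of visit counts rather than counts of shared places. The names incidence, shared, idx and res are only for illustration, and which() with lower.tri() stands in for the melt() step so no extra package is needed.

```r
# Hypothetical sample data.
df1 <- data.frame(
  id    = c("Dave", "Dave", "Dave", "Joe", "Joe",
            "Stuart", "Stuart", "Stuart", "Stuart"),
  place = c("A", "B", "C", "A", "D", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Rows are places, columns are ids; a 1 means "this id was at this place".
incidence <- table(df1$place, df1$id)

# t(incidence) %*% incidence: entry [i, j] is the number of places that
# id i and id j share; the diagonal holds the number of places per id.
shared <- crossprod(incidence)

# Pull out the non-zero entries below the diagonal, one per unordered pair.
idx <- which(lower.tri(shared) & shared > 0, arr.ind = TRUE)
res <- data.frame(
  pair1 = rownames(shared)[idx[, "row"]],
  pair2 = colnames(shared)[idx[, "col"]],
  count = shared[idx]
)

print(res)
```

The matrix route avoids the self-join entirely, which may matter when some places have many ids, although the id-by-id matrix itself grows with the square of the number of ids.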

Or another option is

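# self-join on place, dropping rows that pair an id with itself
Subdf <- subset(merge(df1, df1, by.x='place',
               by.y='place'), id.x!=id.y)
# sort the two ids within each row so mirrored pairs become identical
Subdf[-1] <- t(apply(Subdf[-1], 1, sort))
# drop the mirrored duplicates, then count the places per pair
aggregate(place~., unique(Subdf), FUN=length)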
#  id.x   id.y place
#1 Dave    Joe     1
#2 Dave Stuart     3
#3  Joe Stuart     2
