Count occurrences of values for every possible pair

隐瞒了意图╮ 2021-01-22 04:59

I have a list of ids and the places where these ids have been. Now I want to find the pairs of ids that have the most places in common.

My data frame looks like this:



        
3 Answers
  • 2021-01-22 05:06

    Or...

    library(dplyr)
    
    df1 %>%
      left_join(df1, by = "place") %>%
      filter(id.x < id.y) %>%
      group_by(id.x, id.y) %>%
      summarise(count = n())
    

    EDIT: If the IDs are factors, the < operator won't work. Converting them to character adds another line to the solution (credits to Steven Beaupré):

    df1 %>%
      left_join(df1, by = "place") %>%
      # convert factor columns to character so that < compares the ids as strings
      mutate(across(everything(), as.character)) %>%
      filter(id.x < id.y) %>%
      group_by(id.x, id.y) %>%
      summarise(count = n())
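
To illustrate why the conversion matters, here is a minimal base R sketch. The `ids` vector is made up for the example; it only shows how `<` behaves on an unordered factor compared with character strings.

```r
# Hypothetical ids stored as an (unordered) factor
ids <- factor(c("Dave", "Joe"))

# Comparing factor values with < returns NA and emits a warning
# ("'<' not meaningful for factors"); the warning is silenced here
factor_cmp <- suppressWarnings(ids[1] < ids[2])
print(is.na(factor_cmp))

# After conversion to character the comparison is a plain string comparison
char_cmp <- as.character(ids[1]) < as.character(ids[2])
print(char_cmp)
```

Because `filter()` drops rows where the condition is NA, leaving the ids as factors would most likely return an empty result rather than raise an error.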
    
  • 2021-01-22 05:23

    For a dplyr-esque solution, you could do:

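    left_join(df, df, by = "place") %>%
      rename(pair1 = id.x, pair2 = id.y) %>%
      # drop self-pairs, then keep the first row of each unordered (place, id, id) set;
      # which orientation of a pair survives depends on row order
      filter(pair1 != pair2, !duplicated(t(apply(., 1, sort)))) %>%
      count(pair1, pair2)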
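
The duplicated()-based filter keeps whichever orientation of a pair appears first. As far as I can tell, this means the same pair can be counted under both (a, b) and (b, a) when ids are listed in a different order at different places. Below is a minimal sketch of an order-independent variant. It assumes the ids are character, and the `df` here is made-up data, not the asker's. It uses `pmin()`/`pmax()` to put each pair in alphabetical order before counting.

```r
library(dplyr)

# Hypothetical data: the ids appear in a different order at each place
df <- data.frame(
  id    = c("Dave", "Stuart", "Stuart", "Dave", "Joe"),
  place = c("Paris", "Paris", "Rome", "Rome", "Rome"),
  stringsAsFactors = FALSE
)

# Note: recent dplyr versions may warn about a many-to-many join here,
# which is expected for a self-join on place
res <- left_join(df, df, by = "place") %>%
  filter(id.x != id.y) %>%                      # drop self-pairs
  mutate(pair1 = pmin(id.x, id.y),              # alphabetically first id
         pair2 = pmax(id.x, id.y)) %>%          # alphabetically last id
  distinct(place, pair1, pair2) %>%             # one row per pair and place
  count(pair1, pair2)                           # shared places per pair, in column n

print(res)
```

With this data, Dave and Stuart share two places (Paris and Rome) and end up in a single row, regardless of the order in which they were listed.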
    
  • 2021-01-22 05:30

    You may try:

    library(reshape2)
    tbl <-  crossprod(table(df1[2:1]))
    tbl[upper.tri(tbl, diag=TRUE)] <- 0
    res <- subset(melt(tbl), value!=0)
    colnames(res) <- c(paste0('pair',1:2), 'count')
    row.names(res) <- NULL
    res
    #   pair1 pair2 count
    #1    Joe  Dave     1
    #2 Stuart  Dave     3
    #3 Stuart   Joe     2
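
To see what the `crossprod(table(...))` step computes, here is a small base R sketch. The `df1` below is hypothetical data, chosen only so that the pair counts agree with the output shown above; the column order (id, place) is also an assumption.

```r
# Hypothetical data with columns id and place
df1 <- data.frame(
  id    = c("Dave", "Joe", "Stuart", "Dave", "Stuart",
            "Dave", "Stuart", "Joe", "Stuart"),
  place = c("London", "London", "London", "Paris", "Paris",
            "Rome", "Rome", "Oslo", "Oslo"),
  stringsAsFactors = FALSE
)

# Incidence table: one row per place, one column per id (1 = id was there)
inc <- table(df1[2:1])

# t(inc) %*% inc: entry [i, j] is the number of places shared by ids i and j;
# the diagonal holds the number of places each id visited
co <- crossprod(inc)
print(co)
```

This interpretation assumes no (id, place) row is repeated; duplicated rows would inflate the products.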
    

    Another option is:

    Subdf <- subset(merge(df1, df1, by.x='place',
                   by.y='place'), id.x!=id.y)
    Subdf[-1] <- t(apply(Subdf[-1], 1, sort))
    aggregate(place~., unique(Subdf), FUN=length)
    #  id.x   id.y place
    #1 Dave    Joe     1
    #2 Dave Stuart     3
    #3  Joe Stuart     2
    