Quickest algorithm for finding sets with high intersection

予麋鹿 2021-02-01 06:37

I have a large number of user IDs (integers), potentially millions. These users all belong to various groups (sets of integers), such that there are on the order of 10 million

4 answers
  •  轻奢々 (original poster)
    2021-02-01 07:09

    I would do exactly what you propose: map users to their groups. That is, I would keep a list of group ids for every user. Then I would use the following algorithm:

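    foreach group:
      counts = new Map  // other group -> number of users shared with this group (default 0)
      foreach user in group:
        foreach userGroup in user.groups:
          counts[userGroup]++
          // report each pair once: at the moment the count first reaches 15,
          // and only from the group with the smaller id
          if( counts[userGroup] == 15 && userGroup.id > group.id )
            largeIntersection( group, userGroup )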
    

    Given G groups containing U users each on average, where each user belongs to g groups on average, this runs in O( G*U*g ). For your problem, that is probably much faster than the naive pairwise comparison of groups, which runs in O(G*G*U): the two bounds differ only in the factor g versus G, so the counting approach wins as long as a typical user belongs to far fewer groups than exist in total.
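
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes the data fits in memory as a dict mapping each group id to a set of user ids, and the names `find_large_intersections`, `groups`, and `threshold` are made up for the example. Unlike the pseudocode, it skips groups with a smaller or equal id before counting rather than after, which reports the same pairs.

```python
from collections import defaultdict


def find_large_intersections(groups, threshold=15):
    """Return (group_a, group_b) pairs, group_a < group_b, sharing >= threshold users.

    `groups` is assumed to map a group id to a *set* of user ids, so that
    no user is counted twice within one group.
    """
    # Invert the mapping: user id -> list of group ids the user belongs to.
    user_groups = defaultdict(list)
    for gid, members in groups.items():
        for user in members:
            user_groups[user].append(gid)

    pairs = []
    for gid, members in groups.items():
        # counts[other] = number of users shared between `gid` and `other`.
        counts = defaultdict(int)
        for user in members:
            for other in user_groups[user]:
                if other <= gid:
                    continue  # each pair is handled from its smaller id only
                counts[other] += 1
                if counts[other] == threshold:
                    # Fires exactly once per pair, when the threshold is first hit.
                    pairs.append((gid, other))
    return pairs
```

For example, with `groups = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {5, 6}}` and `threshold=2`, the function returns `[(1, 2)]`, because groups 1 and 2 share users 2 and 3 while group 3 shares no users with either.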
