efficient algorithm to compare similarity between sets of numbers?

甜味超标 2021-02-01 11:13

I have a large number of sets of numbers. Each set contains 10 numbers, and I need to remove every set that shares 5 or more numbers (in any order) with any other set.


12 answers
  •  [愿得一人]
    2021-02-01 11:52

    Since you need to compare every pair of sets, the algorithm is O(N^2), where N is the number of sets.
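    With N sets, the number of unordered pairs that must be checked is:

```latex
\text{pairs}(N) = \binom{N}{2} = \frac{N(N-1)}{2} = \Theta(N^2),
\qquad
\text{pairs}(10\,000) = \frac{10\,000 \cdot 9\,999}{2} = 49\,995\,000.
```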

    Each comparison takes O(X + Y) time, where X and Y are the sizes of the two sets; in your case both are 10, so each comparison is constant time. However, this requires sorting all the sets beforehand, which adds O(N * X log X); again, X log X is a constant in your case.
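    A minimal C++ sketch of the preprocessing step, assuming each set is stored as a std::vector<int> (the question does not say how the sets are represented). The function name sort_all_sets is hypothetical. Each std::sort call is O(X log X), so the loop is O(N * X log X), which is linear in N once X is fixed at 10.

```cpp
#include <algorithm>
#include <vector>

// Assumption: each set is a std::vector<int> of 10 distinct numbers.
// Sorting one set costs O(X log X); sorting all N sets costs
// O(N * X log X), which is linear in N when X is fixed.
void sort_all_sets(std::vector<std::vector<int>>& sets) {
    for (auto& s : sets) {
        std::sort(s.begin(), s.end());
    }
}
```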

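    Because the sets are now sorted, the linear comparison of two sets is fairly simple: iterate through both sets at the same time, advancing whichever side has the smaller current element and counting a match whenever the two elements are equal. See C++ std::set_intersection for details.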
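    The two-pointer walk can be sketched as a hand-written merge that only counts common elements instead of copying them out as std::set_intersection does. This assumes both inputs are sorted ascending with no duplicates; the name count_common is illustrative, not from the original answer.

```cpp
#include <cstddef>
#include <vector>

// Count how many values appear in both a and b.
// Precondition: a and b are sorted ascending and contain no duplicates.
// Runs in O(|a| + |b|) because every iteration advances at least one index.
std::size_t count_common(const std::vector<int>& a, const std::vector<int>& b) {
    std::size_t i = 0, j = 0, common = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;        // a[i] is smaller than everything left in b
        } else if (b[j] < a[i]) {
            ++j;        // b[j] is smaller than everything left in a
        } else {
            ++common;   // equal elements: one match, advance both sides
            ++i;
            ++j;
        }
    }
    return common;
}
```

    For example, count_common({1,2,3,4,5,6,7,8,9,10}, {2,4,6,8,10,12,14,16,18,20}) returns 5, which would meet the question's threshold.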

    The whole algorithm is therefore O(N^2), which would be quite slow for 10,000 sets.
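    A minimal sketch of the complete O(N^2) filter, assuming the literal reading of the question: a set is removed if it shares 5 or more numbers with at least one other set, so both members of a matching pair are dropped. The names filter_sets and threshold are hypothetical. Each set is sorted once, then every pair i < j is compared with std::set_intersection.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Return the sets (in sorted form) that share fewer than `threshold`
// numbers with every other set.
// Assumption: a set is dropped if it matches ANY other set, so both
// members of a matching pair are removed.
std::vector<std::vector<int>> filter_sets(std::vector<std::vector<int>> sets,
                                          std::size_t threshold = 5) {
    // Preprocessing: sort every set, O(N * X log X).
    for (auto& s : sets) std::sort(s.begin(), s.end());

    std::vector<bool> remove(sets.size(), false);
    std::vector<int> common;  // reused buffer for each intersection

    // All pairs i < j: N(N-1)/2 comparisons, each O(X + Y).
    for (std::size_t i = 0; i < sets.size(); ++i) {
        for (std::size_t j = i + 1; j < sets.size(); ++j) {
            common.clear();
            std::set_intersection(sets[i].begin(), sets[i].end(),
                                  sets[j].begin(), sets[j].end(),
                                  std::back_inserter(common));
            if (common.size() >= threshold) {
                remove[i] = true;
                remove[j] = true;
            }
        }
    }

    std::vector<std::vector<int>> kept;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        if (!remove[i]) kept.push_back(sets[i]);
    }
    return kept;
}
```

    If the intent is instead to keep the earliest set and drop only later sets that overlap an earlier one, mark only remove[j] inside the loop.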
