efficient algorithm to compare similarity between sets of numbers?

甜味超标 2021-02-01 11:13

I have a large number of sets of numbers. Each set contains 10 numbers, and I need to remove every set that shares 5 or more numbers (in any order) with any other set.

12 answers
  •  一整个雨季
    2021-02-01 12:05

    Let's assume you have a class NumberSet that implements your unordered set (and can be iterated to yield its numbers as ints). You then need the following data structures and algorithm:

    • Map<Integer, Set<NumberSet>> numberSets
    • Map<Pair<NumberSet, NumberSet>, Integer> matchCount
    • Pair<X, Y> is a key object that returns the same hash code and equality result for every instance with the same X and Y (regardless of whether they are swapped)
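
    The order-insensitive key described in the last bullet could look roughly like the sketch below. The class name UnorderedPair and its fields are hypothetical, not part of the answer, and the sketch assumes the elements themselves have consistent equals and hashCode implementations.

```java
import java.util.Objects;

// Hypothetical unordered pair: (x, y) and (y, x) compare equal and hash alike.
final class UnorderedPair<T> {
    private final T first;
    private final T second;

    UnorderedPair(T first, T second) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UnorderedPair)) return false;
        UnorderedPair<?> that = (UnorderedPair<?>) other;
        // Equal if the elements match in either order.
        return (first.equals(that.first) && second.equals(that.second))
            || (first.equals(that.second) && second.equals(that.first));
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so swapping the elements gives the same hash.
        return first.hashCode() + second.hashCode();
    }
}
```

    A symmetric hash such as a sum is what allows a lookup with (B, A) to find a count that was stored under (A, B).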

    Now, for each set to be added or compared, do the following (pseudocode):

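    // Needs java.util.HashSet, java.util.Map and java.util.Set.
    // setToAdd is the NumberSet being inserted; it must be iterable over its numbers.
    for (int number : setToAdd) {
        // All previously added sets that contain this number.
        Set<NumberSet> numbers = numberSets.get(number);
        if (numbers == null) {
            numbers = new HashSet<>();
            numberSets.put(number, numbers);
        } else {
            for (NumberSet numberSet : numbers) {
                Pair<NumberSet, NumberSet> pairKey = new Pair<>(numberSet, setToAdd);
                // merge() treats a missing entry as 0 and then adds 1.
                matchCount.merge(pairKey, 1, Integer::sum);
            }
        }
        // Register the set itself (not the number) under this number.
        numbers.add(setToAdd);
    }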
    

    At any time you can go through the pairs: every pair with a count of 5 or more marks two sets that are duplicates of each other.
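
    As a self-contained illustration of the whole procedure, here is a minimal sketch that identifies each set by an integer id instead of a NumberSet object and packs the two ids into a long instead of using a Pair key. Both choices, and the names SharedNumberCounter, add and pairsWithAtLeast, are assumptions made for brevity, not part of the approach above. It also assumes no set contains the same number twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedNumberCounter {
    // number -> ids of the sets (added so far) that contain it
    private final Map<Integer, List<Integer>> setsByNumber = new HashMap<>();
    // packed (earlierId, laterId) -> how many numbers the two sets share
    private final Map<Long, Integer> matchCount = new HashMap<>();
    private int nextId = 0;

    // Adds one set (assumed to contain no repeated numbers) and returns its id.
    int add(int[] numbers) {
        int id = nextId++;
        for (int number : numbers) {
            List<Integer> owners =
                setsByNumber.computeIfAbsent(number, k -> new ArrayList<>());
            for (int other : owners) {
                // 'other' was added earlier, so other < id and the key is canonical.
                matchCount.merge(((long) other << 32) | id, 1, Integer::sum);
            }
            owners.add(id);
        }
        return id;
    }

    // Returns every pair {earlierId, laterId} sharing at least 'threshold' numbers.
    List<int[]> pairsWithAtLeast(int threshold) {
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : matchCount.entrySet()) {
            if (e.getValue() >= threshold) {
                long key = e.getKey();
                result.add(new int[] {(int) (key >>> 32), (int) key});
            }
        }
        return result;
    }
}
```

    The work per added set is proportional to the number of earlier sets that share at least one number with it, so the approach pays off when most sets have little overlap.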

    Note: removing the sets may be a bad idea, because the duplicate relation is not transitive: if A is a duplicate of B and B is a duplicate of C, C is not necessarily a duplicate of A. If you remove B, you would not remove C, so the result depends on the order in which you add your sets.
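
    The order dependence can be seen in a small sketch. It assumes one possible removal policy (keep a set only if it is not a duplicate of any set kept so far) and uses a plain pairwise comparison purely for illustration. The class GreedyDedup and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GreedyDedup {
    // Counts the numbers two sets have in common.
    static int shared(Set<Integer> x, Set<Integer> y) {
        Set<Integer> common = new HashSet<>(x);
        common.retainAll(y);
        return common.size();
    }

    // Keeps a set only if it shares fewer than 'threshold' numbers with every
    // set kept so far; the result depends on the input order.
    static List<Set<Integer>> keepInOrder(List<Set<Integer>> input, int threshold) {
        List<Set<Integer>> kept = new ArrayList<>();
        for (Set<Integer> candidate : input) {
            boolean clash = false;
            for (Set<Integer> k : kept) {
                if (shared(candidate, k) >= threshold) {
                    clash = true;
                    break;
                }
            }
            if (!clash) kept.add(candidate);
        }
        return kept;
    }

    // Helper: the set {from, from + 1, ..., to}.
    static Set<Integer> range(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i <= to; i++) s.add(i);
        return s;
    }
}
```

    With A = range(1, 10), B = range(6, 15) and C = range(11, 20), A and B share 5 numbers, B and C share 5, and A and C share none. Processing them in the order A, B, C keeps A and C, while the order B, A, C keeps only B.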
