I have a large number of sets of numbers. Each set contains 10 numbers, and I need to remove every set that shares 5 or more numbers (in any order) with any other set.
Let's assume you have a class NumberSet which implements your unordered set (and can enumerate ints to get the numbers). You then need the following data structures and algorithm:
Map<int, Set<NumberSet>> numberSets
Map<Pair<NumberSet, NumberSet>, int> matchCount
Pair<X, Y> is a key object which returns the same hash code and equality for every instance with the same X and Y (no matter if they are swapped).
Now, for each set to be added/compared, do the following (pseudocode!):
for (int number : setToAdd) {
    Set<NumberSet> numbers = numberSets.get(number);
    if (numbers == null) {
        numbers = new HashSet<NumberSet>();
        numberSets.put(number, numbers);
    } else {
        for (NumberSet numberSet : numbers) {
            Pair<NumberSet, NumberSet> pairKey = new Pair<NumberSet, NumberSet>(numberSet, setToAdd);
            Integer count = matchCount.get(pairKey);
            matchCount.put(pairKey, count == null ? 1 : count + 1); // a missing entry counts as 0
        }
    }
    numbers.add(setToAdd); // register the set itself under this number, not the number
}
At any time you can go through the pairs in matchCount; every pair with a count of 5 or greater is a duplicate.
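For reference, the approach above could be sketched in runnable Java roughly as follows. This is only a sketch under a few assumptions that are not in the answer: sets are identified by a caller-chosen integer id instead of a NumberSet object, each id is added only once, and the names NearDuplicates, add and pairsWithAtLeast are made up for illustration.

```java
import java.util.*;

class NearDuplicates {

    /** Unordered pair: Pair(a, b) equals Pair(b, a) and has the same hash code. */
    static final class Pair<T> {
        final T x, y;
        Pair(T x, T y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair<?> p = (Pair<?>) o;
            return (x.equals(p.x) && y.equals(p.y)) || (x.equals(p.y) && y.equals(p.x));
        }

        // Addition is commutative, so swapping x and y gives the same hash.
        @Override public int hashCode() { return x.hashCode() + y.hashCode(); }
    }

    // number -> ids of the sets added so far that contain it
    private final Map<Integer, Set<Integer>> numberSets = new HashMap<>();
    // unordered pair of set ids -> how many numbers the two sets share
    private final Map<Pair<Integer>, Integer> matchCount = new HashMap<>();

    /** Registers one set under a hypothetical integer id and updates the shared-number counts. */
    void add(int id, Set<Integer> setToAdd) {
        for (int number : setToAdd) {
            Set<Integer> ids = numberSets.computeIfAbsent(number, k -> new HashSet<>());
            for (int other : ids) {
                // merge() treats a missing entry as "start at 1"
                matchCount.merge(new Pair<Integer>(other, id), 1, Integer::sum);
            }
            ids.add(id);
        }
    }

    /** Returns every pair of ids whose sets share at least `threshold` numbers. */
    List<Pair<Integer>> pairsWithAtLeast(int threshold) {
        List<Pair<Integer>> result = new ArrayList<>();
        for (Map.Entry<Pair<Integer>, Integer> e : matchCount.entrySet()) {
            if (e.getValue() >= threshold) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        NearDuplicates finder = new NearDuplicates();
        finder.add(0, new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
        finder.add(1, new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 11, 12, 13, 14, 15)));
        finder.add(2, new HashSet<>(Arrays.asList(21, 22, 23, 24, 25, 26, 27, 28, 29, 30)));
        for (Pair<Integer> p : finder.pairsWithAtLeast(5)) {
            System.out.println(p.x + " and " + p.y + " share 5 or more numbers");
        }
    }
}
```

Only sets that share at least one number ever get an entry in matchCount, so the work scales with the number of actual overlaps and not with every possible pair of sets.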
Note: removing the sets may be a bad idea, because the relation is not transitive: if A is considered a duplicate of B, and B a duplicate of C, C doesn't have to be a duplicate of A. So if you remove B, you would not remove C, and the order in which you add your sets becomes important.
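A small made-up example shows the problem. The three sets and the helper shared below are hypothetical, chosen only so that A overlaps B and B overlaps C while A and C have nothing in common:

```java
import java.util.*;

class NonTransitiveDemo {

    /** Counts the numbers that appear in both sets. */
    static int shared(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        Set<Integer> b = new HashSet<>(Arrays.asList(6, 7, 8, 9, 10, 11, 12, 13, 14, 15));
        Set<Integer> c = new HashSet<>(Arrays.asList(11, 12, 13, 14, 15, 16, 17, 18, 19, 20));

        System.out.println("A and B share " + shared(a, b)); // 5: duplicates
        System.out.println("B and C share " + shared(b, c)); // 5: duplicates
        System.out.println("A and C share " + shared(a, c)); // 0: not duplicates
    }
}
```

Here, removing B first leaves A and C, which are not duplicates of each other, whereas removing A and C first leaves only B. Which result you want has to be decided before any removal is done.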