I have a large number of sets of numbers. Each set contains 10 numbers, and I need to remove all sets that have 5 or more numbers in common (regardless of order) with any other set.
Perhaps you need an algorithm like this one (as I understand your problem):
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
/**
* @author karnokd, 2009.06.28.
* @version $Revision 1.0$
*/
public class NoOverlappingSets {
    // builds a set from varargs (a helper because of the shortcomings of Java type inference), O(N)
    public static Set<Integer> setOf(Integer... values) {
        return new HashSet<Integer>(Arrays.asList(values));
    }
    // the test function: true if first and second share at least `limit` numbers;
    // O(N) because HashSet.contains() is O(1) on average
    public static boolean isNumberOfDuplicatesAboveLimit(
            Set<Integer> first, Set<Integer> second, int limit) {
        int result = 0;
        for (Integer i : first) {
            if (second.contains(i)) {
                result++;
                if (result >= limit) {
                    return true; // stop early once the limit is reached
                }
            }
        }
        return false;
    }
    public static void main(String[] args) {
        List<Set<Integer>> sets = new LinkedList<Set<Integer>>() {{
            add(setOf(12, 14, 222, 998, 1, 89, 43, 22, 7654, 23));
            add(setOf(44, 23, 64, 76, 987, 3, 2345, 443, 431, 88));
            add(setOf(998, 22, 7654, 345, 112, 32, 89, 9842, 31, 23));
        }};
        List<Set<Integer>> resultset = new LinkedList<Set<Integer>>();
        loop:
        for (Set<Integer> curr : sets) {
            for (Set<Integer> existing : resultset) {
                if (isNumberOfDuplicatesAboveLimit(curr, existing, 5)) {
                    continue loop; // overlaps an already kept set: drop curr
                }
            }
            // no overlapping with the previous instances
            resultset.add(curr);
        }
        // the third set shares 5 numbers with the first one, so only the first two are printed
        System.out.println(resultset);
    }
}
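The code above keeps the first set of each overlapping pair. Read literally, the question asks to drop every set that shares 5 or more numbers with any other set, the first one included. A minimal sketch of that literal reading, assuming Java 7+ (the names `DropAllOverlapping`, `sharesAtLeast` and `dropAllOverlapping` are hypothetical), compares each unordered pair once and marks both members on a hit; it is still O(N*M^2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class: literal reading, every set involved in an overlap is dropped.
public class DropAllOverlapping {
    // True if a and b share at least `limit` numbers; stops early once the limit is hit.
    public static boolean sharesAtLeast(Set<Integer> a, Set<Integer> b, int limit) {
        int shared = 0;
        for (Integer x : a) {
            if (b.contains(x) && ++shared >= limit) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the sets that share fewer than `limit` numbers with every other set.
    public static List<Set<Integer>> dropAllOverlapping(List<Set<Integer>> sets, int limit) {
        int m = sets.size();
        boolean[] overlapping = new boolean[m];
        // Compare each unordered pair once and mark both members on a hit.
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                if (sharesAtLeast(sets.get(i), sets.get(j), limit)) {
                    overlapping[i] = true;
                    overlapping[j] = true;
                }
            }
        }
        List<Set<Integer>> kept = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            if (!overlapping[i]) {
                kept.add(sets.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<Integer>> sets = Arrays.asList(
            new HashSet<>(Arrays.asList(12, 14, 222, 998, 1, 89, 43, 22, 7654, 23)),
            new HashSet<>(Arrays.asList(44, 23, 64, 76, 987, 3, 2345, 443, 431, 88)),
            new HashSet<>(Arrays.asList(998, 22, 7654, 345, 112, 32, 89, 9842, 31, 23)));
        // The first and third sets share 5 numbers, so only the second set is printed.
        System.out.println(dropAllOverlapping(sets, 5));
    }
}
```

With the three example sets, the first and third share 998, 22, 7654, 89 and 23, so under this reading both are dropped and only the second is kept.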
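I'm not an expert in Big O notation, but I think this algorithm is O(N*M^2), where N is the number of elements in each set and M is the total number of sets: each of the M sets is compared against up to M already kept sets, and each comparison costs O(N) with HashSet lookups. I also took the liberty of defining what I consider overlapping sets: a set is dropped if it shares 5 or more numbers with a set that has already been kept, so the first set of each overlapping pair survives.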
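If the all-pairs comparison becomes too slow for a large number of sets, one common alternative (a different technique from the nested loops, not part of the original answer) is an inverted index that maps each number to the kept sets containing it. The sketch below assumes Java 8+ and uses the hypothetical names `IndexedNoOverlappingSets` and `keepNonOverlapping`; it keeps the same greedy rule as the answer: a set is dropped if it shares at least `limit` numbers with a set that was already kept. The worst case is no better, but when most numbers occur in only a few sets, each new set only touches the kept sets it actually shares numbers with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class: greedy filter backed by an inverted index (number -> kept sets).
public class IndexedNoOverlappingSets {
    public static List<Set<Integer>> keepNonOverlapping(List<Set<Integer>> sets, int limit) {
        List<Set<Integer>> kept = new ArrayList<>();
        // Inverted index: number -> positions (in `kept`) of the kept sets containing it.
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (Set<Integer> curr : sets) {
            // shared: kept-set position -> how many numbers of curr it contains
            Map<Integer, Integer> shared = new HashMap<>();
            boolean overlaps = false;
            for (Integer x : curr) {
                List<Integer> holders = index.get(x);
                if (holders == null) {
                    continue; // no kept set contains x
                }
                for (Integer k : holders) {
                    int count = shared.merge(k, 1, Integer::sum);
                    if (count >= limit) {
                        overlaps = true;
                        break;
                    }
                }
                if (overlaps) {
                    break;
                }
            }
            if (!overlaps) {
                // Keep curr and register its numbers in the index.
                int pos = kept.size();
                kept.add(curr);
                for (Integer x : curr) {
                    index.computeIfAbsent(x, key -> new ArrayList<>()).add(pos);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<Integer>> sets = Arrays.asList(
            new HashSet<>(Arrays.asList(12, 14, 222, 998, 1, 89, 43, 22, 7654, 23)),
            new HashSet<>(Arrays.asList(44, 23, 64, 76, 987, 3, 2345, 443, 431, 88)),
            new HashSet<>(Arrays.asList(998, 22, 7654, 345, 112, 32, 89, 9842, 31, 23)));
        // The third set shares 5 numbers with the first, so only the first two are printed.
        System.out.println(keepNonOverlapping(sets, 5));
    }
}
```

On the example data it keeps the same two sets as the nested-loop version; whether it is actually faster depends on how often numbers repeat across your sets.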
I think your problem can be solved in polynomial time. As I remember from my lectures, though, the decision-based version would be NP-hard, but correct me if I'm wrong.