Efficiently finding the intersection of a variable number of sets of strings

旧时难觅i 2020-11-27 06:14

I have a variable number of ArrayLists that I need to find the intersection of. A realistic cap on the number of sets of strings is probably around 35, but it could be more. I

7 Answers
  • 2020-11-27 06:31

    The accepted answer is just fine. As an update: since Java 8 there is a slightly more efficient way to find the intersection of two Sets.

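    // Requires: import java.util.stream.Collectors;
    // Keep only the elements of set1 that also occur in set2.
    Set<String> intersection = set1.stream()
        .filter(set2::contains)
        .collect(Collectors.toSet());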
    

    This is slightly more efficient because the original approach first adds every element of set1 to the result and then removes those that are not in set2, whereas this approach only adds the elements that belong in the result.

    Strictly speaking, you could do this before Java 8 as well, but without Streams the code would be quite a bit more laborious.

    If the two sets differ considerably in size, stream over the smaller one, so that fewer contains() lookups are needed.
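
    As a minimal sketch of that last point, the helper below picks the smaller set to stream and probes the larger one. The class name StreamIntersection and the method name intersect are made up for illustration; the sketch assumes both arguments are HashSets (or another Set with cheap contains()).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

final class StreamIntersection {
    // Hypothetical helper: stream the smaller set and keep the elements
    // that the larger set also contains.
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        return smaller.stream()
                .filter(larger::contains)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> set1 = new HashSet<String>(Arrays.asList("a", "b", "c", "d"));
        Set<String> set2 = new HashSet<String>(Arrays.asList("c", "d", "e"));
        // Streams set2 (3 elements) and looks each element up in set1.
        System.out.println(intersect(set1, set2));
    }
}
```

    The inputs are not modified; the result is a new Set.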

  • 2020-11-27 06:32

    Set.retainAll() is how you find the intersection of two sets. If you convert your ArrayLists to HashSets and call retainAll() in a loop over all of them, the whole thing runs in expected O(n) time, where n is the total number of elements across all lists.
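
    A rough sketch of that loop, for a variable number of lists. The class name RetainAllIntersection and the method name intersectAll are hypothetical; treating an empty input as an empty intersection is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RetainAllIntersection {
    // Hypothetical helper: intersects any number of string collections.
    static Set<String> intersectAll(List<? extends Collection<String>> lists) {
        if (lists.isEmpty()) {
            return new HashSet<>(); // assumption: no input means empty result
        }
        // Copy the first list so the caller's data is left untouched.
        Set<String> result = new HashSet<>(lists.get(0));
        // Stop early once nothing is left to intersect.
        for (int i = 1; i < lists.size() && !result.isEmpty(); i++) {
            // Wrapping in a HashSet keeps each contains() check O(1) on average.
            result.retainAll(new HashSet<>(lists.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("b", "c", "d"),
                Arrays.asList("c", "b", "x"));
        System.out.println(intersectAll(lists));
    }
}
```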

  • 2020-11-27 06:32

    One more idea: if your arrays/sets are different sizes, it makes sense to begin with the smallest, so the working result is never larger than it has to be.
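
    One possible way to do that is to order the sets by size before intersecting. This is only a sketch: the class name SmallestFirstIntersection and the method name intersectSmallestFirst are invented here, and it assumes the inputs are already HashSets so that retainAll() does cheap lookups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SmallestFirstIntersection {
    // Hypothetical helper: intersect the sets in ascending order of size.
    static Set<String> intersectSmallestFirst(List<Set<String>> sets) {
        if (sets.isEmpty()) {
            return new HashSet<>(); // assumption: no input means empty result
        }
        // Sort a copy of the list so the caller's ordering is preserved.
        List<Set<String>> ordered = new ArrayList<>(sets);
        ordered.sort(Comparator.comparingInt(Set::size));
        // Start from the smallest set; the result can only shrink from here.
        Set<String> result = new HashSet<>(ordered.get(0));
        for (int i = 1; i < ordered.size() && !result.isEmpty(); i++) {
            result.retainAll(ordered.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> sets = new ArrayList<>();
        sets.add(new HashSet<String>(Arrays.asList("a", "b", "c", "d", "e")));
        sets.add(new HashSet<String>(Arrays.asList("b", "e")));
        sets.add(new HashSet<String>(Arrays.asList("e", "b", "z")));
        System.out.println(intersectSmallestFirst(sets));
    }
}
```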

  • 2020-11-27 06:40

    Sort each list (O(n lg n)) and then binary-search the sorted lists for each candidate element (O(lg n) per lookup).
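
    One way this could look in code, using Collections.sort and Collections.binarySearch from the standard library. The class name SortedIntersection and the method name intersectSorted are hypothetical, and taking the candidates from the first list is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SortedIntersection {
    // Hypothetical helper: candidates come from the first list and are
    // looked up in sorted copies of all the other lists.
    static Set<String> intersectSorted(List<List<String>> lists) {
        Set<String> result = new LinkedHashSet<>();
        if (lists.isEmpty()) {
            return result; // assumption: no input means empty result
        }
        // Sort copies so the callers' lists keep their original order.
        List<List<String>> sorted = new ArrayList<>();
        for (List<String> list : lists.subList(1, lists.size())) {
            List<String> copy = new ArrayList<>(list);
            Collections.sort(copy);
            sorted.add(copy);
        }
        for (String candidate : lists.get(0)) {
            boolean inAll = true;
            for (List<String> other : sorted) {
                // binarySearch returns a negative value when the key is absent.
                if (Collections.binarySearch(other, candidate) < 0) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = new ArrayList<>();
        lists.add(Arrays.asList("pear", "apple", "fig"));
        lists.add(Arrays.asList("fig", "apple", "kiwi"));
        lists.add(Arrays.asList("apple", "fig", "plum"));
        System.out.println(intersectSorted(lists));
    }
}
```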

  • 2020-11-27 06:44

    There is also the static method Sets.intersection(set1, set2) in Google Guava that returns an unmodifiable view of the intersection of two sets.

  • 2020-11-27 06:50

    You can use a single HashSet. Its add() method returns false when the object is already in the set. Adding the objects from all the lists and counting the false return values per object gives you the union in the set plus the data for a histogram; the objects whose count + 1 equals the number of lists are your intersection (assuming no list contains the same string twice). If you put the counts into a TreeSet, you can detect an empty intersection early.
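
    A sketch of the counting part of this idea; it leaves out the TreeSet step. The class name CountingIntersection and the method name intersectByCounting are made up for illustration, the HashMap used to hold the counts is a choice of this sketch, and it assumes that no individual list contains duplicate strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CountingIntersection {
    // Hypothetical helper. Assumption: no list contains the same string
    // twice, otherwise the counts would be inflated.
    static Set<String> intersectByCounting(List<List<String>> lists) {
        Set<String> seen = new HashSet<>();              // ends up holding the union
        Map<String, Integer> repeats = new HashMap<>();  // false returns from add(), per string
        for (List<String> list : lists) {
            for (String s : list) {
                if (!seen.add(s)) {
                    repeats.merge(s, 1, Integer::sum);
                }
            }
        }
        // A string that occurs in every list was added once and rejected
        // (lists.size() - 1) times, so count + 1 equals the number of lists.
        Set<String> result = new HashSet<>();
        for (String s : seen) {
            if (repeats.getOrDefault(s, 0) + 1 == lists.size()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = new ArrayList<>();
        lists.add(Arrays.asList("a", "b", "c"));
        lists.add(Arrays.asList("b", "c", "d"));
        lists.add(Arrays.asList("c", "e", "b"));
        System.out.println(intersectByCounting(lists));
    }
}
```

    Unlike the retainAll() approach, this makes a single pass over every list and keeps the union in memory, which may matter if the lists are large and mostly disjoint.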
