I know the STL has set_difference
, but I need to just know if 2 set
s are disjoint. I\'ve profiled my code and this is slowing my app down quite a b
Since std::set
is a sorted container, you can compare the set boundaries to see if they possibly intersect, and if so, do the expensive STL operation.
Here's where clam fishing gets serious ...
All the code posted so far tries to re-implement what's already in <algorithm>. Here's the gist of set_intersection
:
template<typename _InputIterator1, typename _InputIterator2,
typename _OutputIterator>
_OutputIterator
set_intersection(_InputIterator1 __first1, _InputIterator1 __last1,
_InputIterator2 __first2, _InputIterator2 __last2,
_OutputIterator __result)
{
while (__first1 != __last1 && __first2 != __last2)
if (*__first1 < *__first2)
++__first1;
else if (*__first2 < *__first1)
++__first2;
else
{
*__result = *__first1;
++__first1;
++__first2;
++__result;
}
return __result;
}
Pretty optimal already except for the assignment to the output iterator, which is unnecessary for finding just the fact whether two sets are joint. This can be used though to flip a flag as follows:
/// fake insert container
template <typename T> struct intersect_flag
{
typedef int iterator;
typedef typename T::const_reference const_reference;
bool flag_; // tells whether given sets intersect
intersect_flag() : flag_( false ) {}
iterator insert( iterator, const_reference )
{
flag_ = true; return 0;
}
};
// ...
typedef std::set<std::string> my_set;
my_set s0, s1;
intersect_flag<my_set> intf;
// ...
std::set_intersection( s0.begin(), s0.end(),
s1.begin(), s1.end(), std::inserter( intf, 0 ));
if ( intf.flag_ ) // sets intersect
{
// ...
This avoids copying the objects from the original sets and lets you reuse STL algorithms.
Use std::set_intersection, and see if the output is empty. You could check to see if the range of the two sets (area covered by the begin and end iterators) overlap first, but I suspect that set_intersection might already do this.
These are all O(n) operations, as is is_disjoint. If O(n) is unacceptable, you'll have to build some side storage to "keep track" of the disjoint-ness of the sets as you add/remove elements. Here, you would pay the overhead at insertion time (by updating your "isdisjoint" structure for each set modification), but is_disjoint could be implemented cheaply. This may or may not be a good tradeoff to make.
It is possible to get O(log(n)) by using the fact that both sets are sorted. Just use std::lower_bound
instead of moving iterator by one.
enum Ordering { EQ = 0, LT = -1, GT = 1 };
template <typename A, typename B>
Ordering compare(const A &a, const B &b)
{
return
a == b ? EQ :
a < b ? LT : GT;
}
template <typename SetA, typename SetB>
bool is_disjoint(const SetA &a, const SetB &b)
{
auto it_a = a.begin();
auto it_b = b.begin();
while (it_a != a.end() && it_b != b.end())
{
switch (compare(*it_a, *it_b))
{
case EQ:
return false;
case LT:
it_a = std::lower_bound(++it_a, a.end(), *it_b);
break;
case GT:
it_b = std::lower_bound(++it_b, b.end(), *it_a);
break;
}
}
return true;
}
You could use set_intersection
and test if the resulting set is empty, but I don't know if that's much faster.
The optimal implementation would stop the testing and return false
as soon as the first equal element is found. I don't know of any ready-made solution for that, though
template<class Set1, class Set2>
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
Set1::const_iterator it, itEnd = set1.end();
for (it = set1.begin(); it != itEnd; ++it)
if (set2.count(*it))
return false;
return true;
}
isn't too complex and should do the job nicely.
EDIT: If you want O(n) performance, use the slightly less compact
template<class Set1, class Set2>
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
Set1::const_iterator it1 = set1.begin(), it1End = set1.end();
if (it1 == it1End)
return true; // first set empty => sets are disjoint
Set2::const_iterator it2 = set2.begin(), it2End = set2.end();
if (it2 == it2End)
return true; // second set empty => sets are disjoint
// first optimization: check if sets overlap (with O(1) complexity)
Set1::const_iterator it1Last = it1End;
if (*--it1Last < *it2)
return true; // all elements in set1 < all elements in set2
Set2::const_iterator it2Last = it2End;
if (*--it2Last < *it1)
return true; // all elements in set2 < all elements in set1
// second optimization: begin scanning at the intersection point of the sets
it1 = set1.lower_bound(*it2);
if (it1 == it1End)
return true;
it2 = set2.lower_bound(*it1);
if (it2 == it2End)
return true;
// scan the (remaining part of the) sets (with O(n) complexity)
for(;;)
{
if (*it1 < *it2)
{
if (++it1 == it1End)
return true;
}
else if (*it2 < *it1)
{
if (++it2 == it2End)
return true;
}
else
return false;
}
}
(modified Graphics Noob's modification further, using only operator <)
Modified hjhill's code to reduce complexity by a factor of O(log n) by getting rid of the count() call.
template<class Set1, class Set2>
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
if(set1.empty() || set2.empty()) return true;
typename Set1::const_iterator
it1 = set1.begin(),
it1End = set1.end();
typename Set2::const_iterator
it2 = set2.begin(),
it2End = set2.end();
if(*it1 > *set2.rbegin() || *it2 > *set1.rbegin()) return true;
while(it1 != it1End && it2 != it2End)
{
if(*it1 == *it2) return false;
if(*it1 < *it2) { it1++; }
else { it2++; }
}
return true;
}
I've complied and tested this code now so it should be good.