While looking over some code I found loopy and algorithmically slow implementation of std::set_difference :
for(int i = 0; i < a.size(); i++)
{
iter = std
This one does it in O(N+M), assuming both arrays are sorted.
auto ib = std::begin(two);
auto iter = std::remove_if (
std::begin(one), std::end(one),
[&ib](int x) -> bool {
while (ib != std::end(two) && *ib < x) ++ib;
return (ib != std::end(two) && *ib == x);
});
One solution that comes to mind is combining remove_if
and binary_search
. It's effectively the same as your manual looping solution but might be a bit more "elegant" as it uses more STL features.
sort(begin(b), end(b));
auto iter = remove_if(begin(a), end(a),
[](auto x) {
return binary_search(begin(b), end(b), x);
});
// Now [begin(a), iter) defines a new range, and you can erase them however
// you see fit, based on the type of a.
The current code is quite clear, in that it should be obvious to any programmer what's going on.
The current performance is O(a.size() * b.size()
), which may be pretty bad depending upon the actual sizes.
A more concise and STL-like way to describe it is to use remove_if with a predicate that tells you if a value in in a.
b.erase(std::remove_if(b.begin(), b.end(), [](const auto&x) {
return std::find(a.begin(), a.end(), x) != a.end();
}), b.end());
(Not tested, so I might have made a syntax error.) I used a lambda, but you can create a functor if you're not using a C++11 compiler.
Note that the original code removes just one instance of a value in b
that's also in a
. My solution will remove all instances of such a value from b
.
Note that the find
operation happens again and again, so it's probably better to do that on the smaller vector for better locality of reference.
Sort b
so you can binary search it in order to reduce time complexity. Then use the erase-remove idiom in order to throw away all elements from a
that are contained in b
:
sort( begin(b), end(b) );
a.erase( remove_if( begin(a),end(a),
[&](auto x){return binary_search(begin(b),end(b),x);}), end(a) );
Of course, you can still sacrifice time complexity for simplicity and reduce your code by removing the sort()
and replacing binary_search()
by find()
:
a.erase( remove_if( begin(a),end(a),
[&](auto x){return find(begin(b),end(b),x)!=end(b);}), end(a) );
This is a matter of taste. In both cases you don't need heap allocations. By the way, I'm using lambda auto parameters which are C++14. Some compilers already implement that feature such as clang. If you don't have such a compiler, but only C++11 then replace auto
by the element type of the container.
By the way, this code does not mention any types! You can write a template function so it works for all kind of types. The first variant requires random access iteration of b
while the second piece of code does not require that.
after thinking for a while I thought of this
(note:by answering my own Q im not claiming this is a superior to offered A):
vector<int64_t> a{3,2,7,5,11,13}, b{2,3,13,5};
set<int64_t> bs(b.begin(), b.end());
for (const auto& num: bs)
cout << num << " ";
cout << endl;
for (const auto& num: a)
bs.erase(num);
vector<int64_t> result(bs.begin(), bs.end());
for (const auto& num: result)
cout << num << " ";