I have a vector containing a few non-adjacent duplicates.
As a simple example, consider:
2 1 6 1 4 6 2 1 1
I am trying to make this
There is a nice article by John Torjo which deals with this very question in a systematic way. The result he comes up with seems more generic and more efficient than any of the solutions suggested here so far:
http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html
Unfortunately, the complete code of John's solution seems to be no longer available, and John did not respond to my email. Therefore, I wrote my own code, which is based on similar ideas to his but intentionally differs in some details. Feel free to contact me (vschoech think-cell com) and discuss the details if you wish.
To make the code compile for you, I added some of my own library stuff which I use regularly. Also, instead of going with plain STL, I use Boost a lot to create more generic, more efficient, and more readable code.
Have fun!
#include <algorithm>
#include <iterator>
#include <vector>
#include <functional>
#include <boost/bind.hpp>
#include <boost/range.hpp>
#include <boost/iterator/counting_iterator.hpp>
/////////////////////////////////////////////////////////////////////////////////////////////
// library stuff
template< class Rng, class Func >
Func for_each( Rng& rng, Func f ) {
return std::for_each( boost::begin(rng), boost::end(rng), f );
}
template< class Rng, class Pred >
Rng& sort( Rng& rng, Pred pred ) {
std::sort( boost::begin( rng ), boost::end( rng ), pred );
return rng; // to allow function chaining, similar to operator+= et al.
}
template< class T >
boost::iterator_range< boost::counting_iterator<T> > make_counting_range( T const& tBegin, T const& tEnd ) {
return boost::iterator_range< boost::counting_iterator<T> >( tBegin, tEnd );
}
template< class Func >
class compare_less_impl {
private:
Func m_func;
public:
typedef bool result_type;
compare_less_impl( Func func )
: m_func( func )
{}
template< class T1, class T2 > bool operator()( T1 const& tLeft, T2 const& tRight ) const {
return m_func( tLeft ) < m_func( tRight );
}
};
template< class Func >
compare_less_impl<Func> compare_less( Func func ) {
return compare_less_impl<Func>( func );
}
/////////////////////////////////////////////////////////////////////////////////////////////
// stable_unique
template<class forward_iterator, class predicate_type>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd, predicate_type predLess) {
typedef typename std::iterator_traits<forward_iterator>::difference_type index_type;
struct SIteratorIndex {
SIteratorIndex(forward_iterator itValue, index_type idx) : m_itValue(itValue), m_idx(idx) {}
typename std::iterator_traits<forward_iterator>::reference Value() const {return *m_itValue;}
index_type m_idx;
private:
forward_iterator m_itValue;
};
// {1} create array of values (represented by iterators) and indices
std::vector<SIteratorIndex> vecitidx;
vecitidx.reserve( std::distance(itBegin, itEnd) );
struct FPushBackIteratorIndex {
FPushBackIteratorIndex(std::vector<SIteratorIndex>& vecitidx) : m_vecitidx(vecitidx) {}
void operator()(forward_iterator itValue) const {
m_vecitidx.push_back( SIteratorIndex(itValue, m_vecitidx.size()) );
}
private:
std::vector<SIteratorIndex>& m_vecitidx;
};
for_each( make_counting_range(itBegin, itEnd), FPushBackIteratorIndex(vecitidx) );
// {2} sort by underlying value
struct FStableCompareByValue {
FStableCompareByValue(predicate_type predLess) : m_predLess(predLess) {}
bool operator()(SIteratorIndex const& itidxA, SIteratorIndex const& itidxB) const {
return m_predLess(itidxA.Value(), itidxB.Value())
// stable sort order, index is secondary criterion
|| ( !m_predLess(itidxB.Value(), itidxA.Value()) && itidxA.m_idx < itidxB.m_idx );
}
private:
predicate_type m_predLess;
};
sort( vecitidx, FStableCompareByValue(predLess) );
// {3} apply std::unique to the sorted vector, removing duplicate values
vecitidx.erase(
std::unique( vecitidx.begin(), vecitidx.end(),
!boost::bind( predLess,
// redundant boost::mem_fn required to compile
boost::bind(boost::mem_fn(&SIteratorIndex::Value), _1),
boost::bind(boost::mem_fn(&SIteratorIndex::Value), _2)
)
),
vecitidx.end()
);
// {4} re-sort by index to match original order
sort( vecitidx, compare_less(boost::mem_fn(&SIteratorIndex::m_idx)) );
// {5} keep only those values in the original range that were not removed by std::unique
typename std::vector<SIteratorIndex>::iterator ititidx = vecitidx.begin();
forward_iterator itSrc = itBegin;
index_type idx = 0;
for(;;) {
if( ititidx==vecitidx.end() ) {
// {6} return end of unique range
return itSrc;
}
if( idx!=ititidx->m_idx ) {
// original range must be modified
break;
}
++ititidx;
++idx;
++itSrc;
}
forward_iterator itDst = itSrc;
do {
++idx;
++itSrc;
// while there are still items in vecitidx, there must also be corresponding items in the original range
if( idx==ititidx->m_idx ) {
std::swap( *itDst, *itSrc ); // C++0x move
++ititidx;
++itDst;
}
} while( ititidx!=vecitidx.end() );
// {6} return end of unique range
return itDst;
}
template<class forward_iterator>
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd) {
return stable_unique( itBegin, itEnd, std::less< typename std::iterator_traits<forward_iterator>::value_type >() );
}
void stable_unique_test() {
std::vector<int> vecn;
vecn.push_back(1);
vecn.push_back(17);
vecn.push_back(-100);
vecn.push_back(17);
vecn.push_back(1);
vecn.push_back(17);
vecn.push_back(53);
vecn.erase( stable_unique(vecn.begin(), vecn.end()), vecn.end() );
// result: 1, 17, -100, 53
}
As the question was "is there any STL algorithm...? what is its complexity?" it makes sense to implement the function like std::unique:
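#include <iterator>
#include <unordered_set>
template <class FwdIterator>
inline FwdIterator stable_unique(FwdIterator first, FwdIterator last)
{
FwdIterator result = first;
// iterator_traits also works for raw pointers, unlike FwdIterator::value_type
std::unordered_set<typename std::iterator_traits<FwdIterator>::value_type> seen;
for (; first != last; ++first)
if (seen.insert(*first).second) // true only the first time a value is seen
*result++ = *first;
return result;
}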
So this is how std::unique is implemented, plus an extra set. An unordered_set should be faster than a regular set. For comparison, std::unique itself removes all elements that compare equal to the element immediately preceding them (the first element is kept because there is nothing before it to compare against). The iterator returned points to the new end within the range [first,last).
EDIT: The last sentence means that the size of the container itself is NOT changed by unique. This can be confusing. The following example actually shrinks the container to its unique elements.
1: std::vector<int> v(3, 5);
2: v.resize(std::distance(v.begin(), unique(v.begin(), v.end())));
3: assert(v.size() == 1);
Line 1 creates a vector { 5, 5, 5 }. In line 2, calling unique returns an iterator to the 2nd element, which is the first element that is not unique. Hence distance returns 1 and resize prunes the vector.
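The hashed stable_unique above needs std::hash to be available for the element type. As a minimal sketch for types that only provide operator<, the same loop can use std::set instead, at O(N log N) cost. The name stable_unique_ordered is hypothetical and not part of the STL or of the answer above.

```cpp
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical variant of the loop above: std::set instead of std::unordered_set,
// so the element type only needs operator< (no std::hash specialization).
template <class FwdIterator>
FwdIterator stable_unique_ordered(FwdIterator first, FwdIterator last)
{
    typedef typename std::iterator_traits<FwdIterator>::value_type value_type;
    FwdIterator result = first;      // end of the kept prefix
    std::set<value_type> seen;       // values encountered so far
    for (; first != last; ++first) {
        if (seen.insert(*first).second) {   // first occurrence of this value
            if (result != first)
                *result = *first;           // skip pointless self-assignment
            ++result;
        }
    }
    return result;
}
```

As with std::unique, the caller still has to erase the tail, e.g. v.erase(stable_unique_ordered(v.begin(), v.end()), v.end()).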
My question is:
Is there any STL algorithm which can remove the non-adjacent duplicates from the vector? What is its complexity?
The STL options are the ones you mentioned: put the items in a std::set, or call std::sort and std::unique and then erase() on the container. Neither of these fulfills your requirement of "removing the non-adjacent duplicates and maintaining the order of elements."
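To illustrate why sort plus unique does not meet the requirement, here is a small sketch (the helper name sorted_unique is made up for this example). It removes all duplicates, but the result comes back in sorted order rather than in the original order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the classic sort + unique + erase combination.
// Takes the vector by value so the caller's data is left untouched.
std::vector<int> sorted_unique(std::vector<int> v)
{
    std::sort(v.begin(), v.end());                      // duplicates become adjacent
    v.erase(std::unique(v.begin(), v.end()), v.end());  // drop the adjacent duplicates
    return v;
}
```

For the input 2 1 6 1 4 6 2 1 1 from the question this yields 1 2 4 6, so the duplicates are gone but the first-occurrence order is lost.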
So why doesn't the STL offer some other option? No standard library will offer everything for every user's needs. The STL's design considerations include "be fast enough for nearly all users," "be useful for nearly all users," and "provide exception safety as much as possible" (and "be small enough for the Standards Committee" as the library Stepanov originally wrote was much larger, and Stroustrup axed out something like 2/3 of it).
The simplest solution I can think of would look like this:
// Note: an STL-like method would be templatized and use iterators instead of
// hardcoding std::vector<int>
std::vector<int> stable_unique(const std::vector<int>& input)
{
std::vector<int> result;
result.reserve(input.size());
for (std::vector<int>::const_iterator itor = input.begin();
itor != input.end();
++itor)
if (std::find(result.begin(), result.end(), *itor) == result.end())
result.push_back(*itor);
return result;
}
This solution should be O(M * N) where M is the number of unique elements and N is the total number of elements.
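As the comment in the code notes, an STL-like version would be templatized and use iterators. One possible sketch of that, working in place like std::unique, is shown below. The name stable_unique_quadratic is an assumption for illustration; the complexity is the same O(M * N).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical in-place, iterator-based version of the find-based approach.
// Keeps the first occurrence of each value and returns the new logical end.
template <class FwdIterator>
FwdIterator stable_unique_quadratic(FwdIterator first, FwdIterator last)
{
    FwdIterator result = first;  // [first, result) holds the unique values kept so far
    for (FwdIterator it = first; it != last; ++it) {
        // Linear search through the kept prefix only
        if (std::find(first, result, *it) == result) {
            if (result != it)
                *result = *it;   // move the new value to the end of the prefix
            ++result;
        }
    }
    return result;
}
```

Usage follows the std::unique pattern: v.erase(stable_unique_quadratic(v.begin(), v.end()), v.end()).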
There's no STL algorithm doing what you want preserving the sequence's original order.
You could create a std::set of iterators or indexes into the vector, with a comparison predicate that uses the referenced data rather than the iterators/indexes to sort stuff. Then you delete everything from the vector that isn't referenced in the set. (Of course, you could just as well use another std::vector of iterators/indexes, std::sort and std::unique that, and use this as a reference as to what to keep.)
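The second variant (a vector of indexes that is sorted and made unique) could be sketched roughly as follows. The names IndexLess, IndexEqual and stable_unique_by_index are made up for this example, and std::stable_sort is used here (rather than plain std::sort) so that the first occurrence of each value is the one that survives std::unique.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicates: compare indexes by the values they refer to.
struct IndexLess {
    const std::vector<int>* values;
    bool operator()(std::size_t a, std::size_t b) const {
        return (*values)[a] < (*values)[b];
    }
};
struct IndexEqual {
    const std::vector<int>* values;
    bool operator()(std::size_t a, std::size_t b) const {
        return (*values)[a] == (*values)[b];
    }
};

std::vector<int> stable_unique_by_index(const std::vector<int>& input)
{
    std::vector<std::size_t> idx(input.size());
    for (std::size_t i = 0; i < idx.size(); ++i)
        idx[i] = i;

    IndexLess less = { &input };
    IndexEqual equal = { &input };

    // Group equal values; stable_sort keeps the lowest index first in each group.
    std::stable_sort(idx.begin(), idx.end(), less);
    // Keep one index per value (the first occurrence).
    idx.erase(std::unique(idx.begin(), idx.end(), equal), idx.end());
    // Restore the original order of the surviving elements.
    std::sort(idx.begin(), idx.end());

    std::vector<int> result;
    result.reserve(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i)
        result.push_back(input[idx[i]]);
    return result;
}
```

This costs O(N log N) comparisons plus O(N) extra memory for the index vector, and it only requires the values to be comparable, not hashable.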
As far as I know, there is none in the STL. Look it up in a reference.