I am using a datatype of std::vector to store a 2D matrix/array. I would like to determine the unique rows of this matrix. I am l
EDIT: I forgot that std::vector already defines operator< and operator==, so you don't even need the custom functors shown further down:
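#include <algorithm>
#include <vector>

template <typename t>
std::vector<std::vector<t> > GetUniqueRows(std::vector<std::vector<t> > input)
{
    // std::vector's operator< compares rows lexicographically.
    std::sort(input.begin(), input.end());
    // std::unique drops adjacent rows that compare equal with operator==.
    input.erase(std::unique(input.begin(), input.end()), input.end());
    return input;
}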
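To make the behaviour concrete, here is a small sketch of the same sort/unique idiom applied to a fixed 5x2 matrix. The names Row, Matrix, SortedUniqueRows and ExampleUniqueRows are made up for this illustration and are not part of the answer above.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<int> Row;
typedef std::vector<Row> Matrix;

// Same sort + unique idiom, written out for a concrete element type.
// (Illustrative helper; the name is an assumption, not from the answer.)
Matrix SortedUniqueRows(Matrix input)
{
    std::sort(input.begin(), input.end());   // relies on std::vector's operator<
    input.erase(std::unique(input.begin(), input.end()), input.end()); // relies on operator==
    return input;
}

// Builds a 5x2 matrix in which two rows appear twice, then deduplicates it.
Matrix ExampleUniqueRows()
{
    const int data[5][2] = { {3, 4}, {1, 2}, {3, 4}, {0, 9}, {1, 2} };
    Matrix m;
    for (int i = 0; i < 5; ++i)
        m.push_back(Row(data[i], data[i] + 2));
    return SortedUniqueRows(m);
}
```

Note that the three surviving rows come back in sorted order ({0, 9}, {1, 2}, {3, 4}), not in the order they first appeared in the input.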
Use std::unique in concert with a custom functor which calls std::equal on the two vectors.
std::unique only removes adjacent duplicates, so the input has to be sorted first. For the sort, use a custom functor which calls std::lexicographical_compare on the two vectors. If you need to recover the original row order, you'll need to store the existing order somehow. The sort takes O(m * n log n) time (where m is the length of the inner vectors and n is the number of inner vectors), while the std::unique call takes O(m * n) time.
For comparison, both of your existing approaches take O(m * n^2) time.
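One possible way to "store the existing order", sketched here under the assumption that C++11 is available: sort a vector of row indices instead of the rows themselves, mark the first occurrence of each distinct row, then copy the marked rows out in their original order. GetUniqueRowsStable is a hypothetical name, not something from the question or the answers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: removes duplicate rows but keeps the first
// occurrence of each row at its original relative position.
template <typename T>
std::vector<std::vector<T> > GetUniqueRowsStable(const std::vector<std::vector<T> >& input)
{
    // Sort indices rather than rows, so the original order is not lost.
    std::vector<std::size_t> order(input.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    // stable_sort keeps equal rows in ascending index order, so the first
    // index in each run of equal rows is the earliest occurrence.
    std::stable_sort(order.begin(), order.end(),
                     [&input](std::size_t a, std::size_t b) { return input[a] < input[b]; });

    // Mark the first index of every run of equal rows.
    std::vector<bool> keep(input.size(), false);
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (i == 0 || input[order[i]] != input[order[i - 1]])
            keep[order[i]] = true;
    }

    // Copy the marked rows out in their original order.
    std::vector<std::vector<T> > result;
    for (std::size_t i = 0; i < input.size(); ++i)
        if (keep[i]) result.push_back(input[i]);
    return result;
}
```

The sort still dominates at O(m * n log n). The extra cost is one index vector and one flag vector of length n.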
EDIT: Example:
#include <algorithm>
#include <vector>

// Equality predicate for std::unique: rows are equal when they have the
// same length and the same elements.
template <typename t>
struct VectorEqual
{
    bool operator()(const std::vector<t>& lhs, const std::vector<t>& rhs) const
    {
        if (lhs.size() != rhs.size()) return false;
        return std::equal(lhs.begin(), lhs.end(), rhs.begin());
    }
};

// Ordering predicate for std::sort: rows are compared lexicographically.
template <typename t>
struct VectorLess
{
    bool operator()(const std::vector<t>& lhs, const std::vector<t>& rhs) const
    {
        return std::lexicographical_compare(lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
    }
};

template <typename t>
std::vector<std::vector<t> > GetUniqueRows(std::vector<std::vector<t> > input)
{
    std::sort(input.begin(), input.end(), VectorLess<t>());
    input.erase(std::unique(input.begin(), input.end(), VectorEqual<t>()), input.end());
    return input;
}
You should also consider hashing: it preserves row ordering and could be faster than sort/unique (amortized O(m*n) if alteration of the original is permitted, O(2*m*n) if a copy is required). The difference is especially noticeable for large matrices; on small matrices you are probably better off with Billy's solution, since it requires no additional memory allocation to keep track of the hashes.
Anyway, taking advantage of Boost.Unordered, here's what you can do:
#include <vector>
#include <boost/foreach.hpp>
#include <boost/ref.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/unordered_set.hpp>
namespace boost {
    // Hash a reference_wrapper by hashing the row it refers to.
    template< typename T >
    size_t hash_value(const boost::reference_wrapper< T >& v) {
        return boost::hash_value(v.get());
    }
    // Compare two reference_wrappers by comparing the rows they refer to.
    template< typename T >
    bool operator==(const boost::reference_wrapper< T >& lhs, const boost::reference_wrapper< T >& rhs) {
        return lhs.get() == rhs.get();
    }
}
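// destructive, but fast if the original copy is no longer required
template <typename T>
void uniqueRows_inplace(std::vector<std::vector<T> >& A)
{
    // The set stores references to rows already kept; A.size() is the initial bucket count.
    boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique(A.size());
    for (BOOST_AUTO(it, A.begin()); it != A.end(); ) {
        if (unique.insert(boost::cref(*it)).second) {
            ++it;
        } else {
            // erase() invalidates `it`, so continue from the iterator it returns.
            // Rows before `it` do not move, so the stored references stay valid.
            // Each erase shifts the rows after it, which costs extra when duplicates are many.
            it = A.erase(it);
        }
    }
}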
// returning a copy (extra copying cost)
template <typename T>
void uniqueRows_copy(const std::vector<std::vector<T> > &A,
                     std::vector< std::vector< T > > &ret)
{
    ret.reserve(A.size());
    boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique;
    BOOST_FOREACH(const std::vector< T >& row, A) {
        // insert() reports whether the row was new; only new rows are copied.
        if (unique.insert(boost::cref(row)).second) {
            ret.push_back(row);
        }
    }
}
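If Boost is not an option, roughly the same idea can be sketched with std::unordered_set, assuming C++11. The standard library has no std::hash specialization for a general std::vector<T>, so this sketch brings its own hasher. RowHash and UniqueRowsHashed are hypothetical names, and the mixing step is just one commonly used formula, not the only reasonable choice. Unlike the reference_wrapper version above, it stores copies of the rows in the set, trading memory for simplicity.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical hasher for a row: folds the element hashes into one value.
// Requires std::hash<T> to exist for the element type.
template <typename T>
struct RowHash {
    std::size_t operator()(const std::vector<T>& row) const {
        std::size_t seed = row.size();
        std::hash<T> hasher;
        for (const T& x : row)
            seed ^= hasher(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// Returns the distinct rows of A in order of first appearance.
template <typename T>
std::vector<std::vector<T>> UniqueRowsHashed(const std::vector<std::vector<T>>& A)
{
    std::unordered_set<std::vector<T>, RowHash<T>> seen;
    seen.reserve(A.size());
    std::vector<std::vector<T>> result;
    for (const auto& row : A) {
        // insert() returns {iterator, bool}; the bool is true for a new row.
        if (seen.insert(row).second)
            result.push_back(row);
    }
    return result;
}
```

For example, calling UniqueRowsHashed on the rows {3, 4}, {1, 2}, {3, 4}, {0, 9}, {1, 2} keeps {3, 4}, {1, 2}, {0, 9} in that order.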