Given a vector std::vector
, we can find unique elements efficiently by:
std::vector uv(v.begin(), v.end());
std::
An O(n) solution when the range of values is limited, for example chars. Using less than the CPU level 1 cache for the counter leaves room for other values.
(untested code)
constexp int ProblemSize = 256;
using CountArray = std::array<int, ProblemSize>;
CountArray CountUnique(const std::vector<char>& vec) {
CountArray count;
for(const auto ch : vec)
count[ch]++;
return count;
}
using equal_range
:
std::vector<int> results;
for(auto i = begin(v); i != end(v);)
{
auto r = std::equal_range(i, end(v), *i);
results.emplace_back( std::distance(r.first, r.second) );
i = r.second;
}
SSCCE:
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
int main()
{
std::vector<double> v{1.0, 2.0, 1.0, 2.0, 1.0, 3.0};
std::sort(begin(v), end(v));
std::vector<int> results;
for(auto i = begin(v); i != end(v);)
{
auto r = std::equal_range(i, end(v), *i);
results.emplace_back( std::distance(r.first, r.second) );
i = r.second;
}
for(auto const& e : results) std::cout << e << "; ";
}
First, note that ==
and to a lesser extent <
on double
is often a poor idea: often you'll have values that logically "should" be equal if the double
was infinite precision, but are slightly different.
However, collecting the frequencies is easy:
template<typename T, typename Allocator>
std::unordered_map< T, std::size_t > frequencies( std::vector<T, Allocator> const& src ) {
std::unordered_map< T, std::size_t > retval;
for (auto&& x:src)
++retval[x];
return retval;
}
assuming std::hash<T>
is defined (which it is for double
). If not, there is more boilerplate, so I'll skip it. Note that this does not care if the vector
is sorted.
If you want it in the form of std::vector<std::size_t>
in sync with your sorted vector
, you can just do this:
template<typename T, typename Hash, typename Equality, typename Allocator>
std::vector<std::size_t> collate_frequencies(
std::vector<T, Allocator> const& order,
std::unordered_map<T, std::size_t, Hash, Equality> const& frequencies
) {
std::vector<std::size_t> retval;
retval.reserve(order.size());
for( auto&& x : order )
retval.push_back( frequencies[x] );
return retval;
}
I took the liberty of making these functions overly generic, so they support more than just double
s.
After you sort, before you erase:
std::vector<int> freq_uv;
freq_uv.push_back(0);
auto prev = uv[0]; // you should ensure !uv.empty() if previous code did not already ensure it.
for (auto const & x : uv)
{
if (prev != x)
{
freq_uv.push_back(0);
prev = x;
}
++freq_uv.back();
}
Note that, while I generally like to count occurences with a map, as Yakk is doing, in this case I think it is doing a lot of unnecessary work as we already know the vector is sorted.
Another possibility is to use a std::map
(not unordered), instead of sorting. This will get your frequencies first. Then, since the map is ordered, you can just create the sorted, unique vector, and the frequency vector directly from the map.
// uv not yet created
std::map<T, int> freq_map;
for (auto const & x : v)
++freq_map[x];
std::vector<T> uv;
std::vector<int> freq_uv;
for (auto const & p : freq_map)
{
uv.push_back(p.first);
freq_uv.push_back(p.second);
}