Efficient way to find frequencies of each unique value in the std::vector

前端 未结 4 573
爱一瞬间的悲伤
爱一瞬间的悲伤 2021-01-19 18:58

Given a vector std::vector v, we can find unique elements efficiently by:

std::vector uv(v.begin(), v.end());
std::         


        
相关标签:
4条回答
  • 2021-01-19 19:32

    An O(n) solution when the range of values is limited, for example chars. Using less than the CPU level 1 cache for the counter leaves room for other values.

    (untested code)

    constexp int ProblemSize = 256;
    using CountArray = std::array<int, ProblemSize>;
    
    CountArray CountUnique(const std::vector<char>& vec) {
      CountArray count;
      for(const auto ch : vec)
        count[ch]++;
    
      return count;
    }
    
    0 讨论(0)
  • 2021-01-19 19:37

    using equal_range:

    std::vector<int> results;
    for(auto i = begin(v); i != end(v);)
    {
        auto r = std::equal_range(i, end(v), *i);
        results.emplace_back( std::distance(r.first, r.second) );
        i = r.second;
    }
    

    SSCCE:

    #include <vector>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    
    int main()
    {
        std::vector<double> v{1.0, 2.0, 1.0, 2.0, 1.0, 3.0};
        std::sort(begin(v), end(v));
    
        std::vector<int> results;
        for(auto i = begin(v); i != end(v);)
        {
            auto r = std::equal_range(i, end(v), *i);
            results.emplace_back( std::distance(r.first, r.second) );
            i = r.second;
        }
    
        for(auto const& e : results) std::cout << e << "; ";
    }
    
    0 讨论(0)
  • 2021-01-19 19:38

    First, note that == and to a lesser extent < on double is often a poor idea: often you'll have values that logically "should" be equal if the double was infinite precision, but are slightly different.

    However, collecting the frequencies is easy:

    template<typename T, typename Allocator>
    std::unordered_map< T, std::size_t > frequencies( std::vector<T, Allocator> const& src ) {
      std::unordered_map< T, std::size_t > retval;
      for (auto&& x:src)
        ++retval[x];
      return retval;
    }
    

    assuming std::hash<T> is defined (which it is for double). If not, there is more boilerplate, so I'll skip it. Note that this does not care if the vector is sorted.

    If you want it in the form of std::vector<std::size_t> in sync with your sorted vector, you can just do this:

    template<typename T, typename Hash, typename Equality, typename Allocator>
    std::vector<std::size_t> collate_frequencies(
      std::vector<T, Allocator> const& order,
      std::unordered_map<T, std::size_t, Hash, Equality> const& frequencies
    ) {
      std::vector<std::size_t> retval;
      retval.reserve(order.size());
      for( auto&& x : order )
        retval.push_back( frequencies[x] );
      return retval;
    }
    

    I took the liberty of making these functions overly generic, so they support more than just doubles.

    0 讨论(0)
  • 2021-01-19 19:44

    After you sort, before you erase:

    std::vector<int> freq_uv;
    freq_uv.push_back(0);
    auto prev = uv[0];        // you should ensure !uv.empty() if previous code did not already ensure it.
    for (auto const & x : uv)
    {
        if (prev != x)
        {
            freq_uv.push_back(0);
            prev = x;
        }
        ++freq_uv.back();
    }
    

    Note that, while I generally like to count occurences with a map, as Yakk is doing, in this case I think it is doing a lot of unnecessary work as we already know the vector is sorted.

    Another possibility is to use a std::map (not unordered), instead of sorting. This will get your frequencies first. Then, since the map is ordered, you can just create the sorted, unique vector, and the frequency vector directly from the map.

    // uv not yet created
    std::map<T, int> freq_map;
    for (auto const & x : v)
        ++freq_map[x];
    std::vector<T> uv;
    std::vector<int> freq_uv;
    for (auto const & p : freq_map)
    {
        uv.push_back(p.first);
        freq_uv.push_back(p.second);
    }
    
    0 讨论(0)
提交回复
热议问题