Create ranking for vector of double

情到浓时终转凉″ submitted on 2020-01-05 19:44:11

Question


I have a vector of doubles that I want to rank (actually it's a vector of objects with a double member called costs). If all values are unique, or if I ignore ties, there is no problem. However, I want to use the average rank for non-unique values. I have found some questions on SO about ranks, but they ignore non-unique values.

For example, say we have (1, 5, 4, 5, 5); then the corresponding ranks should be (1, 4, 2, 4, 4): the three 5s occupy sorted positions 3, 4, and 5, whose average is 4. When we ignore the non-unique values, the ranks are (1, 3, 2, 4, 5).

When ignoring the nonunique values I used the following:

void Population::create_ranks_costs(vector<Solution> &pop)
{
  size_t const n = pop.size();

  // Create an index vector
  vector<size_t> index(n);
  iota(begin(index), end(index), 0);

  sort(begin(index), end(index), 
       [&pop] (size_t idx, size_t idy) { 
         return pop[idx].costs() < pop[idy].costs();
       });

  // Store the result in the corresponding solutions
  for (size_t idx = 0; idx < n; ++idx)
    pop[index[idx]].set_rank_costs(idx + 1);
}

Does anyone know how to take the non-unique values into account? I prefer using the standard algorithms (<algorithm>), since IMO this leads to clean code.


Answer 1:


One way to do so would be using a multimap.

  • Place the items in a multimap mapping your objects to size_t values (the initial values are unimportant). You can do this in one line (use the constructor that takes iterators).

  • Loop (either plainly or using whatever from algorithm) and assign 0, 1, ... as the values.

  • Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from algorithm for this).

The overall complexity should be Theta(n log(n)), where n is the length of the vector.
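
The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the function name average_ranks_multimap is hypothetical, the costs are passed as plain doubles rather than Solution objects, no cost is NaN, the ranks are stored as double so that averages fit, and the map is filled with emplace rather than the iterator constructor.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: average ranks (1-based) of `costs`, via std::multimap.
// Assumes no cost is NaN.
std::vector<double> average_ranks_multimap(const std::vector<double>& costs)
{
    // Step 1: key = cost, value = rank (the initial value is unimportant).
    std::multimap<double, double> ranks;
    for (double c : costs) ranks.emplace(c, 0.0);

    // Step 2: a multimap iterates in key order, so number the entries 1, 2, ...
    double next = 1.0;
    for (auto& kv : ranks) {
        kv.second = next;
        next += 1.0;
    }

    // Step 3: for each distinct key, replace the ranks by their average.
    for (auto it = ranks.begin(); it != ranks.end(); ) {
        auto range = ranks.equal_range(it->first);
        double sum = 0.0;
        std::size_t count = 0;
        for (auto j = range.first; j != range.second; ++j) {
            sum += j->second;
            ++count;
        }
        for (auto j = range.first; j != range.second; ++j)
            j->second = sum / count;
        it = range.second; // jump to the next distinct key
    }

    // Look up the (now shared) rank of each cost in the original order.
    std::vector<double> result;
    result.reserve(costs.size());
    for (double c : costs) result.push_back(ranks.find(c)->second);
    return result;
}
```

Called with the costs (1, 5, 4, 5, 5) from the question, this sketch yields (1, 4, 2, 4, 4). Each insertion and lookup costs O(log n), which is consistent with the stated overall bound.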




Answer 2:


Here is a routine for vectors as the title of the question suggests:

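#include <algorithm> // std::sort
#include <cstddef>   // std::size_t
#include <numeric>   // std::iota
#include <vector>

template<typename Vector>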
std::vector<double> rank(const Vector& v)
{
    std::vector<std::size_t> w(v.size());
    std::iota(begin(w), end(w), 0);
    std::sort(begin(w), end(w), 
        [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(w.size());
    for (std::size_t n, i = 0; i < w.size(); i += n)
    {
        n = 1;
        while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
        for (std::size_t k = 0; k < n; ++k)
        {
            r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
            // r[w[i+k]] = i + 1;          // min 
            // r[w[i+k]] = i + n;          // max
            // r[w[i+k]] = i + k + 1;      // random order
        }
    }
    return r;
}

See IDEone for a working example.

For ranks with tied (equal) values there are varying conventions (min, max, averaged rank, or random order). Choose one of these in the innermost for loop (averaged rank is common in statistics, min rank in sports).

Please note that averaged ranks can be non-integral (n + 0.5). I don't know whether rounding down to an integral rank n is a problem for your application.

The algorithm could easily be generalized to user-defined orderings such as pop[i].costs(), with std::less<> as the default.
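
One possible shape of that generalization is sketched below. It is an assumption about what the answer has in mind, not code from it: the name average_rank and the Compare parameter are hypothetical, std::less<> requires C++14, and ties are detected with the comparator alone (two neighbours in sorted order are tied when the first is not less than the second).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical generalization: average ranks under a user-supplied
// strict weak ordering `comp` (defaults to operator<).
template <typename Vector, typename Compare = std::less<>>
std::vector<double> average_rank(const Vector& v, Compare comp = Compare{})
{
    // w holds the indices of v, sorted by the elements they refer to.
    std::vector<std::size_t> w(v.size());
    std::iota(w.begin(), w.end(), std::size_t{0});
    std::sort(w.begin(), w.end(),
        [&](std::size_t i, std::size_t j) { return comp(v[i], v[j]); });

    std::vector<double> r(w.size());
    for (std::size_t n, i = 0; i < w.size(); i += n)
    {
        // In sorted order, "not less than" means "equivalent", i.e. a tie.
        n = 1;
        while (i + n < w.size() && !comp(v[w[i]], v[w[i + n]])) ++n;
        for (std::size_t k = 0; k < n; ++k)
            r[w[i + k]] = i + (n + 1) / 2.0; // average rank of n tied values
    }
    return r;
}
```

For the question's use case, the call might look like average_rank(pop, [](const Solution& a, const Solution& b) { return a.costs() < b.costs(); }), with the returned ranks then copied into the solutions.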




Answer 3:


Something along these lines:

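// Continues after the sort in the question's function: index is sorted by
// costs, and n = pop.size(). Assumes n > 0 (index[0] is read below) and that
// set_rank_costs accepts a double, since averaged ranks can be fractional.
size_t run_start = 0;
double run_cost = pop[index[0]].costs();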
for (size_t idx = 1; idx <= n; ++idx) {
  double new_cost = idx < n ? pop[index[idx]].costs() : 0;
  if (idx == n || new_cost != run_cost) {
    double avg_rank = (run_start + 1 + idx) / 2.0;
    for (size_t j = run_start; j < idx; ++j) {
       pop[index[j]].set_rank_costs(avg_rank);
    }

    run_start = idx;
    run_cost = new_cost;
  }
}

Basically, you iterate over the sorted sequence and identify runs of equal values (possibly runs of length 1). For each such run, you calculate its average rank, and set it for all elements in the run.
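
Put together with the sorting code from the question, the run-based idea might look like the sketch below. The Solution class here is a minimal hypothetical stand-in (the real class is not shown in the question), the function is written as a free function rather than a Population member, and the rank is stored as a double.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the question's Solution class.
class Solution {
public:
    explicit Solution(double costs) : costs_(costs) {}
    double costs() const { return costs_; }
    double rank_costs() const { return rank_costs_; }
    void set_rank_costs(double rank) { rank_costs_ = rank; }
private:
    double costs_;
    double rank_costs_ = 0.0;
};

// Assigns each solution the average rank (1-based) of its costs.
void create_ranks_costs(std::vector<Solution>& pop)
{
    std::size_t const n = pop.size();

    // Index vector sorted by costs, as in the question.
    std::vector<std::size_t> index(n);
    std::iota(index.begin(), index.end(), std::size_t{0});
    std::sort(index.begin(), index.end(),
              [&pop](std::size_t a, std::size_t b) {
                  return pop[a].costs() < pop[b].costs();
              });

    // Walk the sorted sequence one run of equal costs at a time.
    std::size_t run_start = 0;
    while (run_start < n) {
        std::size_t run_end = run_start + 1;
        while (run_end < n &&
               pop[index[run_end]].costs() == pop[index[run_start]].costs())
            ++run_end;

        // The run covers ranks run_start + 1 .. run_end; take their mean.
        double const avg_rank = (run_start + 1 + run_end) / 2.0;
        for (std::size_t j = run_start; j < run_end; ++j)
            pop[index[j]].set_rank_costs(avg_rank);

        run_start = run_end;
    }
}
```

Because the outer loop is guarded by run_start < n, this variant also handles an empty population without reading index[0].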



Source: https://stackoverflow.com/questions/30822729/create-ranking-for-vector-of-double
