问题
I have a vector with doubles which I want to rank (actually it's a vector with objects with a double member called costs
). If there are only unique values or I ignore the nonunique values then there is no problem. However, I want to use the average rank for nonunique values. Furthermore, I have found some question at SO about ranks, however they ignore the non-unique values.
Example, say we have (1, 5, 4, 5, 5) then the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore the non-unique values the ranks are (1, 3, 2, 4, 5).
When ignoring the nonunique values I used the following:
void Population::create_ranks_costs(vector<Solution> &pop)
{
size_t const n = pop.size();
// Create an index vector
vector<size_t> index(n);
iota(begin(index), end(index), 0);
sort(begin(index), end(index),
[&pop] (size_t idx, size_t idy) {
return pop[idx].costs() < pop[idy].costs();
});
// Store the result in the corresponding solutions
for (size_t idx = 0; idx < n; ++idx)
pop[index[idx]].set_rank_costs(idx + 1);
}
Does anyone know how to take the non-unique values into account? I prefer using std::algorithm
since IMO this lead to clean code.
回答1:
One way to do so would be using a multimap.
Place the items in a multimap mapping your objects to
size_t
s (the intial values are unimportant). You can do this with one line (use the ctor that takes iterators).Loop (either plainly or using whatever from
algorithm
) and assign 0, 1, ... as the values.Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from
algorithm
for this).
The overall complexity should be Theta(n log(n)), where n is the length of the vector.
回答2:
Here is a routine for vectors as the title of the question suggests:
template<typename Vector>
std::vector<double> rank(const Vector& v)
{
std::vector<std::size_t> w(v.size());
std::iota(begin(w), end(w), 0);
std::sort(begin(w), end(w),
[&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });
std::vector<double> r(w.size());
for (std::size_t n, i = 0; i < w.size(); i += n)
{
n = 1;
while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
for (std::size_t k = 0; k < n; ++k)
{
r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
// r[w[i+k]] = i + 1; // min
// r[w[i+k]] = i + n; // max
// r[w[i+k]] = i + k + 1; // random order
}
}
return r;
}
A working example see on IDEone.
For ranks with tied (equal) values there are varying conventions (min, max, averaged rank, or random order). Choose one of these in the innermost for loop (averaged rank is common in statistics, min rank in sports).
Please take into account, that averaged ranks can be non-integral (n+0.5
).
I don't know, if rounding down to integral rank n
is a problem for your application.
The algorithm easily could be generalized for user-defined orderings like pop[i].costs()
, with std::less<>
as default.
回答3:
Something along these lines:
size_t run_start = 0;
double run_cost = pop[index[0]].costs();
for (size_t idx = 1; idx <= n; ++idx) {
double new_cost = idx < n ? pop[index[idx]].costs() : 0;
if (idx == n || new_cost != run_cost) {
double avg_rank = (run_start + 1 + idx) / 2.0;
for (size_t j = run_start; j < idx; ++j) {
pop[index[j]].set_rank_costs(avg_rank);
}
run_start = idx;
run_cost = new_cost;
}
}
Basically, you iterate over the sorted sequence and identify runs of equal values (possibly runs of length 1). For each such run, you calculate its average rank, and set it for all elements in the run.
来源:https://stackoverflow.com/questions/30822729/create-ranking-for-vector-of-double