I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is a part of a larger algorithm that will be used on supercomputers).
It might be worth separating numbers and indexes and then just sorting indexes, like this:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// Prints the numbers in the order given by indexes, then the indexes themselves.
void PrintElements(const std::vector<int>& numbers, const std::vector<size_t>& indexes) {
    std::cout << "\tNumbers:";
    for (auto i = indexes.begin(); i != indexes.end(); ++i)
        std::cout << '\t' << numbers[*i];
    std::cout << std::endl;
    std::cout << "\tIndexes:";
    for (auto i = indexes.begin(); i != indexes.end(); ++i)
        std::cout << '\t' << *i;
    std::cout << std::endl;
}

int main() {
    std::vector<int> numbers;
    std::vector<size_t> indexes;
    numbers.reserve(4); // An overkill for this few elements, but important for billions.
    numbers.push_back(32);
    numbers.push_back(91);
    numbers.push_back(11);
    numbers.push_back(72);
    indexes.reserve(numbers.capacity());
    indexes.push_back(0);
    indexes.push_back(1);
    indexes.push_back(2);
    indexes.push_back(3);
    std::cout << "BEFORE:" << std::endl;
    PrintElements(numbers, indexes);
    // Sort only the indexes; numbers itself is never reordered.
    std::sort(
        indexes.begin(),
        indexes.end(),
        [&numbers](size_t i1, size_t i2) {
            return numbers[i1] < numbers[i2];
        }
    );
    std::cout << "AFTER:" << std::endl;
    PrintElements(numbers, indexes);
    return EXIT_SUCCESS;
}
This prints:
BEFORE:
Numbers: 32 91 11 72
Indexes: 0 1 2 3
AFTER:
Numbers: 11 32 72 91
Indexes: 2 0 3 1
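For larger inputs, filling the indexes by hand is not practical. The following is a minimal sketch of the same technique wrapped in a function, assuming the values are int and using std::iota to generate 0, 1, 2, ... The name SortedIndexes is hypothetical and is only used here for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the permutation
// of indexes that orders `numbers` ascending. `numbers` itself is not modified.
std::vector<std::size_t> SortedIndexes(const std::vector<int>& numbers) {
    std::vector<std::size_t> indexes(numbers.size());
    // Fill with 0, 1, 2, ..., numbers.size() - 1.
    std::iota(indexes.begin(), indexes.end(), std::size_t{0});
    // Compare indexes by the values they refer to.
    std::sort(indexes.begin(), indexes.end(),
              [&numbers](std::size_t i1, std::size_t i2) {
                  return numbers[i1] < numbers[i2];
              });
    return indexes;
}
```

Note that std::sort is not stable, so the relative order of indexes that refer to equal values is unspecified; std::stable_sort could be swapped in if that order matters.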
The idea is that the elements being sorted are small and thus fast to move around during the sort. On modern CPUs, however, the indirect access to numbers
can hurt cache performance and cancel out these gains, so I recommend benchmarking on realistic amounts of data before deciding to use this approach.
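A rough sketch of how such a benchmark could be set up follows. Everything in it is an assumption made for illustration: the helper names (ElapsedMs, SortByIndex, SortPairs, MakeInput), the random input, and especially the pair-based variant, which is just one possible baseline to compare against and not something proposed above.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: runs `work` once and returns elapsed wall time in ms.
template <typename F>
double ElapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Variant A: sort indexes only; each comparison reads `numbers` indirectly.
std::vector<std::size_t> SortByIndex(const std::vector<int>& numbers) {
    std::vector<std::size_t> indexes(numbers.size());
    std::iota(indexes.begin(), indexes.end(), std::size_t{0});
    std::sort(indexes.begin(), indexes.end(),
              [&numbers](std::size_t i1, std::size_t i2) {
                  return numbers[i1] < numbers[i2];
              });
    return indexes;
}

// Variant B (assumed baseline): sort (value, original index) pairs, so each
// comparison reads the value directly from the element being moved.
std::vector<std::pair<int, std::size_t>> SortPairs(const std::vector<int>& numbers) {
    std::vector<std::pair<int, std::size_t>> pairs;
    pairs.reserve(numbers.size());
    for (std::size_t i = 0; i < numbers.size(); ++i)
        pairs.emplace_back(numbers[i], i);
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}

// Generates n pseudo-random values with a fixed seed for repeatable runs.
std::vector<int> MakeInput(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> values(n);
    for (auto& v : values) v = dist(gen);
    return values;
}
```

One way to use it is to call ElapsedMs([&] { SortByIndex(input); }) and ElapsedMs([&] { SortPairs(input); }) on the same input, repeat each several times, and compare the timings at the data sizes and with the compiler flags you actually expect in production. Random input is only a stand-in; real data may behave differently.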