I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is a part of a larger algorithm that will be used on supercomputers).
std::sort has proven to be faster than the old qsort because of the lack of indirection and the possibility of inlining critical operations.
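To make the difference concrete, here is a minimal sketch of the same sort done both ways. The element type (double) and the function names sort_with_qsort and sort_with_std_sort are assumptions chosen for illustration, not something taken from the question.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator for qsort: reached through a function pointer, and each
// element is accessed through a void pointer that must be cast back.
static int compare_doubles(const void* a, const void* b) {
    const double x = *static_cast<const double*>(a);
    const double y = *static_cast<const double*>(b);
    return (x > y) - (x < y);  // -1, 0 or 1 without subtracting the values
}

// Sort with the C library routine (hypothetical helper name).
void sort_with_qsort(std::vector<double>& v) {
    std::qsort(v.data(), v.size(), sizeof(double), compare_doubles);
}

// Sort with std::sort (hypothetical helper name). The lambda's type is
// known at compile time, so the compiler is free to inline the comparison.
void sort_with_std_sort(std::vector<double>& v) {
    std::sort(v.begin(), v.end(), [](double x, double y) { return x < y; });
}
```

Both functions produce the same ordering; the difference is only in how the comparison is reached, which is where the speed gap usually comes from.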
The implementations of std::sort are likely to be highly optimized and hard to beat, but not impossible. If your data is fixed length and short, you might find radix sort to be faster. Timsort is relatively new and has delivered good results for Python.
You might keep the index array separate from the value array, but I think the extra level of indirection will prove to be a speed killer. Better to keep them together in a struct or std::pair.
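A minimal sketch of the "keep them together" approach, assuming the values are doubles and the indices are positions in the original array (both are assumptions, as is the name sort_with_indices):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Pair each value with its original position so both move together during
// the sort and no second array has to be dereferenced in the comparison.
// Hypothetical helper; the value type is an assumption.
std::vector<std::pair<double, std::size_t>>
sort_with_indices(const std::vector<double>& values) {
    std::vector<std::pair<double, std::size_t>> paired;
    paired.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        paired.emplace_back(values[i], i);
    }
    // std::pair compares by first, then by second, so equal values end up
    // ordered by their original index.
    std::sort(paired.begin(), paired.end());
    return paired;
}
```

After the call, element i of the result holds the i-th smallest value in .first and the position it came from in .second.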
As always with any speed-critical application, you must try some actual implementations and compare them to know for sure which is fastest.
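One simple way to run such a comparison is a small timing helper built on std::chrono. The sketch below is only a starting point; the name elapsed_ms is an assumption, and a serious benchmark would repeat each run several times on realistic data.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// Hypothetical helper for quick comparisons, not a full benchmark harness.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

When comparing candidates, give each one its own copy of the same unsorted input, since sorting already-sorted data would skew the result.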