I have a question that may seem very basic, but it comes up in a context where "every CPU tick counts" (it is part of a larger algorithm that will be used on supercomputers).
In that case you may want to look into parallel sorting algorithms. They only pay off for large data sets, since the cost of starting and synchronizing workers swamps any gain on small inputs, but when you do need them the win is substantial.
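To make the idea concrete, here is a minimal shared-memory sketch in C++: split the range in two, sort the halves concurrently, then merge. The name `parallel_sort`, the `depth` parameter, and the 10000-element cutoff are all assumptions made for illustration and would need tuning on real hardware. On a distributed-memory machine you would need a different scheme entirely, so treat this only as a picture of the principle.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Hypothetical helper (not a standard function): sorts [first, last) by
// sorting the two halves concurrently and then merging them.
// `depth` caps the recursion, so at most 2^depth sorts run at the same time.
template <typename It>
void parallel_sort(It first, It last, int depth = 2) {
    const auto n = last - first;
    // Assumed cutoff: below this size a plain serial sort is likely cheaper
    // than paying for another thread.
    if (depth <= 0 || n < 10000) {
        std::sort(first, last);
        return;
    }
    It mid = first + n / 2;
    // Sort the left half on another thread while this thread sorts the right.
    auto left = std::async(std::launch::async,
                           [=] { parallel_sort(first, mid, depth - 1); });
    parallel_sort(mid, last, depth - 1);
    left.get();  // wait for the left half; rethrows if it threw
    // Both halves are sorted now, so merge them in place.
    std::inplace_merge(first, mid, last);
}
```

You would call it as `parallel_sort(v.begin(), v.end());` on a `std::vector` or any other random-access range. The final merge is still serial, so the speedup is well below the thread count. If your toolchain supports the C++17 parallel algorithms, `std::sort` with the `std::execution::par` policy is probably a better starting point than rolling your own.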