问题
I understand that to use std::sort(), the compare function must be strict weak order, otherwise it will crash due to accessing address out of bound. (https://gcc.gnu.org/ml/gcc-bugs/2013-12/msg00333.html)
However, why would std::sort() access out-of-bound address when the compare function is not strict weak order? What is it trying to compare?
Also I wonder if there are other pitfalls in STL that I should be aware of.
回答1:
The first thing is that calling the algorithm with a comparator that does not comply with the requirements is undefined behavior and anything goes...
But other than that, I assume that you are interested in knowing what type of implementation might end up accessing out of bounds if the comparator is bad. Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator
The answer is performance, and this is just one of the possible things that could lead to this type of issues. There are different implementations of sorting algorithms, but more often than not, std::sort
is built on top of a variant of quicksort that will degenerate on a different sorting algorithm like mergesort to avoid the quicksort worst case performance.
The implementation of quicksort selects a pivot and then partitions the input around the pivot, then independently sorts both sides. There are different strategies for selection of the pivot, but a common one is the median of three: the algorithm gets the values of the first, last and middle element, selects the median of the three and uses that as the pivot value.
Conceptually partition walks from the left until it finds an element that is not smaller than the pivot, it then walks from the right trying to find an element that is smaller than the pivot. If the two cursors meet, partition completed. If the out of place elements are found, the values are swapped and the process continues in the range determined by both cursors. The loop walking from the left to find the element to swap would look like:
while (pos < end && value(pos) < pivot) { ++pos; }
While in general partition cannot assume that the value of pivot will be in the range, quicksort knows that it is, after all it selected the pivot out of the elements in the range. A common optimization in this case is to swap the value of the median to be in the last element of the loop. This guarantees that value(pos) < pivot
will be true before pos == end
(worst case: pos == end - 1
). The implication here is that we can drop the check for the end of the range and we can use a unchecked_partition
(pick your choice of name) with a simpler faster condition:
while (/*pos < end &&*/ value(pos) < pivot) ++pos;
All perfectly good, except that <
is spelled comparator(value(pos), pivot)
. Now if the comparator
is incorrectly implemented you might end up with comparator(pivot,pivot) == true
and the cursor will run out of bounds.
Note that this is just one example of optimization of the algorithm that can remove bounds check for performance: assuming a valid order, it is impossible to walk out of the array in the above loop if quicksort set the pivot to the last element before calling this modified partition.
Back to the question:
Should the implementation not check the bounds before accessing the elements in the first place? i.e. before calling the comparator
No, not if it removed bounds checking by proving that it won't walk out of the array, but that prove is built on the premise that the comparator is valid.
回答2:
std::sort
does indeed require that the given comparator establishes a strict weak ordering, otherwise the sorting doesn't really make much sense.
As for it accessing out of range, the link you posted is to a bug report, i.e. it isn't supposed to actually do this. Compilers like any other software can and will have bugs. As noted by Adam this particular bug report got rejected since it isn't really a bug.
What exactly happens when you don't have strict weak ordering isn't defined by the standard, it doesn't make sense to do that and is therefore left out by the standard. Therefore it is undefined by omission. Undefined means that anything can happen, even accessing out of range.
As for avoiding "pitfalls" just be aware of the requirements of the algorithms and functions you use. For C++ there is a nice reference site that I usually use: cppreference
Which on the page of std::sort says:
comp - comparison function object (i.e. an object that satisfies the requirements of Compare) which returns true if the first argument is less than (i.e. is ordered before) the second.
With a link to the description of Compare
来源:https://stackoverflow.com/questions/24048022/what-causes-stdsort-to-access-address-out-of-range