Both should run in O(n log n), but in general sort is faster than stable_sort. How big is the performance gap in practice? Do you have some experience about that?
I want
Was looking for something similar - but was surprised no one talked about Auxiliary space.
As I believe - the implementation of both stable_sort and sort is supposed to guarantee O(NlogN) for all (Best, Average & Worst) cases.
However, the difference exists in the Auxiliary space used. stable_sort needs an Auxiliary space of O(N).
May be the difference in performance lies in acquiring that space. :)
Otherwise, theoretically - they are supposed to be same w.r.t performance.
sort should do what you need unless you need this -> stable_sort preserves the relative order of the elements with equivalent values.
There are good answers that compared the algorithms theoretically. I benchmarked std::sort
and std::stable_sort
with google/benchmark for curiosity's sake.
It is useful to point out ahead of time that;
1 X 2500 MHz CPU
and 1 GB RAM
Arch Linux 2015.08 x86-64
g++ 5.3.0
and clang++ 3.7.0
(-std=c++11
, -O3
and -pthread
)BM_Base*
benchmark tries to measure the time populating std::vector<>
. That time should be subtracted from the sorting results for better comparison. First benchmark sorts std::vector<int>
with 512k
size.
[ g++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:37:43
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseInt/512k_mean 24730499 24726189 28
BM_BaseInt/512k_stddev 293107 310668 0
...
BM_SortInt/512k_mean 70967679 70799990 10
BM_SortInt/512k_stddev 1300811 1301295 0
...
BM_StableSortInt/512k_mean 73487904 73481467 9
BM_StableSortInt/512k_stddev 979966 925172 0
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:39:07
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseInt/512k_mean 26198558 26197526 27
BM_BaseInt/512k_stddev 320971 348314 0
...
BM_SortInt/512k_mean 70648019 70666660 10
BM_SortInt/512k_stddev 2030727 2033062 0
...
BM_StableSortInt/512k_mean 82004375 81999989 9
BM_StableSortInt/512k_stddev 197309 181453 0
Second benchmark sorts std::vector<S>
with 512k
size (sizeof(Struct S) = 20
).
[ g++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:49:32
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseStruct/512k_mean 26485063 26410254 26
BM_BaseStruct/512k_stddev 270355 128200 0
...
BM_SortStruct/512k_mean 81844178 81833325 8
BM_SortStruct/512k_stddev 240868 204088 0
...
BM_StableSortStruct/512k_mean 106945879 106857114 7
BM_StableSortStruct/512k_stddev 10446119 10341548 0
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10
Run on (1 X 2500 MHz CPU )
2016-01-08 01:53:01
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------
...
BM_BaseStruct/512k_mean 27327329 27280000 25
BM_BaseStruct/512k_stddev 488318 333059 0
...
BM_SortStruct/512k_mean 78611207 78407400 9
BM_SortStruct/512k_stddev 690207 372230 0
...
BM_StableSortStruct/512k_mean 109477231 109333325 8
BM_StableSortStruct/512k_stddev 11697084 11506626 0
Anyone who likes to run the benchmark, here is the code,
#include <vector>
#include <random>
#include <algorithm>
#include "benchmark/benchmark_api.h"
#define SIZE 1024 << 9
static void BM_BaseInt(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<int> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back(dist(mt));
}
}
}
BENCHMARK(BM_BaseInt)->Arg(SIZE);
static void BM_SortInt(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<int> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back(dist(mt));
}
std::sort(v.begin(), v.end());
}
}
BENCHMARK(BM_SortInt)->Arg(SIZE);
static void BM_StableSortInt(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<int> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back(dist(mt));
}
std::stable_sort(v.begin(), v.end());
}
}
BENCHMARK(BM_StableSortInt)->Arg(SIZE);
struct S {
int key;
int arr[4];
};
static void BM_BaseStruct(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<S> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back({dist(mt)});
}
}
}
BENCHMARK(BM_BaseStruct)->Arg(SIZE);
static void BM_SortStruct(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<S> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back({dist(mt)});
}
std::sort(v.begin(), v.end(),
[](const S &a, const S &b) { return a.key < b.key; });
}
}
BENCHMARK(BM_SortStruct)->Arg(SIZE);
static void BM_StableSortStruct(benchmark::State &state) {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist;
while (state.KeepRunning()) {
std::vector<S> v;
v.reserve(state.range_x());
for (int i = 0; i < state.range_x(); i++) {
v.push_back({dist(mt)});
}
std::stable_sort(v.begin(), v.end(),
[](const S &a, const S &b) { return a.key < b.key; });
}
}
BENCHMARK(BM_StableSortStruct)->Arg(SIZE);
BENCHMARK_MAIN();
Big enough to warrant a separate function that does stable sort and not have std::sort()
do it transparently.
Sometimes std::stable_sort() is needed because it maintains order of elements that are equal.
And the conventional advice is that, if maintaining order is not important, you should use std::sort() instead.
However, its context dependant. There is plenty of data that is best sorted with stable sort even if you don't need to maintain order:
Quicksort quickly becomes worst-case performance if the data has consistently poor pivot points.
The Burrows-Wheeler Transform is an algorithm used as part of data compression, e.g. bzip2. It requires sorting all the rotations of the text. For the majority of text data, merge sort (as often used by std::stable_sort()) is substantially faster than quicksort (as usually used by std::sort()).
bbb is a BWT implementation that notes the advantages of std::stable_sort() over sort() for this application.
std::stable_sort performs NlogN comparisons when sufficient memory is available. When insufficient memory is available, it degrades to N((logN)^2) comparisons. Therefore it is roughly of the same efficiency as std::sort (which performs O(NlogN) comparisons in both average and worst case) when memory is available.
For those interested, sort() uses an introsort (quicksort which switches to heapsort when the recursion reaches a certain depth) and stable_sort() uses a merge sort.
How big is the performance gap in practice? Do you have some experience about that?
Yes, but it didn't go the way you would expect.
I took a C implementation of the Burrows-Wheeler Transform and C++-ified it. Turned out a lot slower than the C code (though the code was cleaner). So I put timing instrumentation in there and it appeared that the qsort was performing faster than the std::sort. This was running in VC6. It was then recompiled with stable_sort and the tests ran faster than the C version. Other optimisations managed to push the C++ version ~25% quicker than the C version. I think it was possible to eke more speed but the clarity of the code was disappearing.