What\'s faster: inserting into a priority queue, or sorting retrospectively?
I am generating some items that I need to be sorted at the end. I was wondering, what is
On a max-insert priority queue operations are O(lg n)
To your first question (which is faster): it depends. Just test it. Assuming you want the final result in a vector, the alternatives might look something like this:
#include <iostream>
#include <vector>
#include <queue>
#include <cstdlib>
#include <functional>
#include <algorithm>
#include <iterator>
#ifndef NUM
#define NUM 10
#endif
int main() {
std::srand(1038749);
std::vector<int> res;
#ifdef USE_VECTOR
for (int i = 0; i < NUM; ++i) {
res.push_back(std::rand());
}
std::sort(res.begin(), res.end(), std::greater<int>());
#else
std::priority_queue<int> q;
for (int i = 0; i < NUM; ++i) {
q.push(std::rand());
}
res.resize(q.size());
for (int i = 0; i < NUM; ++i) {
res[i] = q.top();
q.pop();
}
#endif
#if NUM <= 10
std::copy(res.begin(), res.end(), std::ostream_iterator<int>(std::cout,"\n"));
#endif
}
$ g++ sortspeed.cpp -o sortspeed -DNUM=10000000 && time ./sortspeed
real 0m20.719s
user 0m20.561s
sys 0m0.077s
$ g++ sortspeed.cpp -o sortspeed -DUSE_VECTOR -DNUM=10000000 && time ./sortspeed
real 0m5.828s
user 0m5.733s
sys 0m0.108s
So, std::sort
beats std::priority_queue
, in this case. But maybe you have a better or worse std:sort
, and maybe you have a better or worse implementation of a heap. Or if not better or worse, just more or less suited to your exact usage, which is different from my invented usage: "create a sorted vector containing the values".
I can say with a lot of confidence that random data won't hit the worst case of std::sort
, so in a sense this test might flatter it. But for a good implementation of std::sort
, its worst case will be very difficult to construct, and might not actually be all that bad anyway.
Edit: I added use of a multiset, since some people have suggested a tree:
#elif defined(USE_SET)
std::multiset<int,std::greater<int> > s;
for (int i = 0; i < NUM; ++i) {
s.insert(std::rand());
}
res.resize(s.size());
int j = 0;
for (std::multiset<int>::iterator i = s.begin(); i != s.end(); ++i, ++j) {
res[j] = *i;
}
#else
$ g++ sortspeed.cpp -o sortspeed -DUSE_SET -DNUM=10000000 && time ./sortspeed
real 0m26.656s
user 0m26.530s
sys 0m0.062s
To your second question (complexity): they're all O(n log n), ignoring fiddly implementation details like whether memory allocation is O(1) or not (vector::push_back
and other forms of insert at the end are amortized O(1)) and assuming that by "sort" you mean a comparison sort. Other kinds of sort can have lower complexity.
There are a lot of great answers to this question. A reasonable "rule of thumb" is
For the first case, the best "worst-case" sort is heap sort anyway and you'll often get better cache performance by just focusing on sorting (i.e. instead of interleaving with other operations).
As far as I understand, your problem does not require Priority Queue, since your tasks sounds like "Make many insertions, after that sort everything". That's like shooting birds from a laser, not an appropriate tool. Use standard sorting techniques for that.
You would need a Priority Queue, if your task was to imitate a sequence of operations, where each operation can be either "Add an element to the set" or "Remove smallest/greatest element from the set". This can be used in problem of finding a shortest path on the graph, for example. Here you cannot just use standard sorting techniques.