问题
I notice that vector is much slower than bool array when running the following code.
int main()
{
int count = 0;
int n = 1500000;
// slower with c++ vector<bool>
/*vector<bool> isPrime;
isPrime.reserve(n);
isPrime.assign(n, true);
*/
// faster with bool array
bool* isPrime = new bool[n];
for (int i = 0; i < n; ++i)
isPrime[i] = true;
for (int i = 2; i< n; ++i) {
if (isPrime[i])
count++;
for (int j =2; i*j < n; ++j )
isPrime[i*j] = false;
}
cout << count << endl;
return 0;
}
Is there some way that I can do to make vector<bool>
faster ? Btw, both std::vector::push_back
and std::vector::emplace_back
are even slower than std::vector::assign
.
回答1:
std::vector<bool>
can have various performance issues (e.g. take a look at https://isocpp.org/blog/2012/11/on-vectorbool).
In general you can:
use
std::vector<std::uint8_t>
instead ofstd::vector<bool>
(give a try to std::valarray<bool> also).This requires more memory and is less cache-friendly but there isn't a overhead (in the form of bit manipulation) to access a single value, so there are situations in which it works better (after all it's just like your array of
bool
but without the nuisance of memory management)- use std::bitset if you know at compile time how large your boolean array is going to be (or if you can at least establish a reasonable upper bound)
- if Boost is an option try boost::dynamic_bitset (the size can be specified at runtime)
But for speed optimizations you have to test...
With your specific example I can confirm a performance difference only when optimizations are turned off (of course this isn't the way to go).
Some tests with g++ v4.8.3 and clang++ v3.4.5 on an Intel Xeon system (-O3
optimization level) give a different picture:
time (ms)
G++ CLANG++
array of bool 3103 3010
vector<bool> 2835 2420 // not bad!
vector<char> 3136 3031 // same as array of bool
bitset 2742 2388 // marginally better
(time elapsed for 100 runs of the code in the answer)
std::vector<bool>
doesn't look so bad (source code here).
回答2:
vector<bool>
may have a template specialization and may be implemented using bit array to save space. Extracting and saving a bit and converting it from / to bool
may cause the performance drop you are observing. If you use std::vector::push_back
, you are resizing the vector which will cause even worse performance. Next performance killer may be assign
(Worst complexity: Linear of first argument), instead use operator []
(Complexity: constant).
On the other hand, bool []
is guaranteed to be array of bool
.
And you should resize to n
instead of n-1
to avoid undefined behaviour.
回答3:
vector<bool>
can be high performance, but isn't required to be. For vector<bool>
to be efficient, it needs to operate on many bools at a time (e.g. isPrime.assign(n, true)
), and the implementor has had to put loving care into it. Indexing individual bools in a vector<bool>
is slow.
Here is a prime finder that I wrote a while back using vector<bool>
and clang + libc++ (the libc++ part is important):
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
std::vector<bool>
init_primes()
{
std::vector<bool> primes(0x80000000, true);
primes[0] = false;
primes[1] = false;
const auto pb = primes.begin();
const auto pe = primes.end();
const auto sz = primes.size();
size_t i = 2;
while (true)
{
size_t j = i*i;
if (j >= sz)
break;
do
{
primes[j] = false;
j += i;
} while (j < sz);
i = std::find(pb + (i+1), pe, true) - pb;
}
return primes;
}
int
main()
{
using namespace std::chrono;
using dsec = duration<double>;
auto t0 = steady_clock::now();
auto p = init_primes();
auto t1 = steady_clock::now();
std::cout << dsec(t1-t0).count() << "\n";
}
This executes for me in about 28s (-O3). When I change it to return a vector<char>
instead, the execution time goes up to about 44s.
If you run this using some other std::lib, you probably won't see this trend. On libc++ algorithms such as std::find
have been optimized to search a word of bits at a time, instead of bit at a time.
See http://howardhinnant.github.io/onvectorbool.html for more details on what std algorithms could be optimized by your vendor.
来源:https://stackoverflow.com/questions/36932684/c11-vectorbool-performance-issue-with-code-example