Today I needed a simple algorithm for checking if a number is a power of 2.
The algorithm needs to be:
ulong
In C, I tested the i && !(i & (i - 1)
trick and compared it with __builtin_popcount(i)
, using gcc on Linux, with the -mpopcnt flag to be sure to use the CPU's POPCNT instruction. My test program counted the # of integers between 0 and 2^31 that were a power of two.
At first I thought that i && !(i & (i - 1)
was 10% faster, even though I verified that POPCNT was used in the disassembly where I used__builtin_popcount
.
However, I realized that I had included an if statement, and branch prediction was probably doing better on the bit twiddling version. I removed the if and POPCNT ended up faster, as expected.
Results:
Intel(R) Core(TM) i7-4771 CPU max 3.90GHz
Timing (i & !(i & (i - 1))) trick
30
real 0m13.804s
user 0m13.799s
sys 0m0.000s
Timing POPCNT
30
real 0m11.916s
user 0m11.916s
sys 0m0.000s
AMD Ryzen Threadripper 2950X 16-Core Processor max 3.50GHz
Timing (i && !(i & (i - 1))) trick
30
real 0m13.675s
user 0m13.673s
sys 0m0.000s
Timing POPCNT
30
real 0m13.156s
user 0m13.153s
sys 0m0.000s
Note that here the Intel CPU seems slightly slower than AMD with the bit twiddling, but has a much faster POPCNT; the AMD POPCNT doesn't provide as much of a boost.
popcnt_test.c:
#include "stdio.h"
// Count # of integers that are powers of 2 up to 2^31;
int main() {
int n;
for (int z = 0; z < 20; z++){
n = 0;
for (unsigned long i = 0; i < 1<<30; i++) {
#ifdef USE_POPCNT
n += (__builtin_popcount(i)==1); // Was: if (__builtin_popcount(i) == 1) n++;
#else
n += (i && !(i & (i - 1))); // Was: if (i && !(i & (i - 1))) n++;
#endif
}
}
printf("%d\n", n);
return 0;
}
Run tests:
gcc popcnt_test.c -O3 -o test.exe
gcc popcnt_test.c -O3 -DUSE_POPCNT -mpopcnt -o test-popcnt.exe
echo "Timing (i && !(i & (i - 1))) trick"
time ./test.exe
echo
echo "Timing POPCNT"
time ./test-opt.exe