问题
So I'm trying to see if there's some sneaky series of bit operations that will allow me to count how many bits in a uint32 are 1 (or rather the count mod 2).
The "obvious" way would be something like this:
uint32 count_1_bits_mod_2(uint32 word) {
uint32 i, sum_mod_2;
for(i = 0; i < 32; i++)
sum_mod_2 ^= word;
word >>= 1;
Is there some "sneaky" way to get the proper sum_mod_2 without using a loop?
回答1:
The fastest way to count bits is by using "magic numbers":
unsigned int v = 0xCF31; // some number
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
unsigned int c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
This prints 9 (link to ideone).
This takes 12 operations for 32-bit numbers - the same number a lookup-based method takes, but you do not need a lookup table.
回答2:
The "best" way might depend upon what CPU architecture your code is running on. For example, Intel/AMD CPUs starting with Nehalem/Barcelona support a "popcnt" instruction which computes the number of 1 bits in an integer register, so in as few as two instructions (popcnt and bitwise AND with 1) you could compute the value you seek.
If you happen to be using a fairly recent version of GCC (or another compiler with similar support), you can use its __builtin_popcount() function to compute the population count, which with the "-mpopcount" or "-msse4.2" compilation flag specified uses the popcnt instruction. See this link for more information. E.g.:
uint32_t parity = __builtin_popcount(x) & 1;
回答3:
The fastest fastest is using CPU popcnt instruction, close second is SSSE3 code. Fastest portable is bitslice method, followed by lookup table: http://www.dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html
As with everything, you should benchmark your load. Then optimise if it's too slow.
For AMD Phenom II X2 550, with gcc 4.7.1 (using g++ -O3 popcnt.cpp -o popcnt -mpopcnt -msse2
):
Bitslice(7) 1462142 us; cnt = 32500610 Bitslice(24) 1221985 us; cnt = 32500610 Lauradoux 2347749 us; cnt = 32500610 SSE2 8-bit 790898 us; cnt = 32500610 SSE2 16-bit 825568 us; cnt = 32500610 SSE2 32-bit 864665 us; cnt = 32500610 16-bit LUT 1236739 us; cnt = 32500610 8-bit LUT 1951629 us; cnt = 32500610 gcc popcount 803173 us; cnt = 32500610 gcc popcountll 7678479 us; cnt = 32500610 FreeBSD version 1 2802681 us; cnt = 32500610 FreeBSD version 2 2167031 us; cnt = 32500610 Wikipedia #2 4927947 us; cnt = 32500610 Wikipedia #3 4212143 us; cnt = 32500610 HAKMEM 169/X11 3559245 us; cnt = 32500610 naive 16182699 us; cnt = 32500610 Wegner/Kernigan 12115119 us; cnt = 32500610 Anderson 61045764 us; cnt = 32500610 8x shift and add 6712049 us; cnt = 32500610 32x shift and add 6662200 us; cnt = 32500610
For Intel Core2 Duo E8400, with gcc 4.7.1 (g++ -O3 popcnt.cpp -o popcnt -mssse3
, -mpopcnt
is not supported on this CPU)
Bitslice(7) 1353007 us; cnt = 32500610 Bitslice(24) 953044 us; cnt = 32500610 Lauradoux 534697 us; cnt = 32500610 SSE2 8-bit 458277 us; cnt = 32500610 SSE2 16-bit 555278 us; cnt = 32500610 SSE2 32-bit 634897 us; cnt = 32500610 SSSE3 414542 us; cnt = 32500610 16-bit LUT 1208412 us; cnt = 32500610 8-bit LUT 1400175 us; cnt = 32500610 gcc popcount 5428396 us; cnt = 32500610 gcc popcountll 2743358 us; cnt = 32500610 FreeBSD version 1 3025944 us; cnt = 32500610 FreeBSD version 2 2313264 us; cnt = 32500610 Wikipedia #2 1570519 us; cnt = 32500610 Wikipedia #3 1051828 us; cnt = 32500610 HAKMEM 169/X11 3982779 us; cnt = 32500610 naive 20951420 us; cnt = 32500610 Wegner/Kernigan 13665630 us; cnt = 32500610 Anderson 6771549 us; cnt = 32500610 8x shift and add 14917323 us; cnt = 32500610 32x shift and add 14494482 us; cnt = 32500610
Bitslice method is a parallel mechanism that counts multiple (7 or 24) machine words at a time so it has marginal usability for a generic function. After http://dalkescientific.com/writings/diary/popcnt.cpp :
static inline int popcount_fbsd2(unsigned *buf, int n)
{
int cnt=0;
do {
unsigned v = *buf++;
v -= ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0x0F0F0F0F;
v = (v * 0x01010101) >> 24;
cnt += v;
} while(--n);
return cnt;
}
static inline int merging2(const unsigned *data)
{
unsigned count1,count2,half1,half2;
count1=data[0];
count2=data[1];
half1=data[2]&0x55555555;
half2=(data[2]>>1)&0x55555555;
count1 = count1 - ((count1 >> 1) & 0x55555555);
count2 = count2 - ((count2 >> 1) & 0x55555555);
count1+=half1;
count2+=half2;
count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
count2 = (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
count1+=count2;
count1 = (count1&0x0F0F0F0F)+ ((count1 >> 4) & 0x0F0F0F0F);
count1 = count1 + (count1 >> 8);
count1 = count1 + (count1 >> 16);
return count1 & 0x000000FF;
}
static inline int merging3(const unsigned *data)
{
unsigned count1,count2,half1,half2,acc=0;
int i;
for(i=0;i<24;i+=3)
{
count1=data[i];
count2=data[i+1];
//w = data[i+2];
half1=data[i+2]&0x55555555;
half2=(data[i+2]>>1)&0x55555555;
count1 = count1 - ((count1 >> 1) & 0x55555555);
count2 = count2 - ((count2 >> 1) & 0x55555555);
count1+=half1;
count2+=half2;
count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
count1 += (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
acc += (count1 & 0x0F0F0F0F)+ ((count1>>4) &0x0F0F0F0F);
}
acc = (acc & 0x00FF00FF)+ ((acc>>8)&0x00FF00FF);
acc = acc + (acc >> 16);
return acc & 0x00000FFFF;
}
/* count 24 words at a time, then 3 at a time, then 1 at a time */
static inline int popcount_24words(unsigned *buf, int n) {
int cnt=0, i;
for (i=0; i<n-n%24; i+=24) {
cnt += merging3(buf+i);
}
for (;i<n-n%3; i+=3) {
cnt += merging2(buf+i);
}
cnt += popcount_fbsd2(buf+i, n-i);
return cnt;
}
回答4:
count = 0;
while (word != 0) {
word = word & (word-1);
count++;
}
The statement
word = word & (word-1);
clears the lowest 1-bit in the word. Eventually, you run out of 1-bits.
回答5:
I think this is fairly easy to understand, and quite efficient.
x = some number;
x ^= (x >> 1); // parity of every bit pair now in bits 0, 2, 4, ...
x ^= (x >> 2); // parity of every 4 bits now in bits 0, 4, 8, ...
x ^= (x >> 4); // ...etc
x ^= (x >> 8);
x ^= (x >> 16); // parity of all 32 bits now in bit 0
parity = x & 1;
回答6:
Precalculate all results and do a simple array lookup. For a simple "count even or odd" boolean result, you can make a bit array.
回答7:
cnt = 0;
while (word != 0) {
word = word & (word-1);
cnt++;
this could remove 1 bits for more details visit http://pgrtutorials.blogspot.in/p/bit-manipulation.html
来源:https://stackoverflow.com/questions/12055499/c-sneaky-way-to-count-positive-bits