C: Sneaky way to count positive bits?

荒凉一梦 提交于 2020-05-12 15:59:33

问题


So I'm trying to see if there's some sneaky series of bit operations that will allow me to count how many bits in a uint32 are 1 (or rather the count mod 2).

The "obvious" way would be something like this:

uint32 count_1_bits_mod_2(uint32 word) {
    uint32 i, sum_mod_2;
    for(i = 0; i < 32; i++)
        sum_mod_2 ^= word;
        word >>= 1;

Is there some "sneaky" way to get the proper sum_mod_2 without using a loop?


回答1:


The fastest way to count bits is by using "magic numbers":

unsigned int v = 0xCF31; // some number
v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
unsigned int c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count

This prints 9 (link to ideone).

This takes 12 operations for 32-bit numbers - the same number a lookup-based method takes, but you do not need a lookup table.




回答2:


The "best" way might depend upon what CPU architecture your code is running on. For example, Intel/AMD CPUs starting with Nehalem/Barcelona support a "popcnt" instruction which computes the number of 1 bits in an integer register, so in as few as two instructions (popcnt and bitwise AND with 1) you could compute the value you seek.

If you happen to be using a fairly recent version of GCC (or another compiler with similar support), you can use its __builtin_popcount() function to compute the population count, which with the "-mpopcount" or "-msse4.2" compilation flag specified uses the popcnt instruction. See this link for more information. E.g.:

uint32_t parity = __builtin_popcount(x) & 1;



回答3:


The fastest fastest is using CPU popcnt instruction, close second is SSSE3 code. Fastest portable is bitslice method, followed by lookup table: http://www.dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html

As with everything, you should benchmark your load. Then optimise if it's too slow.

For AMD Phenom II X2 550, with gcc 4.7.1 (using g++ -O3 popcnt.cpp -o popcnt -mpopcnt -msse2):

Bitslice(7)            1462142 us; cnt = 32500610
Bitslice(24)           1221985 us; cnt = 32500610
Lauradoux              2347749 us; cnt = 32500610
SSE2 8-bit              790898 us; cnt = 32500610
SSE2 16-bit             825568 us; cnt = 32500610
SSE2 32-bit             864665 us; cnt = 32500610
16-bit LUT             1236739 us; cnt = 32500610
8-bit LUT              1951629 us; cnt = 32500610
gcc popcount            803173 us; cnt = 32500610
gcc popcountll         7678479 us; cnt = 32500610
FreeBSD version 1      2802681 us; cnt = 32500610
FreeBSD version 2      2167031 us; cnt = 32500610
Wikipedia #2           4927947 us; cnt = 32500610
Wikipedia #3           4212143 us; cnt = 32500610
HAKMEM 169/X11         3559245 us; cnt = 32500610
naive                 16182699 us; cnt = 32500610
Wegner/Kernigan       12115119 us; cnt = 32500610
Anderson              61045764 us; cnt = 32500610
8x shift and add       6712049 us; cnt = 32500610
32x shift and add      6662200 us; cnt = 32500610

For Intel Core2 Duo E8400, with gcc 4.7.1 (g++ -O3 popcnt.cpp -o popcnt -mssse3, -mpopcnt is not supported on this CPU)

Bitslice(7)            1353007 us; cnt = 32500610
Bitslice(24)            953044 us; cnt = 32500610
Lauradoux               534697 us; cnt = 32500610
SSE2 8-bit              458277 us; cnt = 32500610
SSE2 16-bit             555278 us; cnt = 32500610
SSE2 32-bit             634897 us; cnt = 32500610
SSSE3                   414542 us; cnt = 32500610
16-bit LUT             1208412 us; cnt = 32500610
8-bit LUT              1400175 us; cnt = 32500610
gcc popcount           5428396 us; cnt = 32500610
gcc popcountll         2743358 us; cnt = 32500610
FreeBSD version 1      3025944 us; cnt = 32500610
FreeBSD version 2      2313264 us; cnt = 32500610
Wikipedia #2           1570519 us; cnt = 32500610
Wikipedia #3           1051828 us; cnt = 32500610
HAKMEM 169/X11         3982779 us; cnt = 32500610
naive                 20951420 us; cnt = 32500610
Wegner/Kernigan       13665630 us; cnt = 32500610
Anderson               6771549 us; cnt = 32500610
8x shift and add      14917323 us; cnt = 32500610
32x shift and add     14494482 us; cnt = 32500610

Bitslice method is a parallel mechanism that counts multiple (7 or 24) machine words at a time so it has marginal usability for a generic function. After http://dalkescientific.com/writings/diary/popcnt.cpp :

static inline int popcount_fbsd2(unsigned *buf, int n)
{
  int cnt=0;
  do {
    unsigned v = *buf++;
    v -= ((v >> 1) & 0x55555555);
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
    v = (v + (v >> 4)) & 0x0F0F0F0F;
    v = (v * 0x01010101) >> 24;
    cnt += v;
  } while(--n);
  return cnt;
}

static inline int merging2(const unsigned *data) 
{
        unsigned count1,count2,half1,half2;
        count1=data[0];
        count2=data[1];
        half1=data[2]&0x55555555;
        half2=(data[2]>>1)&0x55555555;
        count1 = count1 - ((count1 >> 1) & 0x55555555);
        count2 = count2 - ((count2 >> 1) & 0x55555555);
        count1+=half1;
        count2+=half2;
        count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
        count2 = (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
        count1+=count2;
        count1 = (count1&0x0F0F0F0F)+ ((count1 >> 4) & 0x0F0F0F0F);
        count1 = count1  + (count1 >> 8);
        count1 = count1 + (count1 >> 16);

        return count1 & 0x000000FF;
}

static inline int merging3(const unsigned *data) 
{
        unsigned count1,count2,half1,half2,acc=0;
        int i;

        for(i=0;i<24;i+=3)
        {
                count1=data[i];
                count2=data[i+1];
                //w = data[i+2];
                half1=data[i+2]&0x55555555;
                half2=(data[i+2]>>1)&0x55555555;
                count1 = count1 - ((count1 >> 1) & 0x55555555);
                count2 = count2 - ((count2 >> 1) & 0x55555555);
                count1+=half1;
                count2+=half2;
                count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
                count1 += (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
                acc += (count1 & 0x0F0F0F0F)+ ((count1>>4) &0x0F0F0F0F);
        }
        acc = (acc & 0x00FF00FF)+ ((acc>>8)&0x00FF00FF);
        acc = acc + (acc >> 16);
        return acc & 0x00000FFFF;
}

/* count 24 words at a time, then 3 at a time, then 1 at a time */
static inline int popcount_24words(unsigned *buf, int n) {
  int cnt=0, i;

  for (i=0; i<n-n%24; i+=24) {
    cnt += merging3(buf+i);
  }
  for (;i<n-n%3; i+=3) {
    cnt += merging2(buf+i);
  }
  cnt += popcount_fbsd2(buf+i, n-i);
  return cnt;
}



回答4:


count = 0;
while (word != 0) {
  word = word & (word-1);
  count++;
}

The statement

word = word & (word-1);

clears the lowest 1-bit in the word. Eventually, you run out of 1-bits.




回答5:


I think this is fairly easy to understand, and quite efficient.

x = some number;
x ^= (x >> 1); // parity of every bit pair now in bits 0, 2, 4, ...
x ^= (x >> 2); // parity of every 4 bits now in bits 0, 4, 8, ...
x ^= (x >> 4); // ...etc
x ^= (x >> 8);
x ^= (x >> 16); // parity of all 32 bits now in bit 0
parity = x & 1;



回答6:


Precalculate all results and do a simple array lookup. For a simple "count even or odd" boolean result, you can make a bit array.




回答7:


           cnt = 0;
           while (word != 0) {
           word = word & (word-1);
           cnt++;

this could remove 1 bits for more details visit http://pgrtutorials.blogspot.in/p/bit-manipulation.html



来源:https://stackoverflow.com/questions/12055499/c-sneaky-way-to-count-positive-bits

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!