Position of least significant bit that is set

时光取名叫无心 2020-11-22 08:46

I am looking for an efficient way to determine the position of the least significant bit that is set in an integer, e.g. for 0x0FF0 it would be 4.

A trivial impleme

23 answers
  • 2020-11-22 08:59

    The fastest (non-intrinsic, non-assembler) solution is to find the lowest non-zero byte and then use that byte as an index into a 256-entry lookup table. This gives you a worst case of four conditional instructions and a best case of one. Not only does this need few instructions, it also needs few branches, which matters a great deal on modern hardware.

    Your table (256 8-bit entries) should contain the index of the LSB for each number in the range 0-255. You check each byte of your value to find the lowest non-zero byte, then use that byte to look up the real index.

    This does require 256 bytes of memory, but if the speed of this function is that important, those 256 bytes are well worth it.

    E.g.

    typedef unsigned char byte;

    byte lowestBitTable[256] = {
    .... // left as an exercise for the reader to generate
    };
    
    unsigned GetLowestBitPos(unsigned value)
    {
      // The order in which the bytes are checked depends on whether you are on
      // a big or little endian machine. This is for little-endian and assumes
      // a 32-bit unsigned.
      byte* bytes = (byte*)&value;  // address of value, not value cast to a pointer
      if (bytes[0])
        return lowestBitTable[bytes[0]];
      else if (bytes[1])
          return lowestBitTable[bytes[1]] + 8;
      else if (bytes[2])
          return lowestBitTable[bytes[2]] + 16;
      else
          return lowestBitTable[bytes[3]] + 24;  // also reached when value == 0
    }
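The table that the answer leaves as an exercise could be generated along the following lines. This is only a sketch: the names `LowestBitTable`, `kLowestBitTable` and `lowestSetBitByTable` are hypothetical and not from the answer, the bytes are extracted with shifts instead of a pointer cast so that endianness does not matter, and returning -1 for zero is an added convention.

```cpp
#include <cstdint>

// Hypothetical helper (not from the answer): fills the 256-entry table at
// compile time. Entry v holds the index of the lowest set bit of byte v.
struct LowestBitTable {
    std::uint8_t pos[256];
    constexpr LowestBitTable() : pos{} {
        for (unsigned v = 1; v < 256; ++v) {
            std::uint8_t bit = 0;
            while (((v >> bit) & 1u) == 0) ++bit;
            pos[v] = bit;
        }
        // pos[0] stays 0; zero is handled before the table is consulted.
    }
};

constexpr LowestBitTable kLowestBitTable{};

// Returns the 0-based index of the lowest set bit, or -1 if value == 0.
// Bytes are taken with shifts, so the result is independent of endianness.
constexpr int lowestSetBitByTable(std::uint32_t value) {
    if (value == 0) return -1;
    for (int shift = 0; shift < 32; shift += 8) {
        const std::uint32_t byte = (value >> shift) & 0xFFu;
        if (byte != 0) return kLowestBitTable.pos[byte] + shift;
    }
    return -1;  // not reached: a non-zero value has a non-zero byte
}
```

With these definitions, `lowestSetBitByTable(0x0FF0)` evaluates to 4, matching the example in the question.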
    
  • 2020-11-22 09:01

    Found this clever trick using 'magic masks' in "The Art of Computer Programming", Volume 4. It finds the bit in O(log(n)) time for an n-bit number, with log(n) extra space for the masks. Typical solutions that check for the set bit are either O(n) in time or need O(n) extra space for a lookup table, so this is a good compromise.

    Magic masks:

    m0 = (...............01010101)  
    m1 = (...............00110011)
    m2 = (...............00001111)  
    m3 = (.......0000000011111111)
    ....
    

    Key idea: number of trailing zeros in x = 1 * [(y & m0) = 0] + 2 * [(y & m1) = 0] + 4 * [(y & m2) = 0] + ..., where y = x & -x isolates the lowest set bit of x and [c] is 1 if c holds and 0 otherwise.
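As a worked instance of the key idea, take the question's example x = 0x0FF0. The arithmetic below is an added illustration, not part of the answer. It assumes the masks are applied to y = x & -x (the isolated lowest set bit), shows only the low 16 bits of each mask, and uses [c] to mean 1 if c holds and 0 otherwise.

```latex
\begin{aligned}
x &= \texttt{0x0FF0}, \qquad y = x \mathbin{\&} (-x) = \texttt{0x0010} \\
y \mathbin{\&} m_0 &= \texttt{0x0010} \mathbin{\&} \texttt{0x5555} = \texttt{0x0010} \neq 0
  &&\Rightarrow\ 1 \cdot 0 \\
y \mathbin{\&} m_1 &= \texttt{0x0010} \mathbin{\&} \texttt{0x3333} = \texttt{0x0010} \neq 0
  &&\Rightarrow\ 2 \cdot 0 \\
y \mathbin{\&} m_2 &= \texttt{0x0010} \mathbin{\&} \texttt{0x0F0F} = 0
  &&\Rightarrow\ 4 \cdot 1 \\
y \mathbin{\&} m_3 &= \texttt{0x0010} \mathbin{\&} \texttt{0x00FF} = \texttt{0x0010} \neq 0
  &&\Rightarrow\ 8 \cdot 0 \\
\text{trailing zeros} &= 0 + 0 + 4 + 0 = 4
\end{aligned}
```

The wider masks also contain bit 4, so they contribute nothing, and the result agrees with the expected position 4.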

    #include <stdint.h>

    int lastSetBitPos(const uint64_t x) {
        if (x == 0)  return -1;
    
        // For a 64-bit number, log2(64) = 6 masks are needed
        const int steps = 6;
        // magic masks
        const uint64_t m[] = { 0x5555555555555555, //     ....01010101
                               0x3333333333333333, //     ....00110011
                               0x0f0f0f0f0f0f0f0f, //     ....00001111
                               0x00ff00ff00ff00ff, // 0000000011111111
                               0x0000ffff0000ffff, 
                               0x00000000ffffffff };
    
        // First, isolate the lowest set bit
        uint64_t y = x & -x;
    
        int trailZeros = 0;
        for (int i = 0; i < steps; ++i) {
            // If y misses mask i, bit i of the trailing-zero count is 1
            if ((y & m[i]) == 0)
                trailZeros += 1 << i;
        }
        // 1-based position; return trailZeros itself for the 0-based index
        return (trailZeros+1);
    }
    
  • 2020-11-22 09:02

    Yet another solution, possibly not the fastest, but it seems quite good.
    At least it has no branches. ;)

    uint32 x = ...;  // 0x00000001  0x0405a0c0  0x00602000
    x |= x <<  1;    // 0x00000003  0x0c0fe1c0  0x00e06000
    x |= x <<  2;    // 0x0000000f  0x3c3fe7c0  0x03e1e000
    x |= x <<  4;    // 0x000000ff  0xffffffc0  0x3fffe000
    x |= x <<  8;    // 0x0000ffff  0xffffffc0  0xffffe000
    x |= x << 16;    // 0xffffffff  0xffffffc0  0xffffe000
    
    // now x is filled with '1' from the least significant '1' to bit 31
    
    x = ~x;          // 0x00000000  0x0000003f  0x00001fff
    
    // now we have 1's below the original least significant 1
    // let's count them
    
    // note: + binds tighter than &, so the parentheses below are required
    x = (x & 0x55555555) + ((x >>  1) & 0x55555555);
                     // 0x00000000  0x0000002a  0x00001aaa
    
    x = (x & 0x33333333) + ((x >>  2) & 0x33333333);
                     // 0x00000000  0x00000024  0x00001444
    
    x = (x & 0x0f0f0f0f) + ((x >>  4) & 0x0f0f0f0f);
                     // 0x00000000  0x00000006  0x00000508
    
    x = (x & 0x00ff00ff) + ((x >>  8) & 0x00ff00ff);
                     // 0x00000000  0x00000006  0x0000000d
    
    x = (x & 0x0000ffff) + ((x >> 16) & 0x0000ffff);
                     // 0x00000000  0x00000006  0x0000000d
    // least sign.bit pos. was:  0           6          13
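The steps above can be wrapped in a function, roughly as in the sketch below. The function name `lowestSetBitBranchless` is hypothetical, and the choice to return 32 for an input of zero simply falls out of the arithmetic rather than being something the answer specifies. Each counting step is fully parenthesized, because `+` binds tighter than `&` in C and C++.

```cpp
#include <cstdint>

// Returns the 0-based index of the lowest set bit, or 32 if x == 0.
constexpr int lowestSetBitBranchless(std::uint32_t x) {
    // Smear the lowest set bit upward: every bit from it to bit 31 becomes 1.
    x |= x << 1;
    x |= x << 2;
    x |= x << 4;
    x |= x << 8;
    x |= x << 16;

    // Invert: only the bits below the original lowest set bit remain set.
    x = ~x;

    // Count the remaining 1 bits with a parallel population count.
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0f0f0f0fu) + ((x >> 4) & 0x0f0f0f0fu);
    x = (x & 0x00ff00ffu) + ((x >> 8) & 0x00ff00ffu);
    x = (x & 0x0000ffffu) + ((x >> 16) & 0x0000ffffu);
    return static_cast<int>(x);
}
```

For the three sample values traced in the answer (0x00000001, 0x0405a0c0, 0x00602000) this yields 0, 6 and 13.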
    
  • 2020-11-22 09:03

    There is an x86 assembly instruction (bsf) that will do it. :)

    More optimized?!

    Side Note:

    Optimization at this level is inherently architecture dependent. Today's processors are so complex (in terms of branch prediction, cache misses, pipelining) that it is hard to predict which code will run faster on which architecture. Reducing the operation count from 32 to 9, or something similar, might even decrease performance on some architectures. Code optimized for one architecture might perform worse on another. I think you should either optimize this for a specific CPU or leave it as it is and let the compiler choose what it thinks is best.
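One way to reach an instruction like bsf without writing assembly is a compiler builtin. The sketch below assumes GCC or Clang, where `__builtin_ctz` counts trailing zero bits and typically compiles to bsf or tzcnt on x86; on other compilers it falls back to a plain loop. The wrapper name `lowestSetBit` and the -1 result for zero are illustrative choices, not part of the answer.

```cpp
#include <cstdint>

// Returns the 0-based index of the lowest set bit, or -1 if x == 0.
constexpr int lowestSetBit(std::uint32_t x) {
    if (x == 0) return -1;  // the builtin's result is undefined for 0
#if defined(__GNUC__) || defined(__clang__)
    // GCC/Clang builtin: number of trailing zero bits in an unsigned int.
    return __builtin_ctz(x);
#else
    // Portable fallback for other compilers: shift until bit 0 is set.
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
#endif
}
```

Leaving the choice of instruction to the compiler in this way fits the side note: the same source can map to whatever the target CPU does best.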

  • 2020-11-22 09:03
    unsigned GetLowestBitPos(unsigned value)
    {
        if (value & 1) return 1;
        if (value & 2) return 2;
        if (value & 4) return 3;
        if (value & 8) return 4;
        if (value & 16) return 5;
        if (value & 32) return 6;
        if (value & 64) return 7;
        if (value & 128) return 8;
        if (value & 256) return 9;
        if (value & 512) return 10;
        if (value & 1024) return 11;
        if (value & 2048) return 12;
        if (value & 4096) return 13;
        if (value & 8192) return 14;
        if (value & 16384) return 15;
        if (value & 32768) return 16;
        if (value & 65536) return 17;
        if (value & 131072) return 18;
        if (value & 262144) return 19;
        if (value & 524288) return 20;
        if (value & 1048576) return 21;
        if (value & 2097152) return 22;
        if (value & 4194304) return 23;
        if (value & 8388608) return 24;
        if (value & 16777216) return 25;
        if (value & 33554432) return 26;
        if (value & 67108864) return 27;
        if (value & 134217728) return 28;
        if (value & 268435456) return 29;
        if (value & 536870912) return 30;
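        if (value & 1073741824) return 31;
        if (value & 2147483648u) return 32;
        return 0;  // no bit set (positions here are 1-based)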
    }
    

    Assuming uniformly distributed inputs, 50% of all numbers will return on the first line of code.

    75% of all numbers will return on the first 2 lines of code.

    87.5% of all numbers will return in the first 3 lines of code.

    94% of all numbers will return in the first 4 lines of code.

    97% of all numbers will return in the first 5 lines of code.

    etc.

    I think people who complain about how inefficient the worst case of this code is don't understand how rarely that case occurs.
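The percentages above follow from a short calculation, sketched here under the assumption that inputs are uniformly distributed 32-bit values (real data may be distributed very differently). Here p_k is the probability that the lowest set bit is at 0-based position k, and the expected value ignores the cutoff at 32 bits.

```latex
\begin{aligned}
p_k &= 2^{-(k+1)}, \qquad k = 0, 1, \dots, 31 \\
P(\text{return within the first } n \text{ checks}) &= \sum_{k=0}^{n-1} 2^{-(k+1)} = 1 - 2^{-n} \\
n = 1, 2, 3, 4, 5 &\;\Rightarrow\; 50\%,\ 75\%,\ 87.5\%,\ 93.75\%,\ 96.875\% \\
E[\text{number of checks}] &\approx \sum_{n=1}^{\infty} n \, 2^{-n} = 2
\end{aligned}
```

So under that assumption the chain of tests executes about two comparisons on average.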

  • 2020-11-22 09:03

    After 11 years we finally have std::countr_zero (in the <bit> header).

    Well done C++20
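A minimal sketch of how `std::countr_zero` might be used. The wrapper name `lowestSetBitIndex` is hypothetical, and the fallback loop for compilers without C++20 `<bit>` support is an addition for portability, not part of the answer. `std::countr_zero` accepts unsigned integer types and returns the number of bits in the type when the argument is zero.

```cpp
#include <cstdint>

// Pull in the library feature-test macros when <version> is available.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif
#if defined(__cpp_lib_bitops)
#  include <bit>
#endif

// Returns the 0-based index of the lowest set bit, or 32 if x == 0.
constexpr int lowestSetBitIndex(std::uint32_t x) {
#if defined(__cpp_lib_bitops)
    return std::countr_zero(x);  // C++20: count of trailing zero bits
#else
    // Pre-C++20 fallback with the same results, including 32 for zero.
    if (x == 0) return 32;
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
#endif
}
```

For the question's example, `lowestSetBitIndex(0x0FF0)` gives 4.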
