How to find the position of the only-set-bit in a 64-bit value using bit manipulation efficiently?

有刺的猬 2020-12-24 05:51

Say I have a value of type uint64_t, viewed as a sequence of octets (1 octet = 8 bits). The uint64_t value is known to contain only one set bit.

9 answers
  • 2020-12-24 06:06

    Multiply the value by a carefully designed 64-bit constant, then keep only the upper 4 bits of the product. For any CPU with fast 64-bit multiplication, this is probably about as good as you can get.

    #include <stdint.h>

    int field_set(uint64_t input) {
        uint64_t field = input * 0x20406080a0c0e1ULL;
        return (field >> 60) & 15;
    }
    
    // field_set(0x0000000000000000ULL) = 0
    // field_set(0x0000000000000080ULL) = 1
    // field_set(0x0000000000008000ULL) = 2
    // field_set(0x0000000000800000ULL) = 3
    // field_set(0x0000000080000000ULL) = 4
    // field_set(0x0000008000000000ULL) = 5
    // field_set(0x0000800000000000ULL) = 6
    // field_set(0x0080000000000000ULL) = 7
    // field_set(0x8000000000000000ULL) = 8
    

    clang implements this in three x86_64 instructions, not counting the frame setup and cleanup:

    _field_set:
        push   %rbp
        mov    %rsp,%rbp
        movabs $0x20406080a0c0e1,%rax
        imul   %rdi,%rax
        shr    $0x3c,%rax
        pop    %rbp
        retq
    

    Note that the result for any other input is essentially arbitrary. (So don't do that.)

    I don't think there's any feasible way to extend this method to return the bit positions 7, 15, ..., 63 directly (the structure of the constant doesn't permit it), but you can convert a nonzero result r to its bit position by computing 8 * r - 1.
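
    Since the positions in question are 7, 15, ..., 63, a field index k corresponds to bit position 8k - 1. The following is a minimal sketch of that conversion; `field_to_pos` is a hypothetical helper that does not appear in the answer, and keeping 0 for the no-bit-set case is an assumption taken from the question's wording.

```c
#include <stdint.h>

/* The multiply-and-shift routine from the answer: returns 1..8 for a single
   set bit at position 7, 15, ..., 63, and 0 for a zero input. */
int field_set(uint64_t input) {
    uint64_t field = input * 0x20406080a0c0e1ULL;
    return (int)((field >> 60) & 15);
}

/* Hypothetical helper: map field index k to bit position 8*k - 1.
   (k != 0) evaluates to 1 for k >= 1 and to 0 for k == 0, so a zero
   index stays 0 without a branch in the source. */
int field_to_pos(int k) {
    return 8 * k - (k != 0);
}
```

    For example, `field_to_pos(field_set(0x8000ULL))` evaluates to 15.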


    With regard to how this constant was designed: I started with the following observations:

    • Unsigned multiplication is a fast operation on most CPUs, and can have useful effects. We should use it. :)
    • Multiplying anything by zero results in zero. Since this matches with the desired result for a no-bits-set input, we're doing well so far.
    • Multiplying anything by 1ULL<<63 (i.e., your "pos=63" value) can only result in that same value or zero. (It cannot have any lower bits set, and there are no higher bits to change.) Therefore, we must find some way for this value to be treated as the correct result.
    • A convenient way of making this value be its own correct result is by right-shifting it by 60 bits. This shifts it down to "8", which is a convenient enough representation. We can proceed to encode the other outputs as 1 through 7.
    • Multiplying our constant by each of the other bit fields is equivalent to left-shifting it by a number of bits equal to its "position". The right-shift by 60 bits causes only the 4 bits to the left of a given position to appear in the result. Thus, we can create all of the cases except for one as follows:

       uint64_t constant = (
            1ULL << (60 - 7)
          | 2ULL << (60 - 15)
          | 3ULL << (60 - 23)
          | 4ULL << (60 - 31)
          | 5ULL << (60 - 39)
          | 6ULL << (60 - 47)
          | 7ULL << (60 - 55)
       );
      

    So far, the constant is 0x20406080a0c0e0ULL. However, this doesn't give the right result for pos=63: the constant is even, so multiplying it by that input gives zero. We must set the lowest bit (i.e., constant |= 1ULL) to get that case to work, giving us the final value of 0x20406080a0c0e1ULL.

    Note that the construction above can be modified to encode the results differently. However, the output of 8 is fixed as described above, and all other outputs must fit into 4 bits (i.e., 0 to 15).
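
    The construction can also be written as a loop, which makes the pattern behind the seven shifted terms explicit. This is only a sketch; `build_constant` is an illustrative name, not something from the answer.

```c
#include <stdint.h>

/* Rebuild the multiplier: field index k (1..7) is placed so that shifting
   it left by its bit position 8*k - 1 lands it in bits 60..63. */
uint64_t build_constant(void) {
    uint64_t c = 0;
    for (int k = 1; k <= 7; k++) {
        int pos = 8 * k - 1;              /* 7, 15, ..., 55 */
        c |= (uint64_t)k << (60 - pos);
    }
    return c | 1ULL;                      /* low bit handles pos = 63 */
}
```

    Comparing the return value against 0x20406080a0c0e1ULL is a quick way to check the derivation.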

  • 2020-12-24 06:07
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 10000000  pos = 7
    

    ..., but returns 0 if there is no bit that is set.

    This returns the same value whether bit 0 is set or no bit is set; on x86_64, that is roughly what bsrq does (strictly speaking, its destination is undefined when the source is 0):

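    int bsrq_x86_64(uint64_t x){
      uint64_t ret; /* bsrq needs a 64-bit destination register */
      /* AT&T syntax: source operand first, destination second */
      asm("bsrq %1, %0" : "=r"(ret) : "r"(x) : "cc");
      return (int)ret;
    }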
    

    However, it also returns 0 if bit 0 is set. Here is a method that runs in constant time (no looping or branching) and returns -1 when no bits are set, to distinguish that case from bit 0 being set.

    int find_bit(unsigned long long x){
      int ret=0,
      cmp = (x>(1LL<<31))<<5; //32 if true else 0
      ret += cmp;
      x  >>= cmp;
      cmp = (x>(1<<15))<<4; //16 if true else 0
      ret += cmp;
      x  >>= cmp;
      cmp = (x>(1<<7))<<3; //8
      ret += cmp;
      x  >>= cmp;
      cmp = (x>(1<<3))<<2; //4
      ret += cmp;
      x  >>= cmp;
      cmp = (x>(1<<1))<<1; //2
      ret += cmp;
      x  >>= cmp;
      cmp = (x>1);
      ret += cmp;
      x  >>= cmp;
      ret += x;
      return ret-1;
    }
    

    Technically, this just returns the position of the most significant set bit. Depending on the floating-point format available, this can be done in fewer operations with float bit-reinterpretation tricks (in the style of the fast inverse square root) or other bit-twiddling hacks.
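
    One possible reading of the float remark, sketched under the assumption that `double` is IEEE 754 binary64: a power of two converts to `double` exactly, so its biased exponent field encodes the bit position. `pos_via_double` is a hypothetical name, and the result is only meaningful for inputs with a single set bit.

```c
#include <stdint.h>
#include <string.h>

/* Assumes IEEE 754 binary64: 1 sign bit, 11 exponent bits (bias 1023),
   52 fraction bits. Valid only when exactly one bit of x is set. */
int pos_via_double(uint64_t x) {
    double d = (double)x;             /* exact for powers of two */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret without aliasing issues */
    return (int)(bits >> 52) - 1023;  /* sign bit is 0, so this is the exponent */
}
```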

    By the way, if you don't mind using compiler builtins (GCC and Clang), you can just do:

    __builtin_popcountll(n-1) or __builtin_ctzll(n) or __builtin_ffsll(n)-1
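
    A short sketch of the three builtin variants, assuming GCC or Clang. The wrapper names are illustrative; the `assert` calls encode the single-set-bit precondition, which matters because `__builtin_ctzll(0)` is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zeros: equals the position of the only set bit. */
int pos_ctz(uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return __builtin_ctzll(n);            /* undefined for n == 0 */
}

/* n - 1 has exactly `pos` low bits set when n == 1 << pos. */
int pos_popcount(uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return __builtin_popcountll(n - 1);
}

/* ffs is one-indexed and returns 0 for a zero input, so this yields -1 then. */
int pos_ffs(uint64_t n) {
    return __builtin_ffsll((long long)n) - 1;
}
```

    Of the three, only the `ffs` variant gives a defined, distinguishable answer for a zero input.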

  • 2020-12-24 06:12

    The C++ tag was removed, but here is a portable C++ answer nonetheless, since you can compile it as C++ and expose it through an extern "C" interface:

    If you take a power of 2 and subtract one, you end up with a binary number whose count of set bits equals the position of the original bit.

    One way to count set bits (binary 1s) is the count member function of std::bitset, which each standard library implementation presumably implements as efficiently as it can.

    Note that your specification returns 0 for both inputs 0 and 1, so I added as_specified_pos to meet this requirement. Personally, I would just let it return the natural value of 64 when passed 0, both to be able to tell the two cases apart and for speed.

    The following code should be extremely portable and is most likely optimized per platform by compiler vendors:

    #include <bitset>
    #include <cstdint>
    
    uint64_t pos(uint64_t val)
    {
       return std::bitset<64>(val-1).count();
    }
    
    uint64_t as_specified_pos(uint64_t val)
    {
        return (val) ? pos(val) : 0;
    }
    

    On Linux with g++ I get the following disassembled code:

    0000000000000000 <pos(unsigned long)>:
       0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
       4:   f3 48 0f b8 c0          popcnt %rax,%rax
       9:   c3                      retq
       a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    
    0000000000000010 <as_specified_pos(unsigned long)>:
      10:   31 c0                   xor    %eax,%eax
      12:   48 85 ff                test   %rdi,%rdi
      15:   74 09                   je     20 <as_specified_pos(unsigned long)+0x10>
      17:   48 8d 47 ff             lea    -0x1(%rdi),%rax
      1b:   f3 48 0f b8 c0          popcnt %rax,%rax
      20:   f3 c3                   repz retq
    
  • 2020-12-24 06:18

    If you want an algorithm for the job rather than a built-in, this will do it. It yields the bit number of the most significant 1 bit, even if more than one bit is set. It narrows down the position by iteratively dividing the bit range under consideration into halves, testing whether there are any bits set in the upper half, taking that half as the new bit range if so, and otherwise taking the lower half as the new bit range.

    #define TRY_WINDOW(bits, n, msb) do { \
        uint64_t t = n >> bits;           \
        if (t) {                          \
            msb += bits;                  \
            n = t;                        \
        }                                 \
    } while (0)
    
    int msb(uint64_t n) {
        int msb = 0;
    
        TRY_WINDOW(32, n, msb);
        TRY_WINDOW(16, n, msb);
        TRY_WINDOW( 8, n, msb);
        TRY_WINDOW( 4, n, msb);
        TRY_WINDOW( 2, n, msb);
        TRY_WINDOW( 1, n, msb);
    
        return msb;
    }
    
  • 2020-12-24 06:21

    The value mod 0x8D yields a unique value for each of the eight cases.

    That value mod 0x11 is still unique.

    The second column of the table is the result of (value mod 0x8D) mod 0x11.

    value                   index
    128                     9
    32768                   5
    8388608                 10
    2147483648              0
    549755813888            14
    140737488355328         2
    36028797018963968       4
    9223372036854775808     15
    

    So a simple lookup table will suffice.

    int find_bit(uint64_t bit){ 
      /* seventeen entries, indexed by (bit % 0x8D) % 0x11 */
      static const int lookup[17] = { 31, 0, 47, 0, 55, 15, 0, 0, 0, 7, 23, 0, 0, 0, 39, 63, 0 };
      return lookup[(bit % 0x8D) % 0x11];
    }
    

    No branching, no compiler tricks.

    In the array, the slot at each index from the table holds the corresponding bit position (7 at index 9, 15 at index 5, and so on); the unused slots hold 0.
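
    The index column can be re-derived mechanically. The sketch below, assuming the moduli 0x8D and 0x11 (the pair that reproduces the index column in the table), fills the lookup array at run time and reports whether any two of the eight inputs collide. `MOD_A`, `MOD_B`, `build_lookup`, and `find_bit_mod` are illustrative names, not part of the answer.

```c
#include <stdint.h>

enum { MOD_A = 0x8D, MOD_B = 0x11 };

int lookup[MOD_B];  /* file scope, so zero-initialized */

/* Fill lookup[] so that lookup[(v % MOD_A) % MOD_B] is the bit position
   for v = 1 << 7, 1 << 15, ..., 1 << 63. Returns 0 on an index collision. */
int build_lookup(void) {
    int used[MOD_B] = {0};
    for (int pos = 7; pos <= 63; pos += 8) {
        uint64_t v = 1ULL << pos;
        int idx = (int)((v % MOD_A) % MOD_B);
        if (used[idx]) return 0;
        used[idx] = 1;
        lookup[idx] = pos;
    }
    return 1;
}

/* Valid only after build_lookup() has succeeded. */
int find_bit_mod(uint64_t bit) {
    return lookup[(bit % MOD_A) % MOD_B];
}
```

    One caveat of this scheme: a zero input maps to index 0, the same slot as 1 << 31, so it cannot report "no bit set" on its own.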
    
  • 2020-12-24 06:23

    If you can use POSIX, use the ffs() function from strings.h (not string.h!). It returns the position of the least significant set bit (one-indexed), or zero if the argument is zero. On most implementations, a call to ffs() is inlined and compiled into the corresponding machine instruction, such as bsf on x86. glibc also has ffsll() for long long arguments, which is better suited to your 64-bit problem if available.
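
    Because ffs() takes an int, a 64-bit value has to be handled in two halves when ffsll() is not available. A minimal sketch, assuming a 32-bit int and the usual two's-complement conversion from unsigned to int; `pos_ffs64` is a hypothetical name.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Zero-based position of the lowest set bit of x using only ffs(int);
   returns -1 when no bit is set. Assumes int is 32 bits wide. */
int pos_ffs64(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    if (lo) return ffs((int)lo) - 1;    /* ffs is one-indexed */
    if (hi) return ffs((int)hi) + 31;   /* -1 for indexing, +32 for the half */
    return -1;
}
```

    Unlike the branch-free variants, this version uses two tests, which is the price of staying within plain POSIX.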
