Counting the number of leading zeros in a 128-bit integer

日久生厌 2021-01-06 03:40

How can I count the number of leading zeros in a 128-bit integer (uint128_t) efficiently?

I know GCC's built-in functions:

  • __builti
3 Answers
  • 2021-01-06 04:24

    Assuming a 'random' distribution, the first non-zero bit will be in the high 64 bits with overwhelming probability, so it makes sense to test that half first.

    Have a look at the code generated for:

    /* inline */ int clz_u128 (uint128_t u)
    {
        unsigned long long hi, lo; /* (or uint64_t) */
        int b = 128;
    
        if ((hi = u >> 64) != 0) {
            b = __builtin_clzll(hi);
        }
        else if ((lo = u & ~0ULL) != 0) {
            b = __builtin_clzll(lo) + 64;
        }
    
        return b;
    }
    

    I would expect gcc to implement each __builtin_clzll using the bsrq instruction - bit scan reverse, i.e., most-significant bit position - in conjunction with an xor, (msb ^ 63), or sub, (63 - msb), to turn it into a leading zero count. gcc might generate lzcnt instructions with the right -march= (architecture) options.
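
The xor and sub forms mentioned above are interchangeable because the bit index lies in 0..63: subtracting it from 63 (binary 111111) never borrows, so the subtraction just flips the low six bits, which is exactly what the xor does. A minimal sketch of that relationship follows. The helper names `msb_index_u64` and `clz_from_msb_u64` are hypothetical, and the plain loop only stands in for what bsrq reports; it is not the code gcc emits.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the most significant set bit, i.e. the value a
   bit-scan-reverse would report. Hypothetical helper written as a
   plain loop; x must be nonzero, as with bsr. */
static int msb_index_u64(uint64_t x)
{
    assert(x != 0);
    int msb = 0;
    while (x >>= 1) {
        ++msb;
    }
    return msb;
}

/* Leading-zero count derived from the bit index. */
static int clz_from_msb_u64(uint64_t x)
{
    int msb = msb_index_u64(x);
    /* For msb in 0..63, (msb ^ 63) equals (63 - msb). */
    return msb ^ 63;
}
```

For a nonzero argument this should agree with `__builtin_clzll`, so it can serve as a reference when checking a faster version.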


    Edit: others have pointed out that the 'distribution' is not relevant in this case, since the HI uint64_t needs to be tested regardless.

  • 2021-01-06 04:41
    inline int clz_u128 (uint128_t u) {
      uint64_t hi = u>>64;
      uint64_t lo = u;
      int retval[3]={
        __builtin_clzll(hi),
        __builtin_clzll(lo)+64,
        128
      };
      int idx = !hi + ((!lo)&(!hi));
      return retval[idx];
    }
    

    This is a branch-free variant. Note that it does more work than the branchy solution, and in practice the branches will probably be predictable.

    It also relies on __builtin_clzll not crashing when fed 0: the docs say the result is undefined, but is it just unspecified or undefined?
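
One way to sidestep that question, sketched here as an assumption rather than as part of the answer: OR bit 0 into each half before calling the builtin. For a nonzero value this leaves the highest set bit, and so the count, unchanged. For a zero value the argument becomes 1, and the result lands in a table slot that is never selected. The names `clz_u128_nozero` and `check_clz_u128_nozero` are hypothetical, and `unsigned __int128` stands in for `uint128_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the table lookup that never passes 0 to
   __builtin_clzll. Requires a compiler with unsigned __int128. */
static int clz_u128_nozero(unsigned __int128 u)
{
    uint64_t hi = (uint64_t)(u >> 64);
    uint64_t lo = (uint64_t)u;
    int retval[3] = {
        __builtin_clzll(hi | 1),       /* used only when hi != 0 */
        __builtin_clzll(lo | 1) + 64,  /* used only when hi == 0, lo != 0 */
        128                            /* used only when u == 0 */
    };
    int idx = !hi + ((!lo) & (!hi));
    return retval[idx];
}

/* A few spot checks on the boundary cases. */
static void check_clz_u128_nozero(void)
{
    assert(clz_u128_nozero(0) == 128);
    assert(clz_u128_nozero(1) == 127);
    assert(clz_u128_nozero((unsigned __int128)1 << 64) == 63);
    assert(clz_u128_nozero((unsigned __int128)1 << 127) == 0);
}
```

The cost is two extra OR operations, and the function stays branch free.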

  • 2021-01-06 04:41

    Yakk's answer works well for all kinds of targets as long as gcc supports 128-bit integers for the target. However, note that on the x86-64 platform, with an Intel Haswell processor or newer, there is a more efficient solution:

    #include <immintrin.h>
    #include <stdint.h>
    // tested with compiler options: gcc -O3 -Wall -m64  -mlzcnt
    
    inline int lzcnt_u128 (unsigned __int128 u) {
      uint64_t hi = u>>64;
      uint64_t lo = u;
      lo = (hi == 0) ? lo : -1ULL;
      return _lzcnt_u64(hi) + _lzcnt_u64(lo);
    }
    

    The _lzcnt_u64 intrinsic compiles (gcc 5.4) to the lzcnt instruction, which is well defined for a zero input (it returns 64), in contrast to gcc's __builtin_clzll(). The ternary operator compiles to the cmove instruction.
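
To see why the sum is correct in every case, here is a sketch that swaps the intrinsic for a hypothetical stand-in, `lzcnt64_emul`, which returns 64 for a zero input just as lzcnt does. It is meant for tracing the logic on a machine without `-mlzcnt`, not as a replacement for the intrinsic. The names `lzcnt64_emul`, `lzcnt_u128_emul` and `check_lzcnt_u128_emul` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _lzcnt_u64: same result, including 64
   when x == 0. The guard keeps __builtin_clzll away from 0. */
static int lzcnt64_emul(uint64_t x)
{
    return x ? __builtin_clzll(x) : 64;
}

static int lzcnt_u128_emul(unsigned __int128 u)
{
    uint64_t hi = (uint64_t)(u >> 64);
    uint64_t lo = (uint64_t)u;
    /* If hi is nonzero, force lo to all ones so it contributes 0. */
    lo = (hi == 0) ? lo : UINT64_MAX;
    /* hi != 0:           clz(hi) + 0
       hi == 0, lo != 0:  64 + clz(lo)
       hi == 0, lo == 0:  64 + 64 = 128 */
    return lzcnt64_emul(hi) + lzcnt64_emul(lo);
}

/* Spot checks covering the three cases. */
static void check_lzcnt_u128_emul(void)
{
    assert(lzcnt_u128_emul((unsigned __int128)1 << 64) == 63);
    assert(lzcnt_u128_emul(1) == 127);
    assert(lzcnt_u128_emul(0) == 128);
}
```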
