Implementation of __builtin_clz

前端 未结 2 1558
抹茶落季
抹茶落季 2020-12-05 04:56

What is the implementation of GCC\'s (4.6+) __builtin_clz? Does it correspond to some CPU instruction on Intel x86_64 (AVX)?

相关标签:
2条回答
  • 2020-12-05 05:14

    It should translate to a Bit Scan Reverse instruction and a subtract. The BSR gives the index of the leading 1, and then you can subtract that from the word size to get the number of leading zeros.

    Edit: if your CPU supports LZCNT (Leading Zero Count), then that will probably do the trick too, but not all x86-64 chips have that instruction.

    0 讨论(0)
  • 2020-12-05 05:18

    Yes, and no.

    CLZ (count leading zero) and BSR (bit-scan reverse) are related but different. CLZ equals (type bit width less one) - BSR. CTZ (count trailing zero), also know as FFS (find first set) is the same as BSF (bit-scan forward.)

    Note that all of these are undefined when operating on zero!

    In answer to your question, most of the time on x86 and x86_64, __builtin_clz generates BSR operation subtracted from 31 (or whatever your type width is), and __builting_ctz generates a BSF operation.

    If you want to know what assembler GCC is generating, the best way to know is to see. The -S flag will have gcc output the assembler it generated for the given input:

    gcc -S -o test.S test.c

    Consider:

    unsigned int clz(unsigned int num) {
        return __builtin_clz(num);
    }
    
    unsigned int ctz(unsigned int num) {
        return __builtin_ctz(num);
    }
    

    On x86 for clz gcc (-O2) generates:

    bsrl    %edi, %eax
    xorl    $31, %eax
    ret
    

    and for ctz:

    bsfl    %edi, %eax
    ret
    

    Note that if you really want bsr, and not clz, you need to do 31 - clz (for 32-bit integers.) This explains the XOR 31, as x XOR 31 == 31 - x (this identity is only true for numbers of the from 2^y - 1) So:

    num = __builtin_clz(num) ^ 31;
    

    yields

    bsrl    %edi, %eax
    ret
    
    0 讨论(0)
提交回复
热议问题