Confusion about bsr and lzcnt

前端 未结 2 1134
执笔经年
执笔经年 2021-02-20 04:40

I\'m a bit confused about both instructions. First let\'s discard the special case when the scanned value is 0 and the undefined/bsr or bitsize/lzcnt result - this difference is

相关标签:
2条回答
  • 2021-02-20 05:28

    To be clear, there is no working fallback from lzcnt to bsr. What happened is that Intel used the previously redundant sequence rep bsr to encode the new lzcnt instruction. Using a redudant rep prefix for bsr was generally defined to be ignored, but with the caveat that it may decode differently on future CPUs1.

    So if you happen to execute lzcnt on a CPU that doesn't support it, it will execute as bsr. Of course, this fallback is not exactly intentional, and it gives the wrong result (as Paul R points out they look at the same bit but report it differently): it is just a consequence of the way the new instruction was encoded and how pointless rep prefixes were treated by prior CPUs. So the world fallback is pretty much entirely inappropriate for lzcnt and bsr.

    The situation is more subtle for the case of tzcnt and bsf. It uses the same encoding trick: tzcnt has the same encoding as rep bsf, but here the "fallback" mostly works since tzcnt returns the same value as bsf for all inputs except zero. For zero inputs tzcnt returns 32, but bsf leaves the destination undefined.

    You can't really use even this fallback though: if you never have zero inputs you might as well just use bsf, saving a byte and being compatible with a couple decades of CPUs, and if you do have zero inputs the behavior differs.

    So the behavior is perhaps better classified as trivia than a fallback...


    1 Normally this would more or less be esoterica, but you could for example use rep prefixes are where they have no functional effect to lengthen instructions to help align subsequent code without inserting explicit nop instructions. Given the "may decode differently in the future" this would be dangerous when compiling code which may run on any future CPU.

    0 讨论(0)
  • 2021-02-20 05:32

    LZCNT gives the number of leading zero bits. BSR gives the bit index of the most significant 1 bit. So they do effectively the same thing for the non-zero case, except the result is interpreted differently. Therefore you can just subtract the BSR result from 31 to get the same behaviour as with LZCNT, i.e. LZCNT == (31 - BSR).

    0 讨论(0)
提交回复
热议问题