I'm a bit confused about both instructions. First let's discard the special case when the scanned value is 0 and the undefined/bsr or bitsize/lzcnt result - this difference is
To be clear, there is no working fallback from lzcnt to bsr. What happened is that Intel used the previously redundant sequence rep bsr to encode the new lzcnt instruction. A redundant rep prefix on bsr was generally defined to be ignored, but with the caveat that it may decode differently on future CPUs [1].
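To make the encoding trick concrete, here is a small sketch of the byte sequences involved, for the register forms with eax as destination and ecx as source. The opcode bytes are the documented ones; the constant names and the strip_ignored_rep helper are made up for this illustration and only model a decoder that ignores the prefix.

```python
REP_PREFIX = 0xF3  # the rep/repe prefix byte

# Register-to-register forms, destination eax, source ecx (ModRM byte 0xC1).
BSR_EAX_ECX = bytes([0x0F, 0xBD, 0xC1])           # bsr   eax, ecx
LZCNT_EAX_ECX = bytes([0xF3, 0x0F, 0xBD, 0xC1])   # lzcnt eax, ecx
BSF_EAX_ECX = bytes([0x0F, 0xBC, 0xC1])           # bsf   eax, ecx
TZCNT_EAX_ECX = bytes([0xF3, 0x0F, 0xBC, 0xC1])   # tzcnt eax, ecx


def strip_ignored_rep(code: bytes) -> bytes:
    """Hypothetical model of an older decoder that ignores a leading rep prefix."""
    if code[:1] == bytes([REP_PREFIX]):
        return code[1:]
    return code


# On a CPU without lzcnt/tzcnt support, the same bytes are seen as bsr/bsf.
print(strip_ignored_rep(LZCNT_EAX_ECX) == BSR_EAX_ECX)
print(strip_ignored_rep(TZCNT_EAX_ECX) == BSF_EAX_ECX)
```

In other words, the new instructions differ from the old ones only by the leading 0xF3 byte.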
So if you happen to execute lzcnt on a CPU that doesn't support it, it will execute as bsr. Of course, this fallback is not exactly intentional, and it gives the wrong result: as Paul R points out, both instructions locate the same bit but report it differently (bsr returns its index, lzcnt the number of zero bits above it). It is just a consequence of the way the new instruction was encoded and how pointless rep prefixes were treated by prior CPUs. So the word fallback is pretty much entirely inappropriate for lzcnt and bsr.
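As a rough illustration of why the results disagree, here is a minimal Python model of the two instructions for 32-bit operands. The function names bsr32 and lzcnt32 are invented for this sketch, and the model only covers the result value, not flags.

```python
def bsr32(x: int) -> int:
    """Index of the highest set bit of a 32-bit value, as bsr reports it."""
    x &= 0xFFFFFFFF
    if x == 0:
        # The destination is undefined for a zero source; the model refuses to guess.
        raise ValueError("bsr result is undefined for a zero source")
    return x.bit_length() - 1


def lzcnt32(x: int) -> int:
    """Number of leading zero bits in a 32-bit value, as lzcnt reports it."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


for value in (0x00000001, 0x00010000, 0x80000000):
    # For nonzero inputs the two results always sum to 31.
    print(hex(value), bsr32(value), lzcnt32(value))
```

For a nonzero 32-bit input the two are related by lzcnt = 31 - bsr, so they agree only when the highest set bit is bit 15 or, with that adjusted for other operand sizes, the equivalent midpoint; code written for lzcnt gets the wrong number almost everywhere when it runs as bsr.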
The situation is more subtle for tzcnt and bsf. They use the same encoding trick: tzcnt has the same encoding as rep bsf, but here the "fallback" mostly works, since tzcnt returns the same value as bsf for all inputs except zero. For a zero input, tzcnt returns the operand size (32 for a 32-bit operand), whereas bsf leaves the destination undefined.
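The same kind of sketch for the trailing-bit pair, again for 32-bit operands. The names bsf32 and tzcnt32 are invented here, and returning None for the undefined bsf case is a modelling choice of this example, not something the hardware does.

```python
from typing import Optional


def bsf32(x: int) -> Optional[int]:
    """Index of the lowest set bit, as bsf reports it; None models 'undefined'."""
    x &= 0xFFFFFFFF
    if x == 0:
        return None  # destination is undefined for a zero source
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit


def tzcnt32(x: int) -> int:
    """Number of trailing zero bits, as tzcnt reports it; 32 for a zero source."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32  # the operand size
    return (x & -x).bit_length() - 1


for value in (0x00000001, 0x0000000C, 0x80000000, 0):
    print(hex(value), bsf32(value), tzcnt32(value))
```

The two functions agree on every nonzero input and differ only in the zero case, which is exactly the case where a tzcnt user would be relying on the defined result.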
You can't really use even this fallback though: if you never have zero inputs you might as well just use bsf, saving a byte and being compatible with a couple decades of CPUs, and if you do have zero inputs the behavior differs.
So the behavior is perhaps better classified as trivia than a fallback...
[1] Normally this would be more or less esoterica, but you could, for example, use rep prefixes where they have no functional effect to lengthen instructions and so align subsequent code without inserting explicit nop instructions. Given the "may decode differently in the future" caveat, this would be dangerous when compiling code which may run on any future CPU.