Why can't I directly move a byte from memory to a 64-bit register in Intel x86-64 assembly?
For instance, this code:
extern printf
global main
seg
Use a move with zero or sign extension, as appropriate. For example, to zero-extend into RAX (writing EAX also clears bits 63 to 32 of RAX):
movzx eax, byte [rbp - 1]
and to sign-extend into RAX:
movsx rax, byte [rbp - 1]
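The difference between the two extensions can be sketched in Python, modeling a 64-bit register as an unsigned integer from 0 to 2**64 - 1. This models only the resulting register values, not the instructions themselves; the function names and the example byte 0xFE are made up for illustration.

```python
MASK64 = (1 << 64) - 1  # registers are modeled as unsigned 64-bit integers


def movzx_byte(b):
    """Value of RAX after movzx eax, byte [...]: bits 63..8 are all zero."""
    return b & 0xFF


def movsx_byte(b):
    """Value of RAX after movsx rax, byte [...]: bit 7 is copied into bits 63..8."""
    b &= 0xFF
    signed = b - 0x100 if b & 0x80 else b  # interpret the byte as two's complement
    return signed & MASK64                 # back to the unsigned 64-bit view


byte_in_memory = 0xFE  # hypothetical contents of [rbp - 1], i.e. -2 as a signed byte
print(hex(movzx_byte(byte_in_memory)))  # 0xfe
print(hex(movsx_byte(byte_in_memory)))  # 0xfffffffffffffffe
```

Bytes below 0x80 give the same result either way; the two instructions differ only when bit 7 is set.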
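You can use the movzx instruction to move a byte into a 64-bit register.
In your case, it would be the following (byte ptr is MASM/GAS Intel syntax; in NASM, which your code appears to use, write just byte):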
movzx r12, byte ptr [rbp - 1]
movzx r13, byte ptr [rbp - 2]
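Another way, which reads memory only once instead of twice, would have been the following. Note that x86 is little-endian, so AL receives the byte at [rbp - 2] and AH the byte at [rbp - 1]; r12 and r13 therefore end up with the two bytes swapped compared with the example above: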
mov ax, word ptr [rbp - 2]
movzx r12, al
movzx r13, ah
but the last instruction would not assemble, because movzx r13, ... needs a REX prefix (REX.W for the 64-bit operand size and REX.R to address r13). See http://www.felixcloutier.com/x86/MOVZX.html: "In 64-bit mode, r/m8 can not be encoded to access the following byte registers if the REX prefix is used: AH, BH, CH, DH."
So we have to do the following:
mov ax, word ptr [rbp - 2]
movzx r12, al
mov al, ah
movzx r13, al
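A small Python model of this single-load version makes the byte order visible. It assumes a hypothetical dict mem standing in for the stack and an arbitrary rbp value; only register values are modeled, not encodings or timing.

```python
rbp = 0x7FFF_0000                     # hypothetical frame pointer value
mem = {rbp - 2: 0x34, rbp - 1: 0x12}  # hypothetical stack bytes


def load_word(memory, addr):
    # x86 is little-endian: the byte at addr is the low byte, addr + 1 the high byte.
    return memory[addr] | (memory[addr + 1] << 8)


ax = load_word(mem, rbp - 2)  # mov ax, word ptr [rbp - 2]
al, ah = ax & 0xFF, ax >> 8   # AL = bits 7..0, AH = bits 15..8 of AX
r12 = al                      # movzx r12, al
al = ah                       # mov al, ah (legal: no REX prefix involved)
r13 = al                      # movzx r13, al
print(f"ax={ax:#06x} r12={r12:#04x} r13={r13:#04x}")  # ax=0x1234 r12=0x34 r13=0x12
```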
But just two movzx instructions, as in the first example, may be faster (the processor may optimize the memory accesses); the speed depends on the surrounding code and should be benchmarked in context.
You can take advantage of the fact that in 64-bit mode, writing a 32-bit register also clears the upper bits (63 to 32), but you still cannot encode AH as the source of movzx when the destination is one of the registers added in 64-bit mode, not even its 32-bit part: movzx r13d, ah
will not assemble, because addressing r13d requires a REX prefix.
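The encoding rule quoted above can be expressed as a simple check. The helper below is hypothetical (no assembler exposes such a function) and covers only register-to-register movzx with an 8-bit source: a REX prefix is needed for a 64-bit destination (REX.W) or for r8 to r15 in any width (REX.R), and any REX prefix makes AH, BH, CH and DH unencodable.

```python
HIGH_BYTE_REGS = {"ah", "bh", "ch", "dh"}
LEGACY_64 = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rsp", "rbp"}
# r8..r15 as 64-bit, dword (r8d) and word (r8w) destinations
NEW_REGS = {f"r{n}{suffix}" for n in range(8, 16) for suffix in ("", "d", "w")}


def dest_needs_rex(dest):
    # 64-bit operand size needs REX.W; r8..r15 in any width need REX.R.
    return dest in LEGACY_64 or dest in NEW_REGS


def movzx_reg_encodable(dest, src8):
    # With any REX prefix, the encodings of AH/CH/DH/BH mean SPL/BPL/SIL/DIL instead.
    return not (src8 in HIGH_BYTE_REGS and dest_needs_rex(dest))


for dest, src in [("r12", "al"), ("r13", "ah"), ("r13d", "ah"), ("rax", "ah"), ("eax", "ah")]:
    verdict = "ok" if movzx_reg_encodable(dest, src) else "not encodable"
    print(f"movzx {dest}, {src}: {verdict}")
```

By the same check, movzx eax, ah is encodable, and since it writes EAX it also clears bits 63 to 32 of RAX.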
You can access the 8-bit, 16-bit, and 32-bit parts of the 64-bit rNN registers as follows:
rNNb - byte
rNNw - word
rNNd - dword
For example, r10b, r10w, r10d. Here are some examples in code:
xor r8d,dword ptr [r9+r10*4]
.....
xor r8b, al
.....
xor eax, r11d
Please note: the "h" (high-byte) parts are not available for the rNN registers; only the first four legacy registers have them: ah, bh, ch and dh.
Another note: writing the 32-bit part of a 64-bit register automatically sets the upper 32 bits to zero, whereas writing the 8-bit or 16-bit part leaves the remaining bits unchanged.
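These write rules can be modeled in Python as well. The write_reg helper is hypothetical and only covers writes to the low 8, 16, 32 or 64 bits (for example r10b, r10w, r10d, r10); high-byte writes such as AH are not modeled. The 8- and 16-bit cases show why such writes depend on the register's previous contents: the old upper bits are part of the result.

```python
MASK64 = (1 << 64) - 1


def write_reg(old, value, width):
    """New 64-bit register value after writing `value` to its low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # Writing e.g. EAX or R10D zero-extends: bits 63..32 are cleared.
        return value & 0xFFFF_FFFF
    if width in (8, 16):
        # Writing e.g. AL/R10B or AX/R10W merges: bits above `width` keep their old value.
        mask = (1 << width) - 1
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("width must be 8, 16, 32 or 64")


r10 = 0x1122_3344_5566_7788
print(hex(write_reg(r10, 0xAA, 8)))          # r10b: 0x11223344556677aa
print(hex(write_reg(r10, 0xAAAA, 16)))       # r10w: 0x112233445566aaaa
print(hex(write_reg(r10, 0xAAAA_AAAA, 32)))  # r10d: 0xaaaaaaaa
```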
The fastest way to work with registers is to always clear the upper bits, which removes false dependencies on the previous contents of the registers. This is the approach Intel recommends, and it allows better Out-of-Order Execution (OOE) and Register Renaming (RR). Besides that, working with full registers rather than with their lower parts is faster on modern processors such as Knights Landing and Cannon Lake. So the following code will run faster on these processors (it lets OOE and RR do their job):
movzx rax, word ptr [rbp - 2]
movzx r12, al
shr rax, 8
mov r13, rax
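To make the dependency argument concrete, here is a Python model of the register values produced by the mov ax / mov al, ah sequence from the earlier answer and by the movzx / shr sequence above. It does not measure speed; it only shows that the first sequence carries old RAX bits into the final RAX, while the second yields the same r12 and r13 without reading the old RAX at all. The word 0x1234 and the old RAX values are arbitrary.

```python
MASK64 = (1 << 64) - 1


def split_with_partial_writes(rax, word):
    """mov ax, word [..] / movzx r12, al / mov al, ah / movzx r13, al"""
    rax = (rax & ~0xFFFF & MASK64) | word  # mov ax: bits 63..16 keep the old RAX
    r12 = rax & 0xFF                       # movzx r12, al
    ah = (rax >> 8) & 0xFF
    rax = (rax & ~0xFF & MASK64) | ah      # mov al, ah: bits 63..8 keep their value
    r13 = rax & 0xFF                       # movzx r13, al
    return rax, r12, r13


def split_with_full_writes(rax, word):
    """movzx rax, word [..] / movzx r12, al / shr rax, 8 / mov r13, rax"""
    # The old RAX is accepted only to show that it is never read.
    rax = word        # movzx rax, word: result does not depend on the old RAX
    r12 = rax & 0xFF  # movzx r12, al
    rax >>= 8         # shr rax, 8
    r13 = rax         # mov r13, rax
    return rax, r12, r13


word = 0x1234  # hypothetical word at [rbp - 2]
for old_rax in (0, 0xDEAD_BEEF_0000_0000):
    partial = split_with_partial_writes(old_rax, word)
    full = split_with_full_writes(old_rax, word)
    print(f"old rax={old_rax:#x}: partial={[hex(v) for v in partial]} full={[hex(v) for v in full]}")
```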
As for Knights Landing and upcoming mainstream processors such as Cannon Lake, Intel is explicit that instructions on 8-bit and 16-bit registers will be much slower than on 32-bit or 64-bit registers on Cannon Lake, as they already are on Knights Landing.
If you write with OOE and RR in mind, your assembly code will be much faster.