Why can't I move a byte directly into a 64-bit register?

生来不讨喜 2020-12-11 05:29

Why can't I directly move a byte from memory to a 64-bit register in Intel x86-64 assembly?

For instance, this code:

extern printf

global main

seg         


        
2 Answers
  • 2020-12-11 05:41

    Use a move with zero or sign extension, as appropriate.

    For example: movzx eax, byte [rbp - 1] zero-extends into RAX (writing EAX also clears the upper 32 bits of RAX).

    movsx rax, byte [rbp - 1] sign-extends into RAX.
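
    A minimal sketch of the difference between the two extensions, assuming NASM syntax on x86-64 Linux with the System V calling convention and libc's printf (as the question's `extern printf` suggests). The label `fmt`, the sample value 0xF0, and the build command are illustrative assumptions, not taken from the question.

    ```nasm
    ; Assumed build: nasm -f elf64 ext.asm -o ext.o && gcc -no-pie ext.o -o ext
    extern printf
    global main

    section .data
    fmt:    db "%ld %ld", 10, 0             ; hypothetical format string

    section .text
    main:
            push    rbp
            mov     rbp, rsp
            sub     rsp, 16                 ; local space; keeps rsp 16-byte aligned
            mov     byte [rbp - 1], 0xF0    ; sample byte with the top bit set
            movzx   eax, byte [rbp - 1]     ; zero-extend: rax = 0xF0 (240)
            movsx   rdx, byte [rbp - 1]     ; sign-extend: rdx = -16
            lea     rdi, [rel fmt]          ; 1st argument: format string
            mov     rsi, rax                ; 2nd argument: zero-extended value
            xor     eax, eax                ; varargs call: no vector registers used
            call    printf
            xor     eax, eax                ; return 0 from main
            leave
            ret

    section .note.GNU-stack noalloc noexec nowrite progbits
    ```

    With the byte 0xF0 this prints `240 -16`: the same bit pattern is 240 when zero-extended and -16 when sign-extended.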

  • 2020-12-11 05:49

    Expanding 8-bit registers to 64-bit when assigning values

    You can use the movzx instruction to move a byte into a 64-bit register.

    In your case, it would be

    movzx     r12, byte ptr [rbp - 1]
    movzx     r13, byte ptr [rbp - 2]
    

    Another way, which avoids addressing memory twice, would have been

    mov       ax,  word ptr [rbp - 2]
    movzx     r12, al
    movzx     r13, ah
    

    but the last instruction will not assemble. See http://www.felixcloutier.com/x86/MOVZX.html: "In 64-bit mode, r/m8 can not be encoded to access the following byte registers if the REX prefix is used: AH, BH, CH, DH." Using r13 as the destination requires a REX prefix, so ah cannot be the source.

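    So we have to do the following:

    mov       ax,  word ptr [rbp - 2]   ; al = byte at [rbp - 2], ah = byte at [rbp - 1]
    movzx     r12, al                   ; r12 = byte at [rbp - 2]
    mov       al, ah                    ; copy the high byte into al, which is encodable with REX
    movzx     r13, al                   ; r13 = byte at [rbp - 1]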
    

    But just two movzx instructions, as in the first example, may be faster (the processor may optimize the memory accesses). The speed depends on the surrounding code and should be measured in context.

    You can take advantage of the fact that, in 64-bit mode, writing a 32-bit register also clears the upper bits (63-32) of the full register. Even so, you cannot encode the ah register in a movzx whose destination is one of the registers introduced in 64-bit mode, not even its 32-bit part: movzx r13d, ah will not assemble, because r13d needs a REX prefix.
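
    A sketch of one way around the restriction, assuming NASM syntax on x86-64 Linux (System V calling convention, libc printf): when the destination is a legacy register such as edx, no REX prefix is needed and ah is encodable as a source. The label `fmt` and the sample word 0x1234 are illustrative assumptions. The argument registers rsi and rdx stand in for r12 and r13 here, because r12 and r13 are callee-saved and would have to be preserved inside main.

    ```nasm
    ; Assumed build: nasm -f elf64 split.asm -o split.o && gcc -no-pie split.o -o split
    extern printf
    global main

    section .data
    fmt:    db "%lx %lx", 10, 0             ; hypothetical format string

    section .text
    main:
            push    rbp
            mov     rbp, rsp
            sub     rsp, 16                 ; local space; keeps rsp 16-byte aligned
            mov     word [rbp - 2], 0x1234  ; [rbp - 2] = 0x34, [rbp - 1] = 0x12
            movzx   eax, word [rbp - 2]     ; one load: rax = 0x1234
            movzx   esi, al                 ; rsi = 0x34 (32-bit write clears bits 63-32)
            movzx   edx, ah                 ; rdx = 0x12; no REX prefix, so ah is allowed
            ; movzx r13d, ah                ; rejected: r13d requires a REX prefix
            lea     rdi, [rel fmt]          ; 1st argument: format string
            xor     eax, eax                ; varargs call: no vector registers used
            call    printf
            xor     eax, eax                ; return 0 from main
            leave
            ret

    section .note.GNU-stack noalloc noexec nowrite progbits
    ```

    This prints `34 12`. If the value really has to end up in r13, a follow-up mov r13d, edx would copy it there.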

    Using 8-bit, 16-bit, and 32-bit parts of 64-bit rNN registers

    You can use the 8-bit, 16-bit, and 32-bit parts of the 64-bit rNN registers in the following way:

    rNNb - byte, rNNw - word, rNNd - dword

    for example, r10b, r10w, r10d. Here are some examples in code:

        xor     r8d,dword ptr [r9+r10*4]
        .....
        xor     r8b, al
        .....
        xor     eax, r11d
    

    Please note: the rNN registers have no 'h' (high-byte) parts. High-byte registers exist only for the first four legacy registers: ah, bh, ch and dh.

    Another note: when you modify the 32-bit part of a 64-bit register, the upper 32 bits are automatically set to zero. Writes to the 8-bit and 16-bit parts leave the remaining bits unchanged.
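
    The effect of partial-register writes can be checked with a short program. This is a sketch assuming NASM syntax on x86-64 Linux (System V calling convention, libc printf); the label `fmt` and the choice of rsi, rdx and rcx are illustrative assumptions. Each register is first filled with all ones, then only part of it is overwritten with 1.

    ```nasm
    ; Assumed build: nasm -f elf64 part.asm -o part.o && gcc -no-pie part.o -o part
    extern printf
    global main

    section .data
    fmt:    db "%lx %lx %lx", 10, 0         ; hypothetical format string

    section .text
    main:
            push    rbp
            mov     rbp, rsp                ; rsp is 16-byte aligned after the push
            mov     rsi, -1                 ; rsi = 0xffffffffffffffff
            mov     esi, 1                  ; 32-bit write: upper 32 bits cleared, rsi = 1
            mov     rdx, -1
            mov     dl, 1                   ; 8-bit write: bits 63-8 keep their old value
            mov     rcx, -1
            mov     cx, 1                   ; 16-bit write: bits 63-16 keep their old value
            lea     rdi, [rel fmt]          ; 1st argument: format string
            xor     eax, eax                ; varargs call: no vector registers used
            call    printf
            xor     eax, eax                ; return 0 from main
            pop     rbp
            ret

    section .note.GNU-stack noalloc noexec nowrite progbits
    ```

    This prints `1 ffffffffffffff01 ffffffffffff0001`: only the 32-bit write zeroes the rest of the register.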

    The fastest way of working with the registers

    The fastest way of working with the registers is to always clear the upper bits, which removes the false dependency on the previous contents of the register. This is the approach recommended by Intel, and it allows better out-of-order execution (OOE) and register renaming (RR). Besides that, working with full registers rather than with their lower parts is faster on modern processors such as Knights Landing and Cannon Lake. So this is the code that will run faster on these processors (it benefits from OOE and RR):

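    movzx     rax, word ptr [rbp - 2]   ; rax = zero-extended 16-bit word
    movzx     r12, al                   ; r12 = low byte (from [rbp - 2])
    shr       rax, 8                    ; move the high byte down; upper bits are already zero
    mov       r13, rax                  ; r13 = high byte (from [rbp - 1])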
    

    As for Knights Landing and future mainstream processors like Cannon Lake: Intel states explicitly that instructions on 8-bit and 16-bit registers will be much slower than on 32-bit or 64-bit registers on Cannon Lake, as they already are on Knights Landing.

    If you write with OOE and RR in mind, your assembly code will be much faster.
