Matching the Intel codes to disassembly output

Question

I'm starting to use the Intel reference page to look up and learn about the opcodes (instead of asking everything on SO). I'd like to check that my understanding is correct and ask a few questions about how the output of a basic asm program matches the Intel instruction codes.

Here is the program I have to compare various mov instructions into the rax-ish register (is there a better way to say "rax" and its 32- 16- and 8- bit components?):

.globl _start
_start:
    movq $1,    %rax    # move immediate into 8-byte rax (rax)
    movl $1,    %eax    # move immediate into 4-byte rax (eax)
    movw $1,    %ax     # move immediate into 2-byte rax (ax)
    movb $1,    %al     # move immediate into 1-byte rax (al)
    mov $60,    %eax
    syscall

And it disassembles as follows:

$ objdump -D file

file:     file format elf64-x86-64


Disassembly of section .text:

0000000000400078 <_start>:

  400078:   48 c7 c0 01 00 00 00    mov    $0x1,%rax
  40007f:   b8 01 00 00 00          mov    $0x1,%eax
  400084:   66 b8 01 00             mov    $0x1,%ax
  400088:   b0 01                   mov    $0x1,%al

  40008a:   b8 3c 00 00 00          mov    $0x3c,%eax
  40008f:   0f 05                   syscall

Now, matching these up against the Intel codes listed for MOV:

Of the four instructions, I am able to reconcile the following:

mov $0x1,%al --> b0 01
YES, Intel states the code is b0 [+ 1 byte for the value] for a 1-byte move immediate.
mov $0x1,%eax --> b8 01 00 00 00
YES, Intel states the code is b8 [+ 4 bytes for the value] for a 4-byte move immediate.
mov $0x1,%ax --> 66 b8 01 00
NO, Intel states the code is b8, not 66 b8.
mov $0x1,%rax --> 48 c7 c0 01 00 00 00
N/A, 32-bit instructions only. Not listed.

From this, my questions are:

Why doesn't the mov $0x1,%ax match up?
Is there the same table for 64-bit codes, or what's the suggested way to look that up?
Finally, how do the codes adjust when the register changes? For example, if I want to move a value to %ebx or %r11 instead. How do you calculate the 'code-adjustment', as it looks like in this lookup table it only gives (I think?) the eax register for the 'register example codes'.

Answer 1:

You're missing the concept of prefix "opcodes" that change the meaning of the following instruction. Volume 2, sections 2.1.1 and 2.2.1 of the IA32 manual cover this. From 2.1.1 we get:

Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some instructions).

So the 66 prefix changes the operand size from the default 32-bit to 16-bit. Thus, mov $1,%ax (16-bit) has the same b8 opcode as mov $1,%eax (32-bit), just with the 66 prefix in front and a 2-byte immediate instead of a 4-byte one.
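As a rough illustration of the prefix rule, here is a minimal Python sketch that builds the bytes for the 16-bit and 32-bit forms. The helper name mov_imm and the REG_NUM table are hypothetical, made up for this example; the register numbers are the standard ones that get added to the b8 base opcode (the "+rw" / "+rd" notation in the manual), which is also how the byte changes when the destination is not the A register.

```python
import struct

# Standard register numbers for the eight legacy registers.
# The number is added to the base opcode b8 ("+rw"/"+rd" in the manual).
REG_NUM = {"ax": 0, "cx": 1, "dx": 2, "bx": 3,
           "sp": 4, "bp": 5, "si": 6, "di": 7}


def mov_imm(reg: str, imm: int, bits: int) -> bytes:
    """Encode MOV r16/r32, imm for 64-bit mode (hypothetical helper)."""
    opcode = bytes([0xB8 + REG_NUM[reg]])
    if bits == 32:
        # Default operand size: no prefix, 4-byte little-endian immediate.
        return opcode + struct.pack("<I", imm)
    if bits == 16:
        # 66 operand-size override prefix, 2-byte little-endian immediate.
        return b"\x66" + opcode + struct.pack("<H", imm)
    raise ValueError("only 16- and 32-bit operands are sketched here")


print(mov_imm("ax", 1, 32).hex(" "))  # b8 01 00 00 00
print(mov_imm("ax", 1, 16).hex(" "))  # 66 b8 01 00
print(mov_imm("bx", 1, 32).hex(" "))  # bb 01 00 00 00
```

The first two lines match the objdump output in the question; the third shows that %ebx only changes the opcode byte from b8 to bb.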

The last case (mov $1, %rax) is actually using a different instruction:

REX.W + C7 /0 id    MOV r/m64, imm32      Move imm32 sign extended to 64-bits to r/m64.

Here the destination is a general r/m64 operand selected by a ModRM byte (c0 selects rax) rather than a register encoded in the opcode itself. That makes the instruction one byte larger, but it moves a 32-bit immediate into a 64-bit register, so it needs only a 4-byte constant instead of an 8-byte one. It therefore ends up 3 bytes smaller than the equivalent 48 b8 01 00 00 00 00 00 00 00.
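To make the size comparison concrete, and to show how the bytes change for registers such as %r11, here is a minimal Python sketch of both 64-bit forms. The function names and the REG64 list are hypothetical, introduced only for illustration. The sketch assumes the standard layout of the REX prefix (0100WRXB, so 48 is REX.W alone and 49 adds REX.B for r8 to r15) and of a register-direct ModRM byte (mod=11, reg field 0 for /0, rm holding the low three bits of the register number).

```python
import struct

# Register numbers 0-15 by position; 8-15 need the REX.B bit.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def mov_r64_imm32(reg: str, imm: int) -> bytes:
    """REX.W + C7 /0 id: imm32 sign-extended into a 64-bit register."""
    n = REG64.index(reg)
    rex = 0x48 | (n >> 3)      # REX.W, plus REX.B when n >= 8
    modrm = 0xC0 | (n & 7)     # mod=11 (register), reg=/0, rm=low 3 bits
    return bytes([rex, 0xC7, modrm]) + struct.pack("<i", imm)


def mov_r64_imm64(reg: str, imm: int) -> bytes:
    """REX.W + B8+rd io: full 8-byte immediate, register in the opcode."""
    n = REG64.index(reg)
    rex = 0x48 | (n >> 3)
    return bytes([rex, 0xB8 + (n & 7)]) + struct.pack("<Q", imm)


short = mov_r64_imm32("rax", 1)
long_ = mov_r64_imm64("rax", 1)
print(short.hex(" "))                      # 48 c7 c0 01 00 00 00
print(long_.hex(" "))                      # 48 b8 01 00 00 00 00 00 00 00
print(len(long_) - len(short))             # 3
print(mov_r64_imm32("r11", 1).hex(" "))    # 49 c7 c3 01 00 00 00
```

The first line reproduces the bytes objdump printed for mov $0x1,%rax, and the length difference is the 3 bytes mentioned above.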

Source: https://stackoverflow.com/questions/63875061/matching-the-intel-codes-to-disassembly-output

Tags

assembly

x86

x86-64

intel

machine-code