How do I interpret this x86_64 assembly opcode?

小鲜肉 2021-02-19 05:27

Looking at some assembly code for x86_64 on my Mac, I see the following instruction:

48 c7 c0 01 00 00 00  movq    $0x1,%rax

But nowhere can I

2 answers
  • 2021-02-19 05:51

    When you look at the binary

     48 c7 c0 01 00 00 00
    

    you need to disassemble it in order to understand its meaning.

    The algorithm for disassembling is not conceptually difficult, but it is intricate: it involves looking up multiple tables.

    The algorithm is described in Volume 2 of the Intel Software Developer's Manual:

    Intel® 64 and IA-32 Architectures
    Software Developer’s Manual
    Volume 2 (2A, 2B & 2C):
    Instruction Set Reference, A-Z
    

    Start with the chapter called INSTRUCTION FORMAT.

    Alternatively, some good books dedicate whole chapters to this topic, such as

      X86 Instruction Set Architecture, Mindshare, by Tom Shanley.
    

    It devotes a whole chapter to disassembling x86 binary code.

    Or you can start with the general algorithm as described in AMD's manual for the same instruction set:

    AMD64 Architecture
    Programmer’s Manual
    Volume 3:
    General-Purpose and System Instructions
    

    Here, in the chapter Instruction Encoding, you will find the automaton that defines this language of instructions; from that diagram you can easily write the decoder.
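    To give a feel for what such a decoder looks like, here is a minimal sketch that walks only the states REX prefix, opcode, ModR/M and immediate. The `OPCODES` table and the `split_fields` name are made up for illustration and cover just two opcodes; legacy prefixes, multi-byte opcodes, SIB bytes and displacements are deliberately left out, and 64-bit mode is assumed.

```python
# Hypothetical two-entry opcode table for this sketch only:
# opcode -> (has a ModR/M byte, immediate size in bytes)
OPCODES = {
    0xC7: (True, 4),  # MOV r/m, imm32
    0x89: (True, 0),  # MOV r/m, r
}


def split_fields(code: bytes) -> dict:
    """Split one instruction into REX, opcode, ModR/M and immediate fields."""
    pos = 0
    fields = {"rex": None, "opcode": None, "modrm": None, "imm": b""}

    # State 1: optional REX prefix (0x40-0x4F in 64-bit mode).
    if code[pos] & 0xF0 == 0x40:
        fields["rex"] = code[pos]
        pos += 1

    # State 2: one-byte opcode, looked up in the table.
    opcode = code[pos]
    pos += 1
    if opcode not in OPCODES:
        raise ValueError(f"opcode {opcode:#04x} is not in this sketch's table")
    fields["opcode"] = opcode
    has_modrm, imm_size = OPCODES[opcode]

    # State 3: ModR/M byte; only register-direct operands (mod == 0b11).
    if has_modrm:
        modrm = code[pos]
        pos += 1
        if modrm >> 6 != 0b11:
            raise ValueError("memory operands (SIB/displacement) are not handled")
        fields["modrm"] = modrm

    # State 4: immediate bytes, if the opcode takes any.
    fields["imm"] = code[pos:pos + imm_size]
    fields["length"] = pos + imm_size
    return fields


print(split_fields(bytes.fromhex("48c7c001000000")))
```

    A real decoder has many more states and much larger tables, which is where the manuals come in as a reference.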

    After you do this you can come back to the Intel Manual, 2nd volume, and use it as a reference book.

    I also found the reverse engineering class from http://opensecuritytraining.info/ useful. The site was created by a PhD student from CMU; most of it is well done, but it takes longer to study and apply.

    Once you understand the basic ideas, you can look at a free project that implements the algorithm. I found the distorm project useful. At the beginning it is important not to look at general-purpose projects (like qemu or objdump), which implement disassemblers for many instruction sets in the same code base, as you will get lost. Distorm focuses only on x86 and implements it correctly and exhaustively. It expresses the definition of the x86 language formally, in code, whereas the Intel and AMD manuals define it in natural language.

    Another project that works well is udis86.

  • 2021-02-19 05:53

    Actually, mov is 0xc7 there; 0x48 is, in this case, a long mode REX.W prefix.
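    As a small illustration of what the REX byte carries, here is a sketch that splits a REX prefix into its four flag bits. The helper name `decode_rex` is made up for this example; the bit layout (0100WRXB) is the one given in the Intel and AMD manuals.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X and B bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModR/M reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModR/M r/m, SIB base, or opcode reg
    }


print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```

    So 0x48 is a REX prefix with only the W bit set, which is why the instruction operates on a 64-bit register.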

    Answering the question in the comments as well: 0xc0 is b11000000, i.e. mod = 11, reg = 000, r/m = 000. Here you can find out that with REX.B = 0 (the REX prefix is 0x48, so the .B bit is unset), 0xc0 means "RAX is the first operand" (in Intel syntax, mov rax, 1: RAX comes first and, in the case of mov, is the output operand). You can find out how to read ModR/M here.
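    Putting the pieces together, the following sketch decodes the seven bytes from the question. It handles only the single form `REX.W C7 /0 imm32` with a register destination; the function name `decode_mov_imm32` and the `REG64` table are illustrative, not taken from any library.

```python
import struct

# 64-bit register names indexed by the 4-bit number (REX.B << 3) | r/m.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_imm32(code: bytes) -> str:
    """Decode only `REX.W C7 /0 imm32` with a register destination."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF8 != 0x48 or opcode != 0xC7:
        raise ValueError("only REX.W + C7 is handled in this sketch")

    mod = modrm >> 6        # 0b11 = register-direct operand
    reg = (modrm >> 3) & 7  # opcode extension; /0 selects MOV
    rm = modrm & 7          # low 3 bits of the register number
    if mod != 0b11 or reg != 0:
        raise ValueError("only register-direct C7 /0 is handled")

    rex_b = rex & 1  # REX.B supplies bit 3 of the register number
    # The immediate is 32 bits, little-endian, sign-extended to 64 bits.
    (imm,) = struct.unpack_from("<i", code, 3)
    return f"mov {REG64[(rex_b << 3) | rm]}, {imm:#x}"


print(decode_mov_imm32(bytes.fromhex("48c7c001000000")))  # mov rax, 0x1
```

    Changing the prefix to 0x49 sets REX.B, and the same ModR/M byte then selects r8 instead of rax.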
