How are hex sequences translated to assembly without ambiguity?

忘了有多久 2021-01-06 14:59
8B EC 56 8B F4 68 00 70 40 00 FF 15 BC 82 40   

A sequence like the one above can be segmented in various ways, and each segment can be translated to correspon

9 answers
  •  鱼传尺愫
    2021-01-06 15:47

    If I understand your question correctly, you're trying to understand why

    8B EC 56 8B F4 68 00 70 40 00 FF 15 BC 82 40

    could be split, e.g., as

    8BEC 568BF4 68007040 00FF 15BC 8240

    rather than, say,

    8B EC568B F4 68007040 00FF 15BC 8240

    This is entirely specified by the ISA of your architecture. That document describes exactly how instructions are uniquely constructed from a series of bytes.

    For the ISA to be well formed, a single series of bytes, decoded from a given starting point, can correspond to at most one series of decoded instructions (possibly fewer than expected, if some of the bytes do not form valid instructions).
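As a rough illustration of that point, here is a minimal sketch in Python of a decoder for 32-bit x86 that knows only the handful of encodings needed for the bytes in the question. The names `decode_one` and `decode_all` are hypothetical, and this is not how a real disassembler is organized. It only shows that, once the starting offset is fixed, the opcode byte (plus the ModRM byte where present) determines each instruction's length, so the split follows mechanically.

```python
# Hypothetical, heavily simplified decoder for 32-bit x86. It covers only the
# encodings needed for the example bytes; anything else raises ValueError.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_one(code, pos):
    """Decode the instruction starting at pos; return (text, length)."""
    remaining = len(code) - pos
    op = code[pos]
    if 0x50 <= op <= 0x57:                    # push r32: a single byte
        return f"push {REGS[op - 0x50]}", 1
    if op == 0x68:                            # push imm32: opcode + 4 bytes
        if remaining < 5:
            return "(truncated)", remaining
        imm = int.from_bytes(code[pos + 1:pos + 5], "little")
        return f"push 0x{imm:08x}", 5
    if op in (0x8B, 0xFF):                    # opcode followed by a ModRM byte
        if remaining < 2:
            return "(truncated)", remaining
        modrm = code[pos + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if op == 0x8B and mod == 3:           # mov r32, r32 (register form)
            return f"mov {REGS[reg]}, {REGS[rm]}", 2
        if op == 0xFF and reg == 2 and mod == 0 and rm == 5:
            if remaining < 6:                 # call [disp32] needs 6 bytes
                return "(truncated)", remaining
            disp = int.from_bytes(code[pos + 2:pos + 6], "little")
            return f"call dword ptr [0x{disp:08x}]", 6
    raise ValueError(f"encoding not covered by this sketch at offset {pos}")


def decode_all(code):
    """Decode from offset 0; each length fixes where the next one starts."""
    pos, out = 0, []
    while pos < len(code):
        text, length = decode_one(code, pos)
        out.append(text)
        pos += length
    return out


question_bytes = bytes.fromhex("8B EC 56 8B F4 68 00 70 40 00 FF 15 BC 82 40")
for line in decode_all(question_bytes):
    print(line)
```

Under these assumptions the 15 bytes split as `8B EC`, `56`, `8B F4`, `68 00 70 40 00`, and then `FF 15 BC 82 40`, which is one byte short of a complete `call` through a 32-bit address, so the sketch reports the last instruction as truncated.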

    To get a bit more concrete, let's take the x86 example. If you want to know what each byte corresponds to, have a look at an x86 opcode table.

    You'll see that, e.g. an instruction starting with 00 is an add (additional parameters are in the next byte, with a specific encoding).

    You'll also see that some values are actually prefixes that modify the following instruction (0F, which extends the opcode space, and 26, 2E, 36, 3E, 64, 65, 66, 67, F0, F2, F3), and that some of them take on different meanings depending on the exact instruction that follows. Those are not opcodes themselves, but they can alter the encoding of the opcode's arguments or introduce a completely new opcode space (e.g. SSE uses 0F).
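To make the prefix handling concrete, here is a small sketch in Python that separates leading prefix bytes from the opcode bytes of a single instruction. The function name `split_prefixes` is hypothetical, and the sketch makes simplifying assumptions: 32-bit mode only, and it ignores the three-byte opcode maps (`0F 38`, `0F 3A`) and VEX/EVEX encodings.

```python
# The prefix byte values listed above (segment overrides, operand/address
# size overrides, LOCK, REPNE/REP).
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67,
                   0xF0, 0xF2, 0xF3}


def split_prefixes(code, pos=0):
    """Return (prefix_bytes, opcode_bytes, next_pos) for the instruction at pos.

    Simplified: treats 0F as an escape to a two-byte opcode and does not
    handle the 0F 38 / 0F 3A maps or VEX/EVEX.
    """
    start = pos
    # Consume every leading byte that is a prefix rather than an opcode.
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        pos += 1
    prefixes = code[start:pos]
    if pos >= len(code):
        raise ValueError("ran out of bytes before reaching an opcode")
    if code[pos] == 0x0F:
        # 0F escapes into the second opcode map, so the opcode is two bytes.
        if pos + 2 > len(code):
            raise ValueError("truncated two-byte opcode")
        opcode = code[pos:pos + 2]
    else:
        opcode = code[pos:pos + 1]
    return prefixes, opcode, pos + len(opcode)


# 66 8B EC: the 66 prefix switches the mov to 16-bit operands (mov bp, sp).
print(split_prefixes(bytes.fromhex("66 8B EC")))
# F3 0F 10 C1: here F3 selects movss within the 0F 10 opcode rather than
# acting as a REP prefix.
print(split_prefixes(bytes.fromhex("F3 0F 10 C1")))
```

The second example shows a prefix whose meaning depends on what follows: `F3` is a repeat prefix in front of string instructions, but in front of `0F 10` it is part of how `movss` is encoded.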

    Overall, x86 encoding is very complex; thank goodness for disassemblers.
