How are hex sequences translated to assembly without ambiguity?

忘了有多久 2021-01-06 14:59
8B EC 56 8B F4 68 00 70 40 00 FF 15 BC 82 40   

A sequence like the one above can be segmented in various ways, each segment can be translated to correspon

9 answers
  •  有刺的猬
    2021-01-06 15:37

    First of all you have to distinguish between RISC and CISC architectures.

    In a RISC architecture, instructions usually all have the same size, so no ambiguity can arise. The CPU fetches, for example, 4 bytes for every instruction. Because it always starts from a known address (it is never handed a bare sequence like the one you posted; there is always a starting point), once the alignment is right no problem can occur.
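
    As a minimal sketch of the fixed-width case, assuming a 4-byte instruction word and made-up sample bytes (the helper name `split_fixed_width` is hypothetical), finding the boundaries is plain arithmetic on the offset:

```python
def split_fixed_width(code: bytes, width: int = 4) -> list[bytes]:
    """Split a byte string into equally sized instruction words."""
    if len(code) % width != 0:
        raise ValueError("code length is not a multiple of the instruction width")
    # Instruction i always starts at offset i * width, so no decoding is needed
    # to find the boundaries.
    return [code[i:i + width] for i in range(0, len(code), width)]


# Eight arbitrary bytes used purely as sample data, not real instructions.
words = split_fixed_width(bytes.fromhex("0102030405060708"))
print([w.hex() for w in words])  # ['01020304', '05060708']
```

    The boundaries depend only on the starting address and the width, never on the contents of the bytes.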

    With a CISC instruction set, essentially the same thing happens: starting from the program's entry point, the CPU fetches instructions according to their opcodes. It does not need a mathematical way to resolve ambiguities, because it never ends up not knowing how long the next instruction is or where the previous one ended.
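
    To make this concrete for the bytes in the question, here is a rough sketch of a decoder. It assumes 32-bit x86 code with no prefixes and only knows the four encodings that occur in the sample (`50+r` push register, `68` push imm32, `8B` with a register-to-register ModRM byte, `FF 15` indirect call through a 32-bit address). The function names are made up for illustration; a real disassembler covers far more.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def instruction_length_and_text(code: bytes, pos: int) -> tuple[int, str]:
    """Return (length, text) for the instruction whose first byte is at pos."""
    op = code[pos]
    if 0x50 <= op <= 0x57:                      # push r32: opcode only
        return 1, f"push {REG32[op - 0x50]}"
    if op == 0x68:                              # push imm32: opcode + 4 bytes
        imm = int.from_bytes(code[pos + 1:pos + 5], "little")
        return 5, f"push 0x{imm:08x}"
    modrm = code[pos + 1] if pos + 1 < len(code) else None
    if op == 0x8B and modrm is not None and modrm >> 6 == 3:
        # mov r32, r32: opcode + ModRM (mod=3 means both operands are registers)
        return 2, f"mov {REG32[(modrm >> 3) & 7]}, {REG32[modrm & 7]}"
    if op == 0xFF and modrm == 0x15:            # call [disp32]: opcode + ModRM + 4 bytes
        disp = int.from_bytes(code[pos + 2:pos + 6], "little")
        return 6, f"call dword ptr [0x{disp:08x}]"
    raise ValueError(f"byte 0x{op:02x} at offset {pos} is outside this sketch")


def disassemble(code: bytes) -> list[str]:
    """Decode sequentially from offset 0; each length fixes the next start."""
    out, pos = [], 0
    while pos < len(code):
        length, text = instruction_length_and_text(code, pos)
        if pos + length > len(code):
            out.append(f"(incomplete: needs {length} bytes, {len(code) - pos} available)")
            break
        out.append(text)
        pos += length
    return out


sample = bytes.fromhex("8B EC 56 8B F4 68 00 70 40 00 FF 15 BC 82 40")
for line in disassemble(sample):
    print(line)
```

    Under these assumptions the first four instructions come out as `mov ebp, esp`, `push esi`, `mov esi, esp` and `push 0x00407000`. The last one starts with `FF 15`, which needs six bytes in total, so the sketch reports it as incomplete because the quoted sequence stops after 15 bytes.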

    So asking how to separate every instruction is like asking how to separate every word in

    thepenisonthetable

    There is no mathematical proof, but you know which letters belong together and which combinations are not meaningful. The sentence above contains "son", but you know that it comes from "is on". You could not say so unless the phrase were meaningful, but your CPU only executes meaningful programs, so what's the point?

    So if the CPU were working on the sentence above, it would find the first meaningful instruction "the", then "pen", "is", "on", and "son" would never be recognized anyway.
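
    The analogy can be sketched in a few lines, assuming a made-up "instruction set" of five words (the names `WORDS` and `segment` are hypothetical). Decoding always continues from where the previous word ended, so the offset where "son" begins is never tried:

```python
WORDS = ["the", "pen", "is", "on", "table"]   # hypothetical "instruction set"


def segment(text: str, start: int = 0) -> list[str]:
    """Split text into known words, reading left to right from start."""
    out, pos = [], start
    while pos < len(text):
        # Find the word that begins exactly at the current position.
        match = next((w for w in WORDS if text.startswith(w, pos)), None)
        if match is None:
            raise ValueError(f"no known word starts at offset {pos}")
        out.append(match)
        pos += len(match)                     # the next word starts right after
    return out


print(segment("thepenisonthetable"))
# ['the', 'pen', 'is', 'on', 'the', 'table']
```

    Starting in the middle instead, for example `segment("thepenisonthetable", 7)` (the offset of the "s"), raises `ValueError` with this word list, which mirrors why the starting point matters.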

    EDIT:

    To be clear, in CISC architectures the only constraint needed to rule out ambiguity is that no instruction may be a prefix of another. Let's assume a finite alphabet made of the letters a-z instead of hex digits (just for practical purposes).

    If the program counter points to

    abbcbcaabdeffabd
    

    Suppose abb is a whole instruction. In that case ab cannot be a valid instruction, otherwise the CPU could not know where to stop; likewise abbc cannot be an instruction, or it would cause the same problem. Continuing, the next instruction could be, for example, cb; then neither c nor cbc could be an instruction.

    You can extend this argument to the whole string. If the CPU is in a state where the next byte to be fetched is the FIRST byte of an instruction, and no instruction is a prefix of another instruction, then in the next state the program counter will point to the FIRST byte of the next, correct instruction.
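
    The argument can be checked with a small sketch. The instruction set `{abb, cb, ca, abd, eff}` is an assumption chosen so that the example string decodes completely, and both function names are hypothetical. The decoder reads one symbol at a time and stops as soon as the buffer equals an instruction, which is only safe because the set is prefix-free:

```python
def is_prefix_free(instructions: set[str]) -> bool:
    """True if no instruction is a proper prefix of another one."""
    return not any(
        a != b and b.startswith(a) for a in instructions for b in instructions
    )


def decode(stream: str, instructions: set[str]) -> list[str]:
    """Decode symbol by symbol, with no lookahead and no backtracking."""
    if not is_prefix_free(instructions):
        raise ValueError("instruction set is not prefix-free")
    out, current = [], ""
    for symbol in stream:
        current += symbol
        if current in instructions:
            # No longer instruction can begin with `current`, so the
            # instruction ends here and the next symbol starts a new one.
            out.append(current)
            current = ""
    if current:
        raise ValueError(f"stream ends inside an instruction: {current!r}")
    return out


ISA = {"abb", "cb", "ca", "abd", "eff"}       # assumed set, for illustration only
print(decode("abbcbcaabdeffabd", ISA))
# ['abb', 'cb', 'ca', 'abd', 'eff', 'abd']
print(is_prefix_free(ISA | {"ab"}))           # False: "ab" is a prefix of "abb"
```

    Adding `ab` to the set makes the check fail, which is exactly the case described above where the CPU could not know where to stop.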
