X86 Assembly - How to calculate instruction opcodes length in bytes [closed]

轻奢々

1 answer

醉话见心

2020-12-02 03:38
So, since this topic seems to interest you, let me give you an overview. An x86 instruction comprises up to five parts and is up to 15 bytes long:
```
prefixes opcode operand displacement immediate
```
It is possible to generate encodings that are longer than 15 bytes, but the CPU rejects them. All five parts except for the opcode are optional. You can find their length as follows:
- An instruction can have any number of legacy prefixes. These are: f0 lock, f2 repne, f3 repe, 2e cs, 36 ss, 3e ds, 26 es, 64 fs, 65 gs, 66 operand size override, and 67 address size override. However, only one of f0, f2, f3 and only one of 26, 2e, 36, 3e, 64, and 65 is recognized at a time. If more than one prefix from the same group is provided, the behavior differs between CPUs. VEX and EVEX encoded instructions may only have the segment override and address size override legacy prefixes, as the other prefixes are subsumed under the VEX and EVEX prefixes.
- In long mode (and only there), an instruction may have a REX prefix immediately after all legacy prefixes. The REX prefix is one of the bytes 40 to 4f. In other modes, these bytes are instructions, not prefixes, and your decoder must account for that. Like most legacy prefixes, a REX prefix cannot be combined with a VEX or EVEX encoded instruction.
- The bytes c4 and c5 can introduce a VEX prefix, which is used to encode some modern instructions. In long mode, they always do, but in other modes, you have to check the byte that follows: interpret it as a modr/m byte; if it encodes an r,r operand pair, it's a VEX prefix, otherwise it's the opcode for les or lds. A VEX prefix beginning with c5 is two bytes long; one beginning with c4 is three bytes long. The VEX prefix also encodes the 0f, 0f 38, and 0f 3a opcode prefixes, which are omitted in a VEX encoded instruction. Note that using a VEX prefix is generally not optional. For example, pdep is encoded as VEX.NDS.LZ.F2.0F38.W0 F5 /r (e.g. c4 e2 7b f5 c0 for pdep eax,eax,eax), but the corresponding legacy encoding f2 0f 38 f5 r/m32 (e.g. f2 0f 38 f5 c0 for pdep eax,eax) is invalid. Note also that the same opcode can exist both with and without a VEX prefix, and the two can mean different things. For example, 0f 77 is emms but VEX.128.0F.WIG 77 (i.e. c5 f8 77) is vzeroupper.
- The byte 62 introduces an EVEX prefix which is used to encode AVX512 instructions. Similar to the VEX prefix, the next few bytes need to be checked to distinguish an EVEX prefix from the bound instruction. The EVEX prefix is always four bytes long and encodes part of the opcode just as the VEX prefix does.
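
To make the prefix rules above a bit more concrete, here is a rough Python sketch that counts the prefix bytes at the start of an instruction. The function name `scan_prefixes` and the returned dictionary layout are made up for this illustration. It assumes that outside long mode the same "mod field is 11" test that separates VEX from les/lds also separates EVEX from bound, and it does not check for illegal prefix combinations or truncated input, so treat it as a starting point rather than a complete decoder. For example, `scan_prefixes(bytes.fromhex("c4e27bf5c0"), long_mode=True)` reports a three-byte VEX prefix.

```python
LEGACY_PREFIXES = frozenset({
    0xF0, 0xF2, 0xF3,                    # lock, repne, repe
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand size / address size override
})

# Total length in bytes of each extended prefix, including its first byte.
EXT_PREFIX_LENGTH = {0xC5: ("vex2", 2), 0xC4: ("vex3", 3), 0x62: ("evex", 4)}


def scan_prefixes(code: bytes, long_mode: bool) -> dict:
    """Return legacy prefixes, REX byte, VEX/EVEX kind, and total prefix length."""
    i = 0
    # Any number of legacy prefixes may come first.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
    legacy = code[:i]

    # 40..4f is a REX prefix only in long mode; elsewhere it is an instruction.
    rex = None
    if long_mode and i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1

    # A REX prefix cannot precede VEX/EVEX, so only look when none was seen.
    ext = None
    if rex is None and i < len(code) and code[i] in EXT_PREFIX_LENGTH:
        # Outside long mode, c4/c5/62 are les/lds/bound unless the next byte,
        # read as modr/m, has mod == 0b11 (a register-register form).
        next_is_reg_form = i + 1 < len(code) and code[i + 1] >> 6 == 0b11
        if long_mode or next_is_reg_form:
            ext, length = EXT_PREFIX_LENGTH[code[i]]
            i += length

    return {"legacy": legacy, "rex": rex, "ext": ext, "length": i}
```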

After the prefixes, the opcode follows. Originally, the opcode was always a single byte but then they ran out of space, so now it's either a single byte or a single byte prefixed by 0f, 0f 38, or 0f 3a. These prefixes are absent if the instruction is VEX encoded. Note that some prefixes may change what instruction is encoded. For example, opcode 0f b8 is jmpe (Enter IA-64 mode) but f3 0f b8 is not repe jmpe but rather popcnt.
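
As a small illustration of the opcode rule for non-VEX instructions, the following Python sketch extracts the opcode bytes and looks up the jmpe/popcnt example. The names `opcode_bytes`, `EXAMPLES`, and `name_for` are invented here, the table contains only the two encodings mentioned above, and the caller is assumed to have already skipped the prefixes.

```python
def opcode_bytes(code: bytes, start: int = 0) -> bytes:
    """Return the 1, 2, or 3 opcode bytes beginning at code[start]."""
    if code[start] != 0x0F:
        return code[start:start + 1]       # plain one-byte opcode
    if code[start + 1] in (0x38, 0x3A):
        return code[start:start + 3]       # 0f 38 xx or 0f 3a xx
    return code[start:start + 2]           # 0f xx


# Toy table keyed by (prefix that changes the meaning, opcode bytes as hex).
EXAMPLES = {
    (None, "0fb8"): "jmpe",
    (0xF3, "0fb8"): "popcnt",
}


def name_for(prefix, opcode: bytes):
    """Look up one of the example instructions; None if it is not in the table."""
    return EXAMPLES.get((prefix, opcode.hex()))
```

With this, `name_for(0xF3, opcode_bytes(bytes.fromhex("f30fb8c0"), 1))` yields `"popcnt"`, while the same opcode without the f3 prefix yields `"jmpe"`.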

The opcode and the prefixes decide which instruction is encoded. From here on, it's mostly smooth sailing. Depending on the instruction, a modr/m byte may follow. Depending on the modr/m byte and the address size override prefix, a sib byte and one, two, or four displacement bytes may follow. Finally, depending on the instruction, the operand size override prefix, and the REX prefix, one, two, four, six, or eight immediate bytes may follow.
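
The modr/m part can be sketched in Python roughly as follows. The function name `modrm_length` and the `addr16` flag are assumptions of this example: the caller is expected to work out from the CPU mode and the 67 prefix whether 16-bit addressing applies. The rules encoded here are the standard ones for the mod and r/m fields, but the sketch does not handle the immediate bytes or the compressed 8-bit displacement of EVEX.

```python
def modrm_length(code: bytes, pos: int, addr16: bool = False) -> int:
    """Bytes taken by modr/m, optional sib, and displacement starting at code[pos]."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 0b111

    if mod == 0b11:
        return 1                            # register operand: modr/m only

    if addr16:
        # 16-bit addressing has no sib byte.
        if mod == 0b00:
            disp = 2 if rm == 0b110 else 0  # rm 110 means a bare disp16
        else:
            disp = 1 if mod == 0b01 else 2
        return 1 + disp

    # 32-bit and 64-bit addressing.
    has_sib = rm == 0b100
    if mod == 0b00:
        if rm == 0b101:
            disp = 4                        # disp32 (RIP-relative in long mode)
        elif has_sib and code[pos + 1] & 0b111 == 0b101:
            disp = 4                        # sib with no base register: disp32
        else:
            disp = 0
    else:
        disp = 1 if mod == 0b01 else 4
    return 1 + int(has_sib) + disp
```

For instance, for 8b 45 08 (mov eax,[ebp+8]) `modrm_length(bytes.fromhex("8b4508"), 1)` gives 2, so together with the one-byte opcode the instruction is three bytes long.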

That's about as much of a description as I can give in the scope of a Stack Overflow answer. So TL;DR: It's really complicated.