问题
I was looking at the different instructions in assembly and I am confused on how the lengths of different operands and opcodes are decided upon.
Is it something you ought to know from experience, or is there a way to find out which operand/operator combination takes up how many bytes?
For eg:
push %ebp ; takes up one byte
mov %esp, %ebp ; takes up two bytes
So the question is:
Upon seeing a given instruction, how can I deduce how many bytes its opcode will require?
回答1:
There's no hard and fast rule for x86 without a database as the instruction encoding is pretty complex (and the opcode itself can vary from 1 to 3 bytes). You can consult the Intel® 64 and IA-32 Architectures Software Developer’s Manual 2A document (Chapter 2: Instruction Format) to see how instructions and their operands are encoded:
回答2:
The length of the op-code is built with (at least) two criteria in mind
- the frequency of the op-code (put it on 1 byte if frequently used in programs, and if possible)
- the information necessary for the op-code to function (if you need an absolute address, the code cannot be encoded on a unique byte)
Also,
- between the initial 8088 to the latest Intel processors (3 decades) a lot of new instructions have been created, and some, while frequently appearing in programs, could not be coded on one single byte, because the whole 256 values were reserved.
Besides the link provided in another answer (that lists specifically the size of a code), see also the processors history.
回答3:
Terminology: "opcode" is the part of the instruction that selects the operation, not including operands, or non-mandatory prefixes that modify the operation (e.g. operand-size). Using "opcode" to refer to the whole instruction is incorrect, although done fairly often by some people talking about shellcode.
Is it something you ought to know from experience
With experience looking at machine code or especially experience optimizing for code-size, then yes, you'll start to remember things you've looked up repeatedly, and learn how to look at an asm line and know how long the instruction will be, without memorizing what the bytes will be.
The operand-encoding rules don't depend on the opcode, so you just have to remember the opcode lengths, and the special-case short forms that don't use a ModR/M byte to encode operands. And then separately remember the operand-encoding rules.
For me personally, I like to answer code-golf questions like this one with x86 machine code. (See also Tips for golfing in x86/x64 machine code). I write in NASM, planning / knowing how long each instruction will be, and let the assembler generate the hexdump of the actual machine code as a listing. For the short instructions that are useful for code-golf, I don't remember having been wrong about instruction length any time recently, but I'm lucky to have a good memory for details (like the x86 instruction set) that I find interesting or use a lot. (I did have to try rorx
to see how long it was.)
I don't type out the machine-code bytes myself; to do that by hand I'd have to look each instruction up in the manual. x86 doesn't have short encodings for PC-relative addressing, so finding/creating useful constants inside the machine code (that could double as data) is not a thing, so it's not generally useful for code-golf to memorize any of the numeric details of instruction encoding.
When optimizing for performance, smaller is usually better when everything else is equal, so caring about code-size and especially alignment is definitely part of performance.
or is there a way to find out which operand/operator combination takes up how many bytes?
This is well documented in manuals. Other than a few special-case 1-byte instructions, operand encoding is the same for (almost) everything.
The machine-code encoding of most x86 instructions follow this pattern (better diagram version of this from Intel in @Mehrdad's answer):
[prefixes] opcode ModR/M [extra addressing-mode bytes] [immediate]
(Instructions with no explicit operands don't have a ModR/M byte, just the opcode byte(s)).
x86 opcodes are 1 byte for most common instructions, especially instructions which have existed since 8086. Instructions added later (e.g. like bsf
and movsx
in 386) often use 2-byte opcodes with a 0f
escape byte. If you hang around on SO, you'll see a lot of questions asking about 8086 specifically (especially emu8086
); that's the main reason I know something about which instructions weren't available on 8086. If you'd rather just remember directly which instructions have 2-byte opcodes without the historical details, that's totally fine. Or just look it up every time in the manual :P
e.g. 0f b6 c0 movzx eax,al
, so 0F B6 is the opcode for mov r32, r/m8
, and C0 is the ModR/M byte that encodes eax as the destination (/r
field = 0), register direct mode for the source (top 2 bits = 11), and al
as the source register (/m
field = 0).
I'm using Intel syntax for all my examples (mnemonic dst, src1 [,src2, ...]
) because that matches what you'll find in Intel and AMD's manuals. AFAIK, there aren't any detailed instruction-encoding manuals that use AT&T syntax. I'm also using 32 or 64-bit examples even when talking about what 8086 had. Of course 8086 only had 16-bit real mode, but the same opcode and encoding is used in 64-bit mode (which is what we care about these days).
Intel's instruction set ref. manual (SDM vol.2) has opcode maps for 1, 2, 3 byte opcodes (appendix A.3), so you can see some patterns in the choice of opcode encoding. Or for any given instruction, look at the encoding listed along with the full description in that manual. (Also see some nice online extracts with one page per instruction, like https://github.com/HJLebbink/asm-dude/wiki and http://felixcloutier.com/x86/. HJ Lebbink's page tags each instruction with when it was introduced, so you can see 8086 for add
, or 386 for new forms of shifts, and for movzx
).
Note that some one-operand instructions, like shl or not
, use the /r
field of the ModR/M byte as extra opcode bits. Also most instructions with an immediate are still destructive because they use the /r
field as opcode bits. imul r32, r/m32, imm32
(386) is the exception to this rule, having an immediate and using the full ModR/M byte for both operands. (Note that the ModR/M can only signal register or memory operands; the encoding for add r/m32, imm8
uses the opcode to indicate that there's an immediate. But the main opcode byte is shared by multiple instructions, so the /r
field is used as part of the opcode, and that's why we don't have add r/m32, r32, imm8
. But for ADD / SUB we can use lea ecx, [rax + 1]
as a copy-and-add.)
Operand encoding:
Most instructions with an immediate operand are the same length as the register/memory source version, plus the bytes to encode the immediate. Immediates are either imm8 or imm32, so values from -128..127 are more compact. (In 16-bit mode it's either imm8 or imm16).
The ModR/M byte is all that's needed for register direct, or the simplest one-register addressing mode with no displacement. (Except with [esp]
). So add eax, ecx
is 2 bytes long, just like add eax, [ecx]
. Indexed addressing modes (and modes with esp
/ rsp
as the base register) need a SIB (Scale/Index/Base) byte.
Constant displacements in addressing modes need an extra 1 or 4 bytes (sign-extended disp8 or disp32) on top of the ModR/M + optional SIB.
AVX512 EVEX with disp8 scales the disp8 by the vector width, so vaddps zmm31, zmm30, [rsi + 256] is only 7 bytes (4-byte EVX + opcode=0x58 + modrm + disp8), but vaddps zmm31, zmm30, [rsi + 16]
is 11 bytes: it has to use a disp32 to encode +16
, because it's not a multiple of 64. But the same instruction with xmm
registers could use a disp8
.
See Intel's manuals for the full details.
Special short-forms of the most common instructions
To save code-size, 8086 (and later x86) provides special encodings with no ModR/M byte for some very common instructions. If the instruction isn't one of these, it uses a ModR/M byte
- add/adc/sub/cmp/test/and/or/xor/etc. AL/AX/EAX with an immediate of the same size as the register. e.g.
and eax, imm32
(5 bytes) orand al,imm8
(2 bytes). But there's no special encoding forand eax, imm8
; that still has to use the 3-byteand r/m32, imm8
encoding. Usingal
can be very good for code-size when working with 8-bit data, especially if you've avoided or aren't worried about partial-register stalls or false dependencies causing performance problems. shift/rotate with a count of 1: 8086 didn't have imm8 rotates, only by
cl
or by an implicit 1, so there are opcodes likeshl r/m32,1
where the1
is implicit.Using the
imm8
encoding has performance implications: potential stalls on P6-family because it doesn't check if the imm8 is zero until execution. But therol r32,1
short form is 2 uops, vs. 1 forrol r32, imm8
(even if the imm8 is 1) on Sandybridge-family including Skylake. Thercl r32,1
short form is much faster than with an imm8. (3 uops vs. 8 on Skylake).
And several where the register is encoded in the low 3 bits of the instruction byte, effectively dedicating 8 bytes of opcode coding space to making the register-operand form of these instructions 1 byte shorter.
mov r8, imm8
: 2 bytes instead of 3 for the generalmov r/m8, imm8
encoding.- mov r32, imm32: 5 bytes instead of 6 bytes for
mov r/m32, imm32
. Fun fact: in x86-64, the REX.W=1 version of the short-form opcode is the only instruction that can use a 64-bit immediate. 10-bytemov r64, imm64
. The REX.W=1 version of ther/m32
opcode still uses a 32-bit immediate (sign-extended like usual), somov rax, -1
is best encoded that way, taking 7 bytes vs. 5-bytemov eax,-1
. (Or if optimizing for code-size, see also Set all bits in CPU register to 1 efficiently.) - push/pop register, 1 byte vs. 2 bytes for the
pop r/m32
encoding. push
/pop
segment registers (other than FS/GS). Although there isn't a r/m16 encoding for these.- inc r32 / dec r32 (16/32-bit mode only: the 0x4X bytes are REX prefixes in x86-64, so
inc eax
has to use the 2-byteinc r/m32
encoding). xchg eax, reg: This is where
0x90 nop
comes from: the short-form ofxchg eax,eax
(or in 16-bit mode,xchg ax,ax
). In x86-64, 90nop
is not alsoxchg eax,eax
, because that would zero-extend EAX into RAX. Instead, it has its own instruction-set manual entry.xchg reg,reg
is never used by compilers, and is usually not faster than 3 mov instructions, so it would be nice if we had those 7 opcode bytes back for more useful future extensions. (Or 8 ifnop
was moved to a different opcode...). It was more useful in 8086 when the accumulator was "more special", e.g.cbw
to sign-extend AL into AX was the only (good) way becausemovsx
didn't exist. And only 1-operandmul
/imul
was available.
xchg eax, r32
is still great for code-golf, e.g. GCD in 8 bytes of x86 32-bit machine code. See also my other code-golf answers for various code-size tricks (mostly at the expense of performance; that's the point of code-golf).
I think this covers all the single-byte special cases of instructions that also have r/m32
encodings.
This answer isn't meant to be exhaustive. I haven't talked about more recent instructions much, and there are lots of special cases for rare instructions. The rules for when a REX prefix or operand-size prefix are required are pretty straightforward. Here's are some more general rules:
- SSE1/SSE3
ABCps
instructions have 2-byte opcodes (0F xx) - SSE2 integer / double-precision instructions generally have 3-byte opcodes (66 0F xx or similar)
- SSSE3/SSE4.x instructions have 4-byte opcodes (3 mandatory prefixes)
VEX-coded instructions can use a 2-byte VEX prefix if the SSE version is SSE3 or earlier, and the 2nd source register isn't a "high" register (xmm/ymm8-15). The XMM and YMM versions of the same instruction are always the same size. (But prefer xmm with implicit zero-extension instead of explicit ymm when you don't care or want the high half zeroed.)
vpxor ymm8,ymm8,ymm5 ; 2-byte VEX
vpxor ymm7,ymm7,ymm8 ; 3-byte VEX
vpxor ymm7,ymm8,ymm7 ; 2-byte VEX
So we can use "high" registers as the destination or first source without needing a 3-byte VEX, but not as the 2nd source (3rd operand overall). For commutative operations, you can save size by putting the low8 one as the 2nd source.
Note that for 4-operand instructions like vblendvps, the 4th operand is encoded in an imm8
. So it's still the 3rd operand (2nd source), not the last operand, which affects what size VEX prefix is needed. But blendvps
is SSE4.1, so it always needs a 3-byte VEX prefix anyway to represent the 66.0F3A
encoding of the prefix field.
回答4:
Usually, this isn't something you need to know from one instruction to the next when programming in assembly language. If it ever matters (such as if you're trying to fit some particular code into a constrained space), you can look at the listing output from the assembler, or a disassembly listing.
回答5:
From my 6510 assembly days, the answer usually pertained to operand addresses and offsets. Opcodes were always 1 byte for the 6510. Addresses were always two bytes. If the Opcode required one address, then I knew the total size was three bytes. If two addresses were specified, then I knew the total size was 5 bytes.
As for offsets, the space they occupied was contingent on the length of the branch. So consider this:
bne FooBar
If the "Foobar" offset was pointing to an address that was less than 128 bytes away, then the operand was a single byte. If the offset pointed to an address beyond that, then a full address was needed. A full address was no longer an offset, and of course addresses occupied two bytes.
So in this latter case, it might not be easy to tell if the opcode + operand required two or three bytes.
So I guess, sometimes you can tell and other times it isn't so obvious.
来源:https://stackoverflow.com/questions/4567903/instruction-lengths