As a quick recap, the x86 architecture defines 0x0F 0x1F [ModR/M] as a multi-byte NOP.
Now I'm looking at the specific case of an 8-byte NOP: I have
The 66H prefix overrides the operand size (to 16 bits in 32-bit and 64-bit code).
It does not override the address size; for that you would use 67H.
Here's a list of all the legacy prefixes:
F0h = LOCK: makes the instruction's read-modify-write of its memory operand atomic
String prefixes
F3h = REP, REPE
F2h = REPNE
Segment overrides
2Eh = CS
36h = SS
3Eh = DS
26h = ES
64h = FS
65h = GS
Operand-size override
66h = switches the operand size, to 16 bits in 32-bit and 64-bit code
Address-size override
67h = switches the address size, to 16 bits in 32-bit code or to 32 bits in 64-bit code
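To make the list concrete, here is a minimal Python sketch that peels the legacy prefixes off the front of an instruction. It is an illustration, not a decoder: `split_prefixes` and `LEGACY_PREFIXES` are hypothetical names, and REX/VEX prefixes (as well as the "mandatory prefix" role that 66h, F2h and F3h play in SSE encodings) are deliberately ignored.

```python
# Legacy x86 prefixes from the list above, keyed by byte value.
LEGACY_PREFIXES = {
    0xF0: "LOCK",
    0xF3: "REP/REPE",
    0xF2: "REPNE",
    0x2E: "CS",
    0x36: "SS",
    0x3E: "DS",
    0x26: "ES",
    0x64: "FS",
    0x65: "GS",
    0x66: "operand-size override",
    0x67: "address-size override",
}


def split_prefixes(code: bytes) -> tuple[list[str], bytes]:
    """Split leading legacy prefixes off an instruction (hypothetical helper).

    Returns the prefix names in order and the remaining bytes
    (opcode and operands). REX and VEX prefixes are not handled.
    """
    names = []
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[i]])
        i += 1
    return names, code[i:]


# The 9-byte recommended NOP: one 66H prefix, then 0F 1F 84 ...
prefixes, rest = split_prefixes(bytes.fromhex("660F1F840000000000"))
print(prefixes)    # ['operand-size override']
print(rest.hex())  # 0f1f840000000000
```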
However, it is best not to invent your own NOP encodings, but to stick to the recommended multi-byte NOPs.
According to AMD, the recommended multi-byte NOPs are as follows:
Table 4-9. Recommended Multi-Byte Sequence of NOP Instruction
Length  Byte sequence                      Assembly
1       90H                                NOP
2       66 90H                             66 NOP
3       0F 1F 00H                          NOP DWORD ptr [EAX]
4       0F 1F 40 00H                       NOP DWORD ptr [EAX + 00H]
5       0F 1F 44 00 00H                    NOP DWORD ptr [EAX + EAX*1 + 00H]
6       66 0F 1F 44 00 00H                 66 NOP DWORD ptr [EAX + EAX*1 + 00H]
7       0F 1F 80 00 00 00 00H              NOP DWORD ptr [EAX + 00000000H]
8       0F 1F 84 00 00 00 00 00H           NOP DWORD ptr [EAX + EAX*1 + 00000000H]
9       66 0F 1F 84 00 00 00 00 00H        66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
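The lengths in the table follow directly from the ModR/M byte of 0F 1F /0: mod = 01 adds an 8-bit displacement, mod = 10 a 32-bit one, and rm = 100 adds a SIB byte. The Python sketch below recomputes each row's length from those rules. `nop_length` is a hypothetical helper that assumes 32- or 64-bit addressing and accepts only 66H prefixes.

```python
# The recommended NOP sequences from Table 4-9, keyed by length.
RECOMMENDED_NOPS = {
    1: "90",
    2: "66 90",
    3: "0F 1F 00",
    4: "0F 1F 40 00",
    5: "0F 1F 44 00 00",
    6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00",
    8: "0F 1F 84 00 00 00 00 00",
    9: "66 0F 1F 84 00 00 00 00 00",
}


def nop_length(code: bytes) -> int:
    """Length of a 90H or 0F 1F /0 NOP, assuming 32/64-bit addressing.

    Only 66H prefixes are accepted; the ModR/M byte decides whether a
    SIB byte and an 8- or 32-bit displacement follow.
    """
    i = 0
    while code[i] == 0x66:           # skip operand-size prefixes
        i += 1
    if code[i] == 0x90:              # one-byte NOP (possibly prefixed)
        return i + 1
    assert code[i:i + 2] == b"\x0f\x1f", "not a 0F 1F NOP"
    modrm = code[i + 2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert reg == 0, "the 0F 1F NOP uses /0"
    length = i + 3                   # prefixes + 0F 1F + ModR/M
    sib_base = None
    if mod != 3 and rm == 4:         # memory operand with a SIB byte
        sib_base = code[length] & 7
        length += 1
    if mod == 1:
        length += 1                  # disp8
    elif mod == 2:
        length += 4                  # disp32
    elif mod == 0 and (rm == 5 or sib_base == 5):
        length += 4                  # disp32 with no base register
    return length


for n, seq in RECOMMENDED_NOPS.items():
    code = bytes.fromhex(seq)
    assert len(code) == n == nop_length(code)
```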
Intel CPUs cope with up to three redundant prefixes, so NOPs of up to 11 bytes can be constructed like this:
10      66 66 0F 1F 84 00 00 00 00 00H     66 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
11      66 66 66 0F 1F 84 00 00 00 00 00H  66 66 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
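A sketch of a padding generator built on the table, assuming (as above) that at most three 66H prefixes are acceptable, which caps a single NOP at 11 bytes. `single_nop` and `nop_padding` are hypothetical helpers; longer gaps are filled greedily with 11-byte NOPs, which also keeps the instruction count as low as possible.

```python
# Recommended sequences of 1..9 bytes (Table 4-9).
BASE_NOPS = {
    1: "90", 2: "66 90", 3: "0F 1F 00", 4: "0F 1F 40 00",
    5: "0F 1F 44 00 00", 6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00", 8: "0F 1F 84 00 00 00 00 00",
    9: "66 0F 1F 84 00 00 00 00 00",
}
MAX_NOP = 11  # the 9-byte NOP plus two more redundant 66H prefixes


def single_nop(n: int) -> bytes:
    """One NOP instruction of exactly n bytes (1 <= n <= 11)."""
    if not 1 <= n <= MAX_NOP:
        raise ValueError(f"no single NOP of {n} bytes")
    if n <= 9:
        return bytes.fromhex(BASE_NOPS[n])
    # 10 and 11 bytes: add redundant 66H prefixes to the 9-byte form,
    # giving 2 or 3 prefixes in total.
    return b"\x66" * (n - 9) + bytes.fromhex(BASE_NOPS[9])


def nop_padding(n: int) -> bytes:
    """Fill n bytes with as few NOP instructions as possible."""
    out = b""
    while n > 0:
        k = min(n, MAX_NOP)
        out += single_nop(k)
        n -= k
    return out


print(nop_padding(14).hex(" "))  # 66 66 66 0f 1f 84 00 00 00 00 00 0f 1f 00
```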
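Of course, you can also avoid NOPs altogether by adding redundant prefixes to ordinary instructions, e.g.
rep mov reg,reg //one extra byte
Be aware, though, that Intel documents REP on non-string instructions as reserved; a segment-override prefix that names the operand's default segment (e.g. 3Eh, DS, on a DS-relative memory access) is a safer filler byte.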
You can also pick longer encodings:
test r8d,r8d is one byte longer than test edx,edx, because r8d needs a REX prefix (this only helps if the value can live in r8d).
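The extra byte comes from the REX prefix that r8d..r15d require. A small Python sketch of the TEST r/m32, r32 (85 /r) register form, using a hypothetical `encode_test_r32` helper, shows the difference.

```python
# 32-bit register numbers; r8d..r15d need the REX.R/REX.B extension bits.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"] + [
    f"r{i}d" for i in range(8, 16)
]


def encode_test_r32(dst: str, src: str) -> bytes:
    """Encode `test dst, src` as 85 /r (TEST r/m32, r32), register form."""
    rm, reg = REGS32.index(dst), REGS32.index(src)
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)  # mod = 11: register operands
    rex = 0x40 | ((reg >> 3) << 2) | (rm >> 3)  # set REX.R and REX.B as needed
    prefix = bytes([rex]) if rex != 0x40 else b""  # omit an empty REX
    return prefix + bytes([0x85, modrm])


print(encode_test_r32("edx", "edx").hex(" "))  # 85 d2
print(encode_test_r32("r8d", "r8d").hex(" "))  # 45 85 c0
```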
Many instructions with immediate operands have a short and a long form:
and edx,7 //short
and edx,0000007 //long
Most assemblers helpfully pick the shortest form for you, so you'll have to code the longer encodings yourself using db.
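For the and example, the short form is 83 /4 ib (sign-extended 8-bit immediate) and the long form is 81 /4 id (32-bit immediate). The sketch below assumes NASM-style db syntax and uses hypothetical helpers `encode_and_imm` and `as_db` to produce the bytes for either form and the db line you would paste into the source.

```python
import struct

# Register numbers for the 32-bit general-purpose registers (no REX).
REGS32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
          "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_and_imm(reg: str, imm: int, long_form: bool) -> bytes:
    """Encode `and reg, imm` with either an imm8 or an imm32.

    83 /4 ib sign-extends an 8-bit immediate; 81 /4 id carries a full
    32-bit immediate. If imm does not fit in 8 bits, the long form is used.
    """
    modrm = 0xC0 | (4 << 3) | REGS32[reg]  # mod = 11, reg field = /4 (AND)
    if not long_form and -128 <= imm <= 127:
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    return bytes([0x81, modrm]) + struct.pack("<i", imm)


def as_db(code: bytes) -> str:
    """Format bytes as a NASM-style `db` line."""
    return "db " + ", ".join(f"0x{b:02X}" for b in code)


print(as_db(encode_and_imm("edx", 7, long_form=False)))  # db 0x83, 0xE2, 0x07
print(as_db(encode_and_imm("edx", 7, long_form=True)))   # db 0x81, 0xE2, 0x07, 0x00, 0x00, 0x00
```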
Placing these at strategic locations lets you align jump targets without paying the decoding and execution cost of a NOP.
Remember that on most CPUs, executing NOPs still uses up resources such as decode bandwidth and pipeline slots.
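The amount of padding needed is simple arithmetic: for a power-of-two alignment A and a target at offset x, it is (-x) mod A. The sketch below computes that gap and then greedily spends it on lengthenings like the ones above, leaving any remainder for a NOP. The offset 0x1D and the option names and sizes are hypothetical, and greedy selection is only a heuristic that will not always find an exact fit.

```python
def padding_to_align(offset: int, align: int) -> int:
    """Bytes needed to move `offset` up to the next multiple of `align`."""
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")
    return -offset % align


def plan_padding(gap: int, options: dict[str, int]) -> tuple[list[str], int]:
    """Greedily pick lengthenings (name -> extra bytes) that fit in `gap`.

    Each option is used at most once, largest first. Returns the chosen
    names and the number of bytes that still need a NOP.
    """
    chosen = []
    for name, extra in sorted(options.items(), key=lambda kv: -kv[1]):
        if extra <= gap:
            chosen.append(name)
            gap -= extra
    return chosen, gap


# Hypothetical: a loop head would land at offset 0x1D; we want 16-byte alignment.
gap = padding_to_align(0x1D, 16)
print(gap)  # 3

# Hypothetical lengthenings available in the code before the loop head.
options = {"and edx,7 as 81 /4 id": 3, "test r8d,r8d for test edx,edx": 1}
print(plan_padding(gap, options))  # (['and edx,7 as 81 /4 id'], 0)
```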