Question
Does the x86 standard include mnemonics, or does it just define the opcodes?
If it does not include them, is there another standard for the different assemblers?
Answer 1:
Mnemonics are not standardised, and different assemblers use different mnemonics. Some examples:
- AT&T-style assemblers append the suffixes b, w, l, and q to mnemonics to indicate operand size. Intel-style assemblers typically indicate this with the keywords byte, word, dword, and qword.
- AT&T-style assemblers recognise cbtw, cwtl, cltq, and cqto, while Intel-style assemblers recognise the same instructions as cbw, cwde, cdqe, and cqo.
- AT&T-style assemblers recognise movz?? and movs??, where ?? are two size suffixes, for what Intel-style assemblers call movzx, movsx, and movsxd.
- Some Intel-style assemblers only recognise 63 /r as movsxd, while others recognise movsx as a variant of this instruction, too.
- Plan 9-style assemblers (such as the one used in Go) are just plain weird and differ in a whole lot of ways, such as using Motorola-style mnemonics for conditional jumps.
- Historically, the NEC assembler provided for the NEC V20 clone of the 8086 came with almost entirely different mnemonics. For example, int was called brk.
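As a rough illustration of the first two points, here is a hypothetical Python helper that maps an AT&T mnemonic to its Intel-style name plus size keyword. The name att_to_intel and its tables are made up for this sketch, and BASE_MNEMONICS is a tiny illustrative subset, not a real instruction list. It also shows why suffix stripping needs a known list of mnemonics: shl ends in l but carries no size suffix.

```python
# AT&T operand-size suffixes and the Intel-syntax keyword for the same size.
SUFFIX_TO_KEYWORD = {"b": "byte", "w": "word", "l": "dword", "q": "qword"}

# Sign-extension instructions whose AT&T names differ entirely from Intel's.
ATT_CONVERSIONS = {
    "cbtw": "cbw",   # AL  -> AX
    "cwtl": "cwde",  # AX  -> EAX
    "cltq": "cdqe",  # EAX -> RAX
    "cwtd": "cwd",   # AX  -> DX:AX
    "cltd": "cdq",   # EAX -> EDX:EAX
    "cqto": "cqo",   # RAX -> RDX:RAX
}

# Tiny illustrative subset of base mnemonics (an assumption, not a full list).
BASE_MNEMONICS = {"add", "sub", "mov", "cmp", "push", "pop", "shl", "inc"}


def att_to_intel(mnemonic):
    """Return (intel_mnemonic, size_keyword_or_None) for an AT&T mnemonic."""
    m = mnemonic.lower()
    if m in ATT_CONVERSIONS:
        return ATT_CONVERSIONS[m], None
    if m in BASE_MNEMONICS:  # already unsuffixed, e.g. "shl" (ends in l!)
        return m, None
    base, suffix = m[:-1], m[-1:]
    if base in BASE_MNEMONICS and suffix in SUFFIX_TO_KEYWORD:
        return base, SUFFIX_TO_KEYWORD[suffix]
    raise ValueError(f"unknown mnemonic in this sketch: {mnemonic!r}")
```

For example, att_to_intel("addl") yields ("add", "dword"), which an Intel-style assembler would express as add dword ptr [...] or infer from a 32-bit register operand.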
Answer 2:
There unfortunately isn't really a single "x86 standard" written down on paper that defines all the minimum requirements that a CPU must meet to be an x86.
Intel's documentation comes very close to being the "x86 standard", but in some cases it gives stronger guarantees than you get on modern AMD CPUs. For example, Intel guarantees atomicity of a 1/2/4/8-byte load or store from/to cacheable memory with any alignment, as long as it doesn't cross a cache-line boundary. But AMD only guarantees it for cacheable loads/stores that don't cross an 8-byte boundary.
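To make the difference concrete, here is a minimal Python sketch of the two rules as stated above, for cacheable memory only. The 64-byte cache line is an assumption (typical of current CPUs, not an architectural guarantee), and the function names are hypothetical.

```python
CACHE_LINE = 64  # assumed line size; typical of modern x86 CPUs, not guaranteed
SIZES = (1, 2, 4, 8)


def intel_guarantees_atomic(addr, size, line=CACHE_LINE):
    """Intel rule: a cacheable 1/2/4/8-byte load/store is atomic with any
    alignment, as long as it does not cross a cache-line boundary."""
    return size in SIZES and addr // line == (addr + size - 1) // line


def amd_guarantees_atomic(addr, size):
    """AMD rule: a cacheable access of up to 8 bytes is atomic only if it
    does not cross an 8-byte boundary."""
    return size in SIZES and addr // 8 == (addr + size - 1) // 8


def portable_atomic(addr, size):
    """What code targeting both vendors can rely on: both rules must hold."""
    return intel_guarantees_atomic(addr, size) and amd_guarantees_atomic(addr, size)
```

Under these assumptions, an 8-byte access at address 4 passes the Intel check but fails the AMD one, so code that must be atomic on both vendors should keep 8-byte accesses naturally aligned.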
The Q&A "Why is integer assignment on a naturally aligned variable atomic on x86?" quotes Intel's manual, showing that all of the guarantees are given in the form "Intel486 processor (and newer processors since)" guarantees such and such. There's no baseline given that applies to all x86 CPUs (or, more importantly, all x86-64 CPUs). I think the actual shared baseline in practice for x86 (including pre-x86-64) is 1 byte, because of the 8088.
So software that wants to run on modern x86-64 CPUs can't assume atomicity for 8-byte loads/stores unless they're actually aligned. I think we can all agree that atomicity guarantees are an essential part of being a modern multi-core x86 CPU. Atomicity of uncached MMIO access matters even on a single core; modern Intel and AMD agree on that, but again Intel only documents it in terms of "Pentium and later processors", implicitly meaning later Intel processors.
That said, Intel's documentation does define mnemonics for every opcode, and register names. AMD's documentation agrees with Intel's on all of those things.
See volume 2 of Intel's x86 Software Development Manuals. HTML extracts of just the per-instruction manual entries (without the sections that explain the notation and instruction format) can be found at https://www.felixcloutier.com/x86/index.html and https://github.com/HJLebbink/asm-dude/wiki, and various other places have older versions formatted differently.
As @fuz explains, most assemblers choose to follow this standard, but it's not required. The important part is binary compatibility, not asm source compatibility.
Intel has to assign names to instructions so it can talk about them in English in the rest of its manuals, not because they need everyone in the world to use the same asm syntax.
I'm not sure Intel's manuals even fully define a complete asm syntax (how to indicate segment-override prefixes in an addressing mode, for example).
In some cases they do step well beyond describing which machine code does what. For example, in the entries for the string instructions lods/stos/movs/cmps/scas (and probably ins/outs), you'll find paragraphs like this one in Intel's vol. 2 manual:
At the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the MOVS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source and destination operands should be symbols that indicate the size and location of the source value and the destination, respectively. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source and destination operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source and destination operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the move string instruction is executed.
(highlighting reproduced from (an HTML extract of) the original PDF)
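A tiny Python model may make the point about misleading operands clearer. movs_explicit and its (address, size) "symbols" are hypothetical stand-ins for assembler symbols; memory is a flat bytearray with segment bases ignored, and only byte/word/dword sizes are modelled. The symbols contribute their size and nothing else, while the bytes always move from [ESI] to [EDI].

```python
def movs_explicit(memory, regs, dst_symbol, src_symbol, df=False):
    """Model one MOVS written in the explicit-operands form.

    dst_symbol and src_symbol are hypothetical (address, size) pairs. Only
    their sizes matter: bytes are always copied from [ESI] to [EDI].
    Segment bases are ignored (flat memory model, an assumption).
    """
    (_, dst_size), (_, src_size) = dst_symbol, src_symbol
    if dst_size != src_size or dst_size not in (1, 2, 4):
        raise ValueError("operands must both be byte, word, or dword")
    src, dst = regs["esi"], regs["edi"]
    memory[dst:dst + dst_size] = memory[src:src + dst_size]
    # DF=0 walks forward, DF=1 walks backward, by the operand size.
    step = -dst_size if df else dst_size
    regs["esi"] += step
    regs["edi"] += step


mem = bytearray(16)
mem[0:4] = b"ABCD"
regs = {"esi": 0, "edi": 8}
# The symbols claim addresses 200 and 100, but ESI/EDI decide where bytes go.
movs_explicit(mem, regs, dst_symbol=(200, 4), src_symbol=(100, 4))
```

After the call, bytes 8..11 hold "ABCD" and ESI/EDI have each advanced by 4; the addresses in the "symbols" were never consulted.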
Some "Intel-syntax" assemblers such as NASM ignore this, and only allow the use of movs with the size as part of the mnemonic, like movsb. NASM also has syntax for indicating a segment-override prefix that doesn't require operands, like fs lodsd, so this entirely avoids the possibility of writing operands that name the wrong memory location but still assemble.
(The string instructions only use implicit memory operands, not a ModR/M addressing mode.)
Related questions: "NASM: parser: instruction expected rep movs" and "Convert Instruction in assembly code lods and stos so NASM can compile".
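For the simple cases in those questions, a hypothetical Python rewriter could turn the explicit-operands form into NASM's size-suffixed form, moving any non-DS segment written on the source operand into a separate prefix. to_nasm is made up for this sketch, and it only handles single-line instructions with MASM-style "byte ptr" (etc.) operands.

```python
import re

# Size keywords mapped to NASM's mnemonic suffixes.
SIZE_SUFFIX = {"byte": "b", "word": "w", "dword": "d", "qword": "q"}
STRING_OPS = {"movs", "lods", "stos", "cmps", "scas", "ins", "outs"}
REP_PREFIXES = {"rep", "repe", "repz", "repne", "repnz"}

SIZE_RE = re.compile(r"\b(byte|word|dword|qword)\b")
# A segment written on the source operand, e.g. "fs:[esi]"; only the
# DS:(E)SI side of a string instruction can take a segment override.
SRC_SEG_RE = re.compile(r"\b([cdefgs]s)\s*:\s*\[\s*[er]?si\s*\]")


def to_nasm(line: str) -> str:
    """Rewrite an explicit-operand string instruction into NASM's
    size-suffixed, no-operand form (sketch; simple cases only)."""
    tokens = line.strip().lower().split(None, 1)
    prefixes = []
    while tokens and tokens[0] in REP_PREFIXES:
        prefixes.append(tokens[0])
        tokens = tokens[1].split(None, 1) if len(tokens) > 1 else []
    if not tokens or tokens[0] not in STRING_OPS or len(tokens) == 1:
        return line.strip()  # not an explicit-operand string instruction
    mnemonic, operands = tokens
    size = SIZE_RE.search(operands)
    if size is None:
        raise ValueError(f"explicit operands must state a size: {line!r}")
    seg = SRC_SEG_RE.search(operands)
    if seg and seg.group(1) != "ds":  # DS is the default, so no prefix needed
        prefixes.insert(0, seg.group(1))
    return " ".join(prefixes + [mnemonic + SIZE_SUFFIX[size.group(1)]])
```

With this sketch, to_nasm("lods dword ptr fs:[esi]") gives "fs lodsd", and to_nasm("rep movs byte ptr es:[edi], byte ptr [esi]") gives "rep movsb", since ES on the destination is fixed and not an override.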
So yes, there are multiple flavours of Intel-syntax assembly, not to mention very different syntaxes like AT&T.
AT&T uses different mnemonics intentionally for some instructions, even splitting up some opcodes that share a mnemonic in Intel syntax into separate mnemonics, like movzb for movzx-with-a-byte-source, and movzw for the word-source version. (These are normally used with a size suffix as well, like movzbl, but the l can be inferred from a 32-bit destination register if you like.)
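The movz??/movs?? naming can be decoded mechanically. This hypothetical Python helper (decode_att_extend is not from any real tool) returns the Intel mnemonic plus source and destination widths in bits, using None when the destination suffix is left for the assembler to infer. It only accepts the five-letter form for movz, because movsb/movsw/movsl/movsq are AT&T's string-move mnemonics.

```python
SIZE_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}


def decode_att_extend(mnemonic):
    """Split an AT&T zero/sign-extension mnemonic into its Intel name and
    operand widths, e.g. 'movzbl' -> ('movzx', 8, 32)."""
    m = mnemonic.lower()
    if len(m) not in (5, 6) or m[:4] not in ("movz", "movs"):
        raise ValueError(f"not an AT&T extension mnemonic: {mnemonic!r}")
    zero_extend = m[3] == "z"
    if not zero_extend and len(m) == 5:
        raise ValueError(f"{mnemonic!r} is a string instruction in AT&T syntax")
    # movzx only takes byte/word sources; a 32-bit mov already zero-extends.
    allowed_src = "bw" if zero_extend else "bwl"
    if m[4] not in allowed_src:
        raise ValueError(f"no such source size in {mnemonic!r}")
    src = SIZE_BITS[m[4]]
    dst = SIZE_BITS.get(m[5]) if len(m) == 6 else None  # None: inferred
    if len(m) == 6 and (dst is None or dst <= src):
        raise ValueError(f"bad destination size in {mnemonic!r}")
    if zero_extend:
        intel = "movzx"
    elif src == 32:
        intel = "movsxd"  # 63 /r
    else:
        intel = "movsx"
    return intel, src, dst
```

One Intel mnemonic (movzx) thus fans out into several AT&T spellings, which is exactly the opcode splitting described above.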
And AT&T syntax unintentionally swaps fsubr with fsub (and likewise fdivr with fdiv) when used with two register operands where the source is %st and the destination is %st(i), which is a syntax design bug we're stuck with. (Fortunately x87 as a whole is mostly obsolete.)
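A minimal Python sketch of what a syntax translator would have to do, based on the GNU assembler manual's description of this quirk: under AT&T rules, fsub %st, %st(3) stores st - st(3) into st(3), which Intel syntax spells fsubr st(3), st. att_x87_to_intel is hypothetical, and only the non-popping fsub/fsubr/fdiv/fdivr forms are covered.

```python
# Non-commutative x87 ops whose AT&T spelling names the opposite operation
# when the source is %st and the destination is %st(i), i != 0.
# Only the non-popping forms are listed in this sketch.
ATT_SWAPPED = {"fsub": "fsubr", "fsubr": "fsub", "fdiv": "fdivr", "fdivr": "fdiv"}


def _reg(name):
    """Strip AT&T's % sigil and treat st(0) as st."""
    name = name.lstrip("%")
    return "st" if name == "st(0)" else name


def att_x87_to_intel(mnemonic, src, dst):
    """Translate a two-register AT&T x87 instruction (source first) into
    Intel operand order (destination first), undoing the historical swap."""
    s, d = _reg(src), _reg(dst)
    if s == "st" and d != "st" and mnemonic in ATT_SWAPPED:
        mnemonic = ATT_SWAPPED[mnemonic]
    return mnemonic, d, s
```

Commutative ops like fadd, and the %st(i)-to-%st direction, pass through with only the operand order reversed.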
Source: https://stackoverflow.com/questions/54369684/are-x86-assembly-mnemonic-standarized