Are x86 assembly mnemonics standardized?

Submitted by 蓝咒 on 2020-12-10 05:47:42

Question


Does the x86 standard include Mnemonics or does it just define the opcodes?

If it does not include them, is there another standard for the different assemblers?


Answer 1:


Mnemonics are not standardised and different assemblers use different mnemonics. Some examples:

  • AT&T-style assemblers apply b, w, l, and q suffixes to all mnemonics to indicate operand size. Intel-style assemblers typically indicate this with the keywords byte, word, dword, and qword.
  • AT&T-style assemblers recognise cbtw, cwtl, cltq, and cqto, while Intel-style assemblers recognise the same instructions as cbw, cwde, cdqe, and cqo.
  • AT&T-style assemblers recognise movz?? and movs??, where ?? are two size suffixes, for what Intel-style assemblers call movzx, movsx, and movsxd.
  • Some Intel-style assemblers only recognise 63 /r as movsxd, while others recognise movsx as a variant of this instruction, too.
  • Plan 9-style assemblers (such as the one used in Go) are just plain weird and differ in a whole lot of ways, such as using Motorola-style mnemonics for conditional jumps.
  • Historically, the NEC assembler provided for the NEC V20 clone of the 8086 came with almost entirely different mnemonics. For example, int was called brk.
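
To make the size-suffix and movz??/movs?? points above concrete, here is a minimal Python sketch that translates an AT&T-style extension mnemonic into the corresponding Intel-style mnemonic and size keywords. The function name att_extend_to_intel and the table names are hypothetical, invented for this illustration; this is not how any real assembler is implemented, and it only covers the two-suffix movz/movs forms.

```python
# AT&T size suffixes and the Intel-style keywords they correspond to.
SIZE_KEYWORD = {"b": "byte", "w": "word", "l": "dword", "q": "qword"}
SIZE_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}


def att_extend_to_intel(mnemonic):
    """Map e.g. 'movzbl' to ('movzx', 'byte', 'dword').

    Hypothetical helper for illustration only. Returns the Intel-style
    mnemonic, the source size keyword, and the destination size keyword.
    """
    if len(mnemonic) != 6 or mnemonic[:4] not in ("movz", "movs"):
        raise ValueError("expected movz?? or movs?? with two size suffixes")
    src, dst = mnemonic[4], mnemonic[5]
    if src not in SIZE_BITS or dst not in SIZE_BITS:
        raise ValueError("unknown size suffix")
    if SIZE_BITS[src] >= SIZE_BITS[dst]:
        raise ValueError("source must be narrower than destination")

    if mnemonic[:4] == "movz":
        if src == "l":
            # There is no 32-to-64-bit zero-extending form: a plain 32-bit
            # mov already zero-extends into the full 64-bit register.
            raise ValueError("no movz form with a dword source")
        intel = "movzx"
    else:
        # The dword-source form is opcode 63 /r; this sketch calls it
        # movsxd, though some Intel-style assemblers also accept movsx.
        intel = "movsxd" if src == "l" else "movsx"
    return intel, SIZE_KEYWORD[src], SIZE_KEYWORD[dst]
```

For example, att_extend_to_intel("movzbl") returns ("movzx", "byte", "dword"), and att_extend_to_intel("movslq") returns ("movsxd", "dword", "qword").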



Answer 2:


There unfortunately isn't really a single "x86 standard" written down on paper that defines all the minimum requirements that a CPU must meet to be an x86.

Intel's documentation comes very close to being the "x86 standard", but in some cases it gives stronger guarantees than you get on modern AMD CPUs. For example, Intel guarantees atomicity of a 1/2/4/8-byte load or store from/to cacheable memory with any alignment that doesn't cross a cache-line boundary, but AMD only guarantees it for cacheable loads/stores that don't cross an 8-byte boundary.

The Stack Overflow question "Why is integer assignment on a naturally aligned variable atomic on x86?" quotes Intel's manual, showing that all of the guarantees are given in the form "Intel486 processor (and newer processors since)" guarantees such and such. There's no baseline given that applies to all x86 CPUs (or more importantly all x86-64 CPUs). I think the actual shared baseline in practice for x86 (including pre-x86-64) is 1 byte, because of 8088.

So software that wants to run on modern x86-64 CPUs can't assume atomicity for 8-byte loads/stores unless they're actually aligned. I think we can all agree that atomicity guarantees are an essential part of being a modern multi-core x86 CPU. Atomicity of uncached MMIO access matters even on a single core; modern Intel and AMD agree on that, but again Intel only documents it in terms of "Pentium and later processors". Implicitly "later Intel processors".
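
As a rough illustration of the difference between the two vendors' documented guarantees, here is a small Python sketch that checks whether a cacheable access falls under each rule as described above. The function names are hypothetical, and the 64-byte cache-line size is an assumption (it is typical for current CPUs but is not stated in this answer). The sketch models only the wording of the guarantees, not what any particular CPU actually does.

```python
def crosses_boundary(addr, size, boundary):
    """True if the bytes [addr, addr + size) span a multiple of `boundary`."""
    return addr // boundary != (addr + size - 1) // boundary


def intel_guaranteed_atomic(addr, size, line_size=64):
    # Intel rule as described above: 1/2/4/8-byte cacheable access that
    # does not cross a cache-line boundary. line_size=64 is an assumption.
    return size in (1, 2, 4, 8) and not crosses_boundary(addr, size, line_size)


def amd_guaranteed_atomic(addr, size):
    # AMD rule as described above: the access must not cross an
    # 8-byte boundary.
    return size in (1, 2, 4, 8) and not crosses_boundary(addr, size, 8)


def portably_atomic(addr, size):
    # What software can rely on across both vendors: the weaker guarantee.
    return intel_guaranteed_atomic(addr, size) and amd_guaranteed_atomic(addr, size)
```

An 8-byte access at address 0x1004 stays inside one 64-byte line but crosses an 8-byte boundary, so intel_guaranteed_atomic(0x1004, 8) is True while amd_guaranteed_atomic(0x1004, 8) and portably_atomic(0x1004, 8) are False. A naturally aligned access such as portably_atomic(0x1008, 8) is True.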


That said, Intel's documentation does define mnemonics for every opcode, and register names. AMD's documentation agrees with Intel's on all of those things.

See volume 2 of Intel's x86 Software Development Manuals. HTML extracts of just the per-instruction manual entries (without the sections that explain the notation and instruction format) can be found at https://www.felixcloutier.com/x86/index.html and https://github.com/HJLebbink/asm-dude/wiki, and various other places have older versions formatted differently.


As @fuz explains, most assemblers choose to follow this standard, but it's not required. The important part is binary compatibility, not asm source compatibility.

Intel has to assign names to instructions so it can talk about them in English in the rest of its manuals, not because it needs everyone in the world to use the same asm syntax.


I'm not sure Intel's manuals even fully define a complete asm syntax (how to indicate segment-override prefixes in an addressing mode, for example).

In some cases they do step well beyond describing which machine code does what, e.g. in the string instructions lods/stos/movs/cmps/scas (and probably ins/outs), you'll find paragraphs like this one in Intel's vol.2 manual:

At the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the MOVS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source and destination operands should be symbols that indicate the size and location of the source value and the destination, respectively. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source and destination operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source and destination operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the move string instruction is executed.

(highlighting reproduced from (an HTML extract of) the original PDF)

Some "Intel-syntax" assemblers such as NASM ignore this, and only allow the use of movs with the size as part of the mnemonic, like movsb. NASM also has syntax for indicating a segment-override prefix like fs lodsd that doesn't require operands, so this entirely avoids the possibility of using operands that indicate the wrong memory operand but still assemble.

(The string instructions only use implicit memory operands, not a ModR/M addressing mode.)
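
Because the string instructions take no ModR/M byte, the machine code for something like fs lodsd is just a prefix byte followed by a one-byte opcode. The following Python sketch shows that for a handful of mnemonics. The function encode_string_insn and its tables are hypothetical and written for this illustration; it is a toy, not NASM's parser. It assumes 32-bit or 64-bit mode, where dword is the default operand size, and it does not check whether a given prefix is meaningful for a given instruction.

```python
# Segment-override and rep prefix bytes.
SEGMENT_PREFIX = {"es": 0x26, "cs": 0x2E, "ss": 0x36,
                  "ds": 0x3E, "fs": 0x64, "gs": 0x65}
REP_PREFIX = 0xF3

# One-byte opcodes for the byte and dword string instructions
# (dword forms assume a 32-bit default operand size).
STRING_OPCODE = {
    "movsb": 0xA4, "movsd": 0xA5,
    "cmpsb": 0xA6, "cmpsd": 0xA7,
    "stosb": 0xAA, "stosd": 0xAB,
    "lodsb": 0xAC, "lodsd": 0xAD,
    "scasb": 0xAE, "scasd": 0xAF,
}


def encode_string_insn(text):
    """Encode e.g. 'fs lodsd' or 'rep movsb' into machine-code bytes."""
    *prefixes, mnemonic = text.lower().split()
    if mnemonic not in STRING_OPCODE:
        raise ValueError("unsupported mnemonic: " + mnemonic)
    out = bytearray()
    for prefix in prefixes:
        if prefix == "rep":
            out.append(REP_PREFIX)
        elif prefix in SEGMENT_PREFIX:
            out.append(SEGMENT_PREFIX[prefix])
        else:
            raise ValueError("unsupported prefix: " + prefix)
    out.append(STRING_OPCODE[mnemonic])
    return bytes(out)
```

With these tables, encode_string_insn("fs lodsd").hex() gives "64ad" and encode_string_insn("rep movsb").hex() gives "f3a4". No memory operand appears anywhere in the source text, so there is nothing that could name the wrong location.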

Related Stack Overflow questions:

  • NASM: parser: instruction expected rep movs
  • Convert Instruction in assembly code lods and stos so NASM can compile


So yes, there are multiple flavours of Intel-syntax assembly, not to mention very different syntaxes like AT&T.

AT&T uses different mnemonics intentionally for some instructions, even splitting up some opcodes that share a mnemonic in Intel syntax into separate mnemonics, like movzb for movzx with a byte source, and movzw for the word-source version. (These are normally used with a size suffix as well, like movzbl, but the l can be inferred from a 32-bit destination register if you like.)

And AT&T syntax unintentionally swaps fsubr with fsub when used with two register operands, which is a syntax design bug we're stuck with. (Fortunately x87 as a whole is mostly obsolete.)



Source: https://stackoverflow.com/questions/54369684/are-x86-assembly-mnemonic-standarized
