Is all x86 32-bit assembly code valid x86 64-bit assembly code?
I've wondered whether 32-bit assembly code is a subset of 64-bit assemb
A modern x86 CPU has three main operation modes (this description is simplified):
Wikipedia has a nice table of x86-64 operating modes including legacy and real modes, and all 3 sub-modes of long mode. Under a mainstream x86-64 OS, after booting the CPU cores will always all be in long mode, switching between different sub-modes depending on 32 or 64-bit user-space. (Not counting System Management Mode interrupts...)
Now what is the difference between 16 bit, 32 bit, and 64 bit mode?
16-bit and 32-bit mode are basically the same thing except for the following differences:
Now, 64-bit mode is somewhat different. Most instructions behave just like in 32-bit mode, with the following differences:
- The single-byte inc reg and dec reg instructions are unavailable; their instruction space has been repurposed for the REX prefixes. The two-byte inc r/m and dec r/m forms are still available, so inc reg and dec reg can still be encoded.
- [...] [disp32] absolute address.
- ah, bh, ch, and dh cannot be used in an instruction that requires a REX prefix. A REX prefix causes those register numbers to mean instead the low 8 bits of registers si, di, sp, and bp.
- [...] fs and gs overrides (0x64, 0x65), which serve to support thread-local storage (TLS).
- The following instructions are unavailable: push/pop seg (except push/pop fs/gs), arpl, call far (only the 0xff encoding is valid), les, lds, jmp far (only the 0xff encoding is valid), daa, das, aaa, aas, aam, aad, bound (rarely used), pusha/popa (not useful with the additional registers), salc (undocumented), and lahf and sahf (these two are missing only on some early x86-64 CPUs; later CPUs support them in 64-bit mode).

And that's basically all of it!
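As a rough illustration of two of the points above (the REX prefixes taking over the single-byte inc/dec opcodes, and ah/bh/ch/dh versus the low bytes of sp/bp/si/di), here is a small Python sketch. The function names describe_40_4f and reg8_name are made up for this example, and spl/bpl/sil/dil are the usual assembler names for those low-byte registers. It only looks at a single byte or register number and is not a real decoder.

```python
# Register names indexed by the 3-bit register number used in x86 encodings.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG8_WITHOUT_REX = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_WITH_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]


def describe_40_4f(byte, mode_bits):
    """Describe a byte in the range 0x40..0x4F for 32-bit or 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("byte outside 0x40..0x4F")
    if mode_bits == 64:
        # In 64-bit mode the low four bits are the REX flags W, R, X, B.
        flags = "".join(name for bit, name in zip((8, 4, 2, 1), "WRXB") if byte & bit)
        return "REX prefix" + (" (" + flags + ")" if flags else "")
    # In 32-bit mode 0x40..0x47 is inc r32 and 0x48..0x4F is dec r32.
    op = "inc" if byte < 0x48 else "dec"
    return op + " " + REG32[byte & 7]


def reg8_name(number, has_rex):
    """Name the 8-bit register selected by register number 0..7.

    Simplification: the extension to r8b..r15b via REX.R/REX.B is ignored.
    """
    table = REG8_WITH_REX if has_rex else REG8_WITHOUT_REX
    return table[number]


if __name__ == "__main__":
    print(describe_40_4f(0x48, 32))  # 32-bit reading of the byte
    print(describe_40_4f(0x48, 64))  # 64-bit reading of the same byte
    print(reg8_name(4, False), reg8_name(4, True))
```

The point is just that the same byte value, or the same register number, is read through a different table depending on the mode and on whether a REX prefix is present.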
No, while there is a large amount of overlap, 64-bit assembly code is not a superset of 32-bit assembly code and so 32-bit assembly is not in general valid in 64-bit mode.
This applies both to the mnemonic assembly source (which is assembled into binary format by an assembler) and to the binary machine code format itself.
This question covers in some detail instructions that were removed, but there are also many encoding forms whose meanings were changed.
For example, Jester in the comments gives the example of push eax not being valid in 64-bit code. Based on this reference you can see that the 32-bit push is marked N.E., meaning not encodable. In 64-bit mode, the encoding is used to represent push rax (an 8-byte push) instead. So the same sequence of bytes has a different meaning in 32-bit mode versus 64-bit mode.
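To make that concrete, here is a minimal Python sketch that decodes only the one-byte push-register opcodes 0x50..0x57, optionally preceded by a REX prefix in 64-bit mode. decode_push is a hypothetical helper written for this answer, not part of any real disassembler, and it ignores everything else (including the 0x66 operand-size prefix).

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_push(code, mode_bits):
    """Decode a push-register encoding: [REX] 0x50+r.

    mode_bits is 32 or 64. Anything this sketch does not handle raises ValueError.
    """
    data = bytes(code)
    rex_b = 0
    # A leading 0x40..0x4F byte is a REX prefix only in 64-bit mode.
    if mode_bits == 64 and len(data) == 2 and 0x40 <= data[0] <= 0x4F:
        rex_b = data[0] & 1  # REX.B extends the register number to 8..15
        data = data[1:]
    if len(data) != 1 or not 0x50 <= data[0] <= 0x57:
        raise ValueError("not a push-register encoding handled by this sketch")
    number = data[0] & 7
    if mode_bits == 32:
        return "push " + REG32[number]
    if rex_b:
        return "push r" + str(8 + number)
    return "push " + REG64[number]


if __name__ == "__main__":
    print(decode_push(b"\x50", 32))  # the byte read as 32-bit code
    print(decode_push(b"\x50", 64))  # the same byte read as 64-bit code
```

Note that there is no branch that returns push eax when mode_bits is 64, which is the "not encodable" entry in the reference.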
In general, you can browse the list of instructions on that site and find many which are listed as invalid or not encodable in 64-bit.
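As a sketch of what such a lookup might look like, the table below lists some one-byte opcodes whose 32-bit instruction is not available in 64-bit mode. The opcode values come from the standard one-byte opcode map, not from the site mentioned above; the table is deliberately incomplete, and removed_in_64bit is a name made up for this example.

```python
# One-byte opcodes whose 32-bit meaning is not available in 64-bit mode.
# Deliberately partial: only a selection of the removed instructions is listed.
REMOVED_IN_64BIT = {
    0x06: "push es", 0x07: "pop es",
    0x0E: "push cs",
    0x16: "push ss", 0x17: "pop ss",
    0x1E: "push ds", 0x1F: "pop ds",
    0x27: "daa", 0x2F: "das", 0x37: "aaa", 0x3F: "aas",
    0x60: "pusha", 0x61: "popa",
    0x62: "bound", 0x63: "arpl",
    0x9A: "call far (direct)",
    0xC4: "les", 0xC5: "lds",
    0xD4: "aam", 0xD5: "aad", 0xD6: "salc",
    0xEA: "jmp far (direct)",
}


def removed_in_64bit(opcode):
    """Return the 32-bit mnemonic lost in 64-bit mode, or None if not in the table."""
    return REMOVED_IN_64BIT.get(opcode)


if __name__ == "__main__":
    for opcode in (0x27, 0x60, 0x90):
        print(hex(opcode), removed_in_64bit(opcode))
```

A None result only means the byte is not in this partial table; it does not prove the instruction is valid in 64-bit mode.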
If not, please provide a small example of 32-bit assembly code that isn't valid 64-bit assembly code and explain how the 64-bit processor executes the 32-bit assembly code.
As above, push eax is one such example. I think what is missing is that 64-bit CPUs support directly running 32-bit binaries. They don't do it via compatibility between 32-bit and 64-bit instructions at the machine language level, but simply by having a 32-bit mode where the decoders (in particular) interpret the instruction stream as 32-bit x86 rather than x86-64, as well as the so-called long mode for running 64-bit instructions. When such 64-bit chips were first released, it was common to run a 32-bit operating system, which pretty much means the chip is permanently in this mode (never goes into 64-bit mode).
More recently, it is typical to run a 64-bit operating system, which is aware of the modes, and which will put the CPU into 32-bit mode when the user launches a 32-bit process (these are still very common: until very recently my browser was still 32-bit).
All the details and proper terminology for the modes can be found in fuz's answer, which is really the one you should read.