I am trying to wrap my mind around pointers in Assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax]
As has already been stated, wrapping brackets around an operand means that the operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) the memory location whose address the operand holds, rather than using the operand's own value directly.
So, this:
mov eax, ebx
simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.
Whereas this:
mov eax, [ebx]
dereferences the contents of ebx and stores the pointed-to value in eax. In a pseudo-C notation, this would be: eax = *ebx.
Finally, this:
mov [eax], ebx
stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
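To make the three pseudo-C equivalences above concrete, here is a minimal C sketch. The Machine struct, the function names, and the idea of treating an address as an index into a small array are all assumptions made for illustration; real x86 addresses count bytes, and nothing here is how an assembler actually works internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: memory is an array of 32-bit cells and an
   "address" is just an index into it (real x86 addresses count bytes). */
enum { MEM_CELLS = 8 };

typedef struct {
    uint32_t eax;
    uint32_t ebx;
    uint32_t mem[MEM_CELLS];
} Machine;

/* mov eax, ebx    =>  eax = ebx   (no memory access at all) */
void mov_eax_ebx(Machine *m) {
    m->eax = m->ebx;
}

/* mov eax, [ebx]  =>  eax = *ebx  (load from the cell ebx points to) */
void mov_eax_from_ebx_ptr(Machine *m) {
    assert(m->ebx < MEM_CELLS);   /* toy bounds check, not an x86 feature */
    m->eax = m->mem[m->ebx];
}

/* mov [eax], ebx  =>  *eax = ebx  (store into the cell eax points to) */
void mov_eax_ptr_from_ebx(Machine *m) {
    assert(m->eax < MEM_CELLS);   /* toy bounds check, not an x86 feature */
    m->mem[m->eax] = m->ebx;
}
```

With ebx set to 3 and cell 3 holding 42, the first function leaves 3 in eax, while the second leaves 42 there. The only difference between the two is the pair of brackets.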
The registers here could also be replaced with memory operands, such as symbolic variable names. So this:
mov eax, [myVar]
dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.
By contrast, this:
mov eax, myVar
stores the address of the variable myVar into eax, like eax = &myVar.
At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around a variable name. (Brackets around a register still mean a dereference.)
To get the address of a variable in MASM, you would use the OFFSET keyword:
mov eax, OFFSET myVar
However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
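The value-versus-address distinction for a named variable maps directly onto C. The sketch below is only an illustration: the function names are made up, and the variable myVar is assumed to be a 32-bit int, as it would be for a DWORD-sized variable.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for a DWORD-sized assembly variable named myVar. */
int32_t myVar = 1234;

/* mov eax, [myVar]       =>  eax = myVar   (the contents) */
int32_t load_value(void) {
    return myVar;
}

/* mov eax, myVar         (most assemblers)
   mov eax, OFFSET myVar  (MASM)
                          =>  eax = &myVar  (the address) */
int32_t *load_address(void) {
    return &myVar;
}

/* mov eax, [eax], applied to an address  =>  eax = *eax */
int32_t load_through(const int32_t *address) {
    assert(address != 0);   /* C-side sanity check only */
    return *address;
}
```

Taking the address and then loading through it, as in load_through(load_address()), yields the same result as load_value(). That is the sense in which the bracketed form "dereferences the address" of the variable.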
Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:
mov eax, DWORD PTR [myVar] ; eax = myVar
or
mov DWORD PTR [myVar], eax ; myVar = eax
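A size specifier matters more in other assemblers like NASM, which are not strongly typed and don't remember that myVar is a DWORD-sized memory location. NASM spells it DWORD, without PTR, and requires it whenever the other operand doesn't imply the size, such as when storing an immediate value into memory.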
You don't need this at all when one of the operands is a register, since the name of that register indicates the operand size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
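To see why the operand size matters, here is a rough C sketch that reads 1, 2, 4, or 8 bytes from the same address, mirroring BYTE, WORD, DWORD, and QWORD accesses. The function name load_sized is hypothetical, and the result is zero-extended to 64 bits purely for convenience (closer to what movzx does than to a plain mov into al or ax). The bytes are combined in little-endian order, which is the order x86 uses.

```c
#include <assert.h>
#include <stdint.h>

/* Read `size` bytes starting at addr and combine them in little-endian
   order, the way an x86 load of that width would interpret them.
   size: 1 = BYTE, 2 = WORD, 4 = DWORD, 8 = QWORD. */
uint64_t load_sized(const uint8_t *addr, unsigned size) {
    uint64_t value = 0;
    assert(size <= 8);   /* wider loads do not fit in the return type */
    for (unsigned i = 0; i < size; i++) {
        value |= (uint64_t)addr[i] << (8 * i);
    }
    return value;
}
```

Given the bytes 78 56 34 12 at some address, a BYTE-sized load sees 0x78, a WORD-sized load sees 0x5678, and a DWORD-sized load sees 0x12345678. The address is the same in every case, so something else (a register name or an explicit specifier) has to say how many bytes to touch.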
Also when I try to do
mov eax, [ebx]
I get a compile error, why is this?
Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:
mov eax, DWORD PTR [ebx]
and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.
why I cant do
mov a, [eax]
Should that not make "a" a pointer to wherever eax is pointing?
No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):
mov register, register ; copy one register to another
mov register, memory ; load value from memory into register
mov memory, register ; store value from register into memory
mov register, immediate ; move immediate value (constant) into register
mov memory, immediate ; store immediate value (constant) in memory
Notice that there is no mov memory, memory, which is what you were trying.
However, you can make a point to what eax is pointing to by simply coding:
mov DWORD PTR [a], eax
Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.
If you want to set a to the value that eax is pointing to, then you will need to do:
mov eax, DWORD PTR [eax] ; eax = *eax
mov DWORD PTR [a], eax ; a = eax
Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:
mov edx, DWORD PTR [eax] ; edx = *eax
mov DWORD PTR [a], edx ; a = edx
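The two cases above (copying the pointer versus copying what it points to) can be sketched in C as follows. The function and parameter names are made up for illustration, and the local variable edx simply plays the role of the scratch register. Note that a pointer only fits in a DWORD in 32-bit code.

```c
#include <assert.h>
#include <stdint.h>

/* mov DWORD PTR [a], eax
   a ends up holding the same pointer as eax. */
void store_pointer(const int32_t **a, const int32_t *eax) {
    assert(a != 0);          /* C-side sanity check only */
    *a = eax;
}

/* mov edx, DWORD PTR [eax]   ; edx = *eax
   mov DWORD PTR [a], edx     ; a   = edx
   Two steps, because mov has no memory-to-memory form. */
void store_pointee(int32_t *a, const int32_t *eax) {
    assert(a != 0 && eax != 0);
    int32_t edx = *eax;      /* load through the pointer into a scratch "register" */
    *a = edx;                /* store the loaded value into a */
}
```

In C you would normally write *a = *eax in one statement; the compiler then emits the load and the store separately for you, typically through a scratch register much like the one shown here.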
I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.