I've recently begun studying assembly. I'm just wondering why this assembly is written the way it is instead of the alternative "My Assembly" I list below, which cuts out one instruction. Any ideas? Is it too rare a case for this to be worth doing? It just seems wasteful to me to move the value of 3 into eax first.
C code:
#include <stdio.h>
int main()
{
    int a = 1;
    int b = 3;
    a = a + b;
    return a;
}
Assembly:
Dump of assembler code for function main:
0x080483dc <+0>: push ebp
0x080483dd <+1>: mov ebp,esp
0x080483df <+3>: sub esp,0x10
0x080483e2 <+6>: mov DWORD PTR [ebp-0x4],0x1
0x080483e9 <+13>: mov DWORD PTR [ebp-0x8],0x3
0x080483f0 <+20>: mov eax,DWORD PTR [ebp-0x8]
0x080483f3 <+23>: add DWORD PTR [ebp-0x4],eax
0x080483f6 <+26>: mov eax,DWORD PTR [ebp-0x4]
0x080483f9 <+29>: leave
0x080483fa <+30>: ret
"My Assembly":
Dump of assembler code for function main:
0x080483dc <+0>: push ebp
0x080483dd <+1>: mov ebp,esp
0x080483df <+3>: sub esp,0x10
0x080483e2 <+6>: mov DWORD PTR [ebp-0x4],0x1
0x080483e9 <+13>: mov DWORD PTR [ebp-0x8],0x3
0x080483f0 <+20>: add DWORD PTR [ebp-0x4],DWORD PTR [ebp-0x8]
0x080483f3 <+23>: mov eax,DWORD PTR [ebp-0x4]
0x080483f6 <+26>: leave
0x080483f9 <+29>: ret
As Michael Petch has already said in a comment, the real answer is that you're looking at unoptimized code. Compilers do all sorts of…well, inefficient things in unoptimized code.

Sometimes they do it for compilation speed. Optimizing takes longer than blindly translating the C code into assembly instructions, so when you want a fast compile, you turn the optimizer off and let the compiler act as a relatively simple-minded instruction translator.

Another reason that compilers do inefficient things in unoptimized code is to make debugging easier. For example, your IDE probably lets you set a breakpoint on each individual line of your C/C++ code. If the optimizer had turned multiple C/C++ lines into a single assembly instruction, it would be much more difficult, if not impossible, for you to set the breakpoints you wanted to set. This is why debugging optimized code is much more difficult, and often requires dropping down to the raw assembly and doing address-level debugging.
There are two dead giveaways here that tell you this is unoptimized code:
1. The use of the leave instruction, which is essentially a historical relic of the x86's CISC days. The philosophy used to be to have a bunch of instructions that did complicated things: the enter instruction was used at the beginning of a function to set up the stack frame, and the leave instruction brought up the rear, tearing down the stack frame. This made the programmer's job easier in block-structured languages, because you only needed to write a single instruction to accomplish multiple tasks. The problem is that, since at least the 386 and possibly the 286, the enter instruction has been substantially slower than doing the same thing with simpler, separate instructions. leave is also slower on the 386 and later, and is only useful when you're optimizing for size over speed (since it is smaller and not quite as slow as enter). A side-by-side sketch of the enter/leave form and the equivalent separate instructions follows this list.
2. The fact that a stack frame is being set up at all! With optimization enabled, a 32-bit x86 compiler won't bother to generate prologue code that sets up a stack frame. That is, it won't save the original value of the EBP register, and it won't set the EBP register to the location of the stack pointer (ESP) at function entry. Instead, it will perform the "frame pointer omission" optimization (the EBP register is called the "frame pointer"): rather than using EBP-relative offsets to access the stack, it will just use ESP-relative offsets (variant 3 in the sketch below). This wasn't possible in 16-bit x86 code, but it works fine in 32-bit code; it just takes more bookkeeping, since the stack pointer changes as values are pushed and popped, whereas the frame pointer could be held constant. Such bookkeeping isn't nearly the problem for a computer/compiler that it would be for a human, so this is an obvious optimization.
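To make both points concrete, here is a rough sketch of three equivalent ways to write this function's prologue and epilogue (the 0x10 frame size is simply taken from the listing above):
; Variant 1: the CISC-style form. ENTER 0x10,0 does the work of
; "push ebp / mov ebp, esp / sub esp, 0x10", and LEAVE does the work of
; "mov esp, ebp / pop ebp".
enter 0x10, 0
; ... body, with locals addressed as [ebp-0x4], [ebp-0x8] ...
leave
ret

; Variant 2: what the unoptimized listing above actually does: an explicit
; prologue built from simple instructions, but still a LEAVE epilogue.
push ebp
mov ebp, esp
sub esp, 0x10
; ... body, with locals addressed as [ebp-0x4], [ebp-0x8] ...
leave
ret

; Variant 3: frame pointer omission. EBP is never touched; locals are
; addressed relative to ESP, and the epilogue is a plain ADD.
sub esp, 0x10
; ... body, with locals addressed as [esp], [esp+0x4] ...
add esp, 0x10
ret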
Another problem with "your" assembly is that you've used an invalid instruction. There is no instruction* in the x86 architecture that accepts two memory operands. At most, one of the operands can be a memory location. The other operand must either be a register or an immediate.
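For reference, here is a quick sketch of the operand combinations that ADD does and does not accept; the same rule applies to the other classic two-operand arithmetic and logic instructions:
; Legal: at most one operand is a memory location.
add eax, ecx                   ; register, register
add eax, DWORD PTR [esp]       ; register, memory
add DWORD PTR [esp], eax       ; memory, register
add DWORD PTR [esp], 3         ; memory, immediate
; Illegal: two memory operands. This line will not assemble.
; add DWORD PTR [esp], DWORD PTR [esp+0x4]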
The first-blush "optimized" version of this code would be something like:
; Allocate 8 bytes of space on the stack for our local variables, 'a' and 'b'.
sub esp, 8
; Load the values of 'a' and 'b', storing them into the allocated locations.
; (Note the use of ESP-relative offsets, rather than EBP-relative offsets.)
mov DWORD PTR [esp], 1
mov DWORD PTR [esp + 4], 3
; Load the value of 'a' into a register (EAX), and add 'b' to it.
; (Necessary because we can't do an ADD with two memory operands.)
mov eax, DWORD PTR [esp]
add eax, DWORD PTR [esp + 4]
; The result is now in EAX, which is exactly where we want it to be.
; (All x86 calling conventions return integer-sized values in EAX.)
; Clean up the stack, and return.
add esp, 8
ret
We've "optimized" the stack initialization sequence and lost a lot of the fluff. Things look pretty good now. In fact, this is essentially the code that a compiler would generate if you were to declare the a and b variables volatile. However, they're not actually volatile in the original code, which means that we can keep them entirely in registers. This frees us from having to do any costly memory stores/loads, and means we don't have to allocate or restore stack space at all!
; Load the 'a' and 'b' values into the EAX and EDX registers, respectively.
mov eax, 1
mov edx, 3
; Add 'b' to 'a' in a single operation, since ADD works fine with
; two register operands.
add eax, edx
; Return, with result in EAX.
ret
Neat, right? This not only simplifies the code, but is actually a big performance win, since we're keeping everything in registers and never have to touch slow memory. Well, what else can we do? Remember that the ADD instruction allows us to use a register as the destination operand and an immediate as the source operand. That means we could skip a MOV and just do:
mov eax, 1
add eax, 3
ret
This is similar to what you would expect to see if you were, say, adding a constant 3 to a value already in memory:
add DWORD PTR [esp + 4], 3
But in this case, an optimizing compiler would never do it that way. It will actually outsmart you, realizing that you're doing an addition of compile-time constants, and go ahead and do the addition at compile time. Thus, the actual output of a compiler—and indeed, the most efficient way to write this code—would be simply:
mov eax, 4
ret
How anti-climactic. :-) The fastest code is always the code that doesn't have to execute.
*At least, not that I can think of at the moment. The x86 ISA is colossal, so it's almost inevitable that there's some dark corner of it that I can't think of where this statement is false. But it's true enough that you can rely on it as an axiom.
Source: https://stackoverflow.com/questions/43509578/more-efficient-assembly-code