Understanding the difference between ++i and i++ at the Assembly Level

Question

I know that variations of this question have been asked here multiple times, but I'm not asking what the difference between the two is. I would just like some help understanding the assembly behind both forms.

I think my question is more about the why than the what of the difference.

I'm reading Prata's C Primer Plus, and in the part dealing with the increment operator ++ and the difference between using i++ or ++i, the author says that if the operator is used by itself, as in ego++;, it doesn't matter which form we use.

If we look at the disassembly of the following code (compiled with Xcode, Apple LLVM version 9.0.0 (clang-900.0.39.2)):

int main(void)
{
    int a = 1, b = 1;

    a++;
    ++b;

    return 0;

}

we can see that indeed the form used doesn't matter, since the assembly code is the same for both (both variables would print out a 2 to the screen).

Initialization of a and b:

0x100000f8d <+13>: movl   $0x1, -0x8(%rbp)
0x100000f94 <+20>: movl   $0x1, -0xc(%rbp)

Assembly for a++:

0x100000f9b <+27>: movl   -0x8(%rbp), %ecx
0x100000f9e <+30>: addl   $0x1, %ecx
0x100000fa1 <+33>: movl   %ecx, -0x8(%rbp)

Assembly for ++b:

0x100000fa4 <+36>: movl   -0xc(%rbp), %ecx 
0x100000fa7 <+39>: addl   $0x1, %ecx 
0x100000faa <+42>: movl   %ecx, -0xc(%rbp)
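As a quick check of the book's claim, the sketch below wraps each statement in its own small function (the function names are made up for illustration). Both return the same value for every input because nothing reads the result of a++ or ++b; only the side effect on the variable survives, which is what the identical listings above reflect.

```c
/* The two statements from the first program, each in its own function.
   The value of the increment expression is discarded, so only the
   side effect (adding 1 to the variable) matters. */
int after_postfix_statement(int a)
{
    a++;          /* the value of a++ (the old a) is thrown away */
    return a;
}

int after_prefix_statement(int b)
{
    ++b;          /* the value of ++b (the new b) is thrown away */
    return b;
}
```

Calling either function with 1 returns 2, matching the "both variables would print out a 2" observation.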

Then the author states that when the operator and its operand are part of a larger expression, for example in an assignment statement, whether prefix or postfix is used does make a difference.

For example:

int main(void)
{
    int a = 1, b = 1;
    int c, d;

    c = a++;
    d = ++b;

    return 0;

}

This would print 1 and 2 for c and d, respectively.
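A minimal sketch of the second program that hands c and d back to the caller so the values can be inspected (the function name assign_from_increments is hypothetical). The comments spell out which value each expression yields.

```c
/* Mirrors the question's second program: the value of each increment
   expression is stored, so prefix and postfix give different results. */
void assign_from_increments(int *c_out, int *d_out)
{
    int a = 1, b = 1;

    int c = a++;   /* c gets the old value of a (1); a becomes 2 */
    int d = ++b;   /* b becomes 2; d gets the new value (2)      */

    *c_out = c;
    *d_out = d;
}
```

Usage: int c, d; assign_from_increments(&c, &d); printf("%d %d\n", c, d); prints 1 2 (with <stdio.h> included).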

And:

Initialization of a and b:

0x100000f46 <+22>: movl   $0x1, -0x8(%rbp)
0x100000f4d <+29>: movl   $0x1, -0xc(%rbp)

Assembly for c = a++; :

0x100000f54 <+36>: movl   -0x8(%rbp), %eax      // eax = a = 1
0x100000f57 <+39>: movl   %eax, %ecx            // ecx = 1
0x100000f59 <+41>: addl   $0x1, %ecx            // ecx = 2
0x100000f5c <+44>: movl   %ecx, -0x8(%rbp)      // a = 2
0x100000f5f <+47>: movl   %eax, -0x10(%rbp)     // c = eax = 1

Assembly for d = ++b; :

0x100000f62 <+50>: movl   -0xc(%rbp), %eax      // eax = b = 1
0x100000f65 <+53>: addl   $0x1, %eax            // eax = 2
0x100000f68 <+56>: movl   %eax, -0xc(%rbp)      // b = eax = 2
0x100000f6b <+59>: movl   %eax, -0x14(%rbp)     // d = eax = 2

Clearly the assembly code is different for the assignments:

The form c = a++; includes the use of the registers eax and ecx. It uses ecx for performing the increment of a by 1, but uses eax for the assignment.
The form d = ++b; uses eax for both the increment of b by 1 and the assignment.
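One way to see why the postfix form touches two registers is to transliterate the two -O0 listings back into C, one statement per instruction. In this sketch the local variables eax and ecx are ordinary C variables standing in for the registers, and the function names are hypothetical. For c = a++ the old value (eax) must stay alive after the new value (ecx) has been computed, whereas for d = ++b only the new value is ever needed.

```c
/* Instruction-by-instruction C transliteration of the -O0 listings.
   'eax' and 'ecx' are plain variables standing in for registers. */
int postfix_as_asm(int *a)        /* c = a++; */
{
    int eax = *a;        /* movl -0x8(%rbp), %eax : old value of a */
    int ecx = eax;       /* movl %eax, %ecx       : copy to modify */
    ecx = ecx + 1;       /* addl $0x1, %ecx       : new value      */
    *a = ecx;            /* movl %ecx, -0x8(%rbp) : store new a    */
    return eax;          /* movl %eax, -0x10(%rbp): c = old value  */
}

int prefix_as_asm(int *b)         /* d = ++b; */
{
    int eax = *b;        /* movl -0xc(%rbp), %eax : old value of b */
    eax = eax + 1;       /* addl $0x1, %eax       : new value      */
    *b = eax;            /* movl %eax, -0xc(%rbp) : store new b    */
    return eax;          /* movl %eax, -0x14(%rbp): d = new value  */
}
```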

My question is:

Why is that?
What determines that c = a++; requires two registers instead of just one (ecx for example)?

Answer 1:

In the following statements:

a++;
++b;

neither the value of a++ nor the value of ++b is used. Here the compiler is only interested in the side effect of these operators (i.e., incrementing the operand by one). In this context both operators behave the same way, so it's no wonder that these statements result in the same assembly code.

However, in the following statements:

c = a++;
d = ++b;

the values of the expressions a++ and ++b matter to the compiler, because they have to be stored in c and d, respectively:

d = ++b;: b is incremented and the result of this increment is assigned to d.
c = a++;: the original value of a is assigned to c, and a is incremented.

Therefore, these operators behave differently in this context, so it makes sense that they result in different assembly code, at least when more aggressive optimizations are not enabled.

Answer 2:

A good compiler would replace this whole code with c = 1; d = 2;. And if those variables aren't used in turn, the whole program is one big NOP; there should be no machine code generated for it at all.

But you do get machine code, so you are not enabling the optimizer correctly. Discussing the efficiency of non-optimized C code is quite pointless.

Discussing a particular compiler's failure to optimize the code might be meaningful, if a specific compiler is mentioned. Which isn't the case here.

All this code shows is that your compiler isn't doing a good job, possibly because you didn't enable optimizations, and that's it. No other conclusions can be made. In particular, no meaningful discussion about the behavior of i++ versus ++i is possible.

Answer 3:

Your test has flaws: the compiler optimized your code by replacing your values with what could easily be predicted.

The compiler can, and will, calculate the result in advance during compilation and avoid the use of 'jmp' instructions (jumping back to the start of the while loop each time the condition is still true).

If you try this code:

int a = 0;
int i = 0;

while (i++ < 10)
{
    a += i;
}

The assembly will not use a single jmp instruction.

It will directly assign the value ½ n (n + 1), here 0.5 * 10 * 11 = 55, to the register holding the value of the variable 'a'.

You would have the following assembly output:

mov eax, 55 ; a register
mov ecx, 11 ; i register, only if i is still used afterwards
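Tracing the loop by hand is a useful check on which constants an optimizer can fold. The sketch below (function names made up for illustration) runs the loop and compares it with the closed form: the body sees i = 1 through 10, so a ends at 55, and because the failing test 10 < 10 still performs the increment, i ends at 11.

```c
/* The loop from the answer, wrapped in a function. Returns a and
   stores the final value of i through i_out. */
int loop_sum(int *i_out)
{
    int a = 0;
    int i = 0;

    while (i++ < 10)   /* compares the old i, then increments it */
    {
        a += i;        /* the body sees i = 1, 2, ..., 10 */
    }

    *i_out = i;        /* the final test (10 < 10) still incremented i */
    return a;
}

/* Closed form n(n + 1) / 2 for the sum 1 + 2 + ... + n. */
int closed_form_sum(int n)
{
    return n * (n + 1) / 2;
}
```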

Whether you write:

int i = 0;
while (i++ < 10)
{
    ...
}

or:

int i = 0;
while (++i < 11)
{
    ...
}

you will get the same assembly output.
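Whether two loop headers are interchangeable depends on which values of i the body sees, not on which increment form is used. This sketch (helper names are hypothetical) counts iterations: while (i++ < 10) and while (++i < 11) both run the body 10 times when i starts at 0, with i going from 1 to 10. Starting the prefix form at -1 instead runs the body 11 times, with i going from 0 to 10.

```c
/* int i = 0; while (i++ < 10): body sees i = 1..10 */
int count_postfix_from_0(void)
{
    int i = 0, n = 0;
    while (i++ < 10)
        n++;
    return n;
}

/* int i = 0; while (++i < 11): body sees i = 1..10 */
int count_prefix_from_0(void)
{
    int i = 0, n = 0;
    while (++i < 11)
        n++;
    return n;
}

/* int i = -1; while (++i < 11): body sees i = 0..10 */
int count_prefix_from_minus1(void)
{
    int i = -1, n = 0;
    while (++i < 11)
        n++;
    return n;
}
```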

With much more complex code you would be able to see differences in the assembly. For example,

 a = ++i;

would translate into:

inc rcx          ; increase i by 1; RCX now holds the current value of both a and i.

and a = i++; into:

lea rax, [rcx+1] ; RAX now holds the new i; RCX still holds the old value, which is a.

Answer 4:

Both the expressions ++i and i++ have the effect of incrementing i. The difference is that ++i produces a result (a value stored somewhere, for example in a machine register, that can be used within other expressions) equal to the new value of i, whereas i++ produces a result equal to the original value of i.

So, assuming we start with i having a value of 2, the statement

 b = ++i;

has the effect of setting both b and i equal to 3, whereas

 b = i++;

has the effect of setting b equal to 2 and i equal to 3.

In the first case, there is no need to keep track of the original value of i after incrementing i whereas in the second there is. One way of doing this is for the compiler to employ an additional register for i++ compared with ++i.

This is not needed for a trivial expression like

 i++;

since the compiler can immediately detect that the original value of i will not be used (i.e. is discarded).

For simple expressions like b = i++ the compiler could, in principle at least, avoid using an additional register by simply storing the original value of i in b before incrementing i. However, in slightly more complex expressions such as

c = i++ - *p++;       //  p is a pointer

it can be much more difficult for the compiler to eliminate the need to store old and new values of i and p (unless, of course, the compiler looks ahead and determines how (or if) c, i, and p (and *p) are being used in subsequent code). In more complex expressions (involving multiple variables and interacting operations) the analysis needed can be significant.
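For the compound example, a sketch of one legal ordering of the steps shows why both old values have to be held somewhere until the subtraction is done. The function name expand_compound and the assumption that p points to int are made up for illustration; the C standard leaves the relative order of the two side effects unspecified, so this is only one valid expansion.

```c
/* One possible expansion of  c = i++ - *p++;  into single steps.
   The old values of i and p must both survive until the subtraction. */
int expand_compound(int *i, const int **p)
{
    int old_i = *i;            /* value of the i++ subexpression      */
    const int *old_p = *p;     /* value of the p++ subexpression      */

    *i = old_i + 1;            /* side effect of i++                  */
    *p = old_p + 1;            /* side effect of p++ (advance one int) */

    return old_i - *old_p;     /* old i minus the element old p pointed to */
}
```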

It then comes down to implementation choices by developers/designers of the compiler. Practically, compiler vendors compete pretty heavily on compilation time (getting compilation times as small as possible) and, in doing so, may choose not to do all possible code transformations that remove unneeded uses of temporaries (or machine registers).

Answer 5:

You compiled with optimization disabled! For gcc and LLVM, that means each C statement is compiled independently, so you can modify variables in memory with a debugger, and even jump to a different source line. To support this, the compiler can't optimize between C statements at all, and in fact spills / reloads everything between statements.

So the major flaw in your analysis is that you're looking at an asm implementation of that statement where the inputs and outputs are memory, not registers. This is totally unrealistic: compilers keep most "hot" values in registers inside inner loops, and don't need separate copies of a value just because it's assigned to multiple C variables.

Compilers generally (and LLVM in particular, I think) transform the input program into an SSA (Static Single Assignment) internal representation. This is how they track data flow, not according to C variables. (This is why I said "hot values", not "hot variables". A loop induction variable might be totally optimized away into a pointer-increment / compare against end_pointer in a loop over arr[i++]).
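The induction-variable remark can be illustrated at the source level. The second function below is a hand-written sketch of the kind of pointer-increment loop described above; the function names are hypothetical, and whether a particular compiler emits exactly this shape for the first function is an assumption, not something shown here.

```c
#include <stddef.h>

/* Index form, as written in source: the counter i is explicit. */
long sum_indexed(const int *arr, size_t n)
{
    long total = 0;
    size_t i = 0;
    while (i < n)
        total += arr[i++];
    return total;
}

/* The form an optimizer may effectively produce: the counter is gone,
   replaced by a pointer that is compared against an end pointer. */
long sum_pointer(const int *arr, size_t n)
{
    long total = 0;
    for (const int *end = arr + n; arr != end; ++arr)
        total += *arr;
    return total;
}
```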

c = ++i; produces one value with 2 references to it (one for c, one for i). The result can stay in a single register. If it doesn't optimize into part of some other operation, the asm implementation could be as simple as inc %ecx, with the compiler just using ecx/rcx everywhere that c or i is read before the next modification of either. If the next modification of c can't be done non-destructively (e.g. with a copy-and-modify like lea (,%rcx,4), %edx or shrx %eax, %ecx, %edx), then a mov instruction to copy the register will be emitted.

d = b++; produces one new value, and makes d a reference to the old value of b. It's syntactic sugar for d=b; b+=1;, and compiles into SSA the same as that would. x86 has a copy-and-add instruction, called lea. The compiler doesn't care which register holds which value (except in loops, especially without unrolling, when the end of the loop has to have values in the right registers to jump to the beginning of the loop). But other than that, the compiler can do lea 1(%rbx), %edx to leave %ebx unmodified and make EDX hold the incremented value.
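A small sketch of d = b++ as two values that exist side by side, the way lea 1(%rbx), %edx leaves the old value untouched in one register and puts the incremented value in another. The struct and function names are hypothetical, and the register names in the comments only mirror the example in the paragraph above.

```c
/* Result of  d = b++;  modelled as two coexisting values. */
struct post_inc_values {
    int old_value;   /* what d receives          */
    int new_value;   /* what b holds afterwards  */
};

struct post_inc_values post_increment_values(int b)
{
    struct post_inc_values v;
    v.old_value = b;       /* d = b;   (%ebx left unmodified)  */
    v.new_value = b + 1;   /* b += 1;  like lea 1(%rbx), %edx  */
    return v;
}
```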

An additional minor flaw in your test is that with optimization disabled, the compiler is trying to compile quickly, not well, so it doesn't look for all possible peephole optimizations even within the statement that it does allow itself to optimize.

If the value of c or d is never read, then it's the same as if you had never done the assignment in the first place. (In un-optimized code, every value is implicitly read by the memory barrier between statements.)

What determines that c = a++; requires two registers instead of just one (ecx for example)?

The surrounding code, as always. +1 can be optimized into other operations, e.g. done with an LEA as part of a shift and/or add, or built into an addressing mode.
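Two hypothetical functions where the +1 need not get an instruction of its own once optimized: an array access, where it can become part of the addressing mode's displacement, and a multiply-by-four, where it can be merged into a single LEA together with the scaling. The instruction selection in the comments is what x86-64 compilers commonly do at -O2, but it is an expectation, not a guarantee.

```c
#include <stddef.h>

/* Reads the element after index i. The "+ 1" can typically be folded
   into the load's displacement (4 extra bytes for an int). */
int next_element(const int *arr, size_t i)
{
    return arr[i + 1];
}

/* 4 * (x + 1) == 4x + 4, which fits a single LEA such as
   lea eax, [4*rdi + 4] with no separate increment. */
int scaled_next(int x)
{
    return 4 * (x + 1);
}
```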

Or before/after negation, use the 2's complement identity that -x == ~x+1, and use NOT instead of NEG. (Although often you're adding the negated value to something, so it turns into a SUB instead of NEG + ADD, so there isn't a stand-alone NEG you can turn into a NOT.)
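The identity -x == ~x + 1 can be checked directly. This sketch uses unsigned arithmetic so that wrap-around is well defined in C (the function names are made up for illustration): incrementing before a negation gives -(x + 1) == ~x, so NEG plus an increment can become a single NOT, and incrementing after a NOT gives ~x + 1 == -x.

```c
/* -(x + 1): increment before negation, which equals ~x */
unsigned negate_incremented(unsigned x)
{
    return -(x + 1u);
}

/* ~x + 1: increment after a bitwise NOT, which equals -x */
unsigned increment_inverted(unsigned x)
{
    return ~x + 1u;
}
```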

++ prefix or postfix is too simple to look at on its own; you always have to consider where the input comes from (does the incremented value have to end up back in memory right away or eventually?) and how the incremented and original values are used.

Basically, un-optimized code is uninteresting. Look at optimized code for short functions. See Matt Godbolt's talk at CppCon2017, “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, and also the Stack Overflow question “How to remove "noise" from GCC/clang assembly output?” for more about looking at compiler asm output.

Source: https://stackoverflow.com/questions/48497636/understanding-the-difference-between-i-and-i-at-the-assembly-level

Tags

assembly

clang

micro-optimization