Question

Consider this code:

int i = 1;
int x = ++i + ++i;

We have some guesses for what a compiler might do for this code, assuming it compiles.

- Both ++i return 2, resulting in x = 4.
- One ++i returns 2 and the other returns 3, resulting in x = 5.
- Both ++i return 3, resulting in x = 6.
To me, the second seems most likely. One of the two ++ operators is executed with i = 1, i is incremented, and the result 2 is returned. Then the second ++ operator is executed with i = 2, i is incremented, and the result 3 is returned. Then 2 and 3 are added together to give 5.
However, I ran this code in Visual Studio, and the result was 6. I'm trying to understand compilers better, and I'm wondering what could possibly lead to a result of 6. My only guess is that the code could be executed with some "built-in" concurrency: the two ++ operators were called, each incremented i before the other returned, and then they both returned 3. This would contradict my understanding of the call stack, and would need to be explained away.
What (reasonable) things could a C++ compiler do that would lead to a result of 4 or a result of 6?
Note
This example appeared as an example of undefined behavior in Bjarne Stroustrup's Programming: Principles and Practice using C++ (C++ 14).
See cinnamon's comment.
Answer 1:
The compiler takes your code, splits it into very simple instructions, and then recombines and arranges them in a way that it thinks optimal.
The code
int i = 1;
int x = ++i + ++i;
consists of the following instructions:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
But despite this being a numbered list the way I wrote it, there are only a few ordering dependencies here: 1->2->3->4->5->10->11 and 1->6->7->8->9->10->11 must stay in their relative order. Other than that the compiler can freely reorder, and perhaps eliminate redundancy.
For example, you could order the list like this:
1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
4. store tmp1 in i
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
Why can the compiler do this? Because there's no sequencing to the side effects of the increment. But now the compiler can simplify: for example, there's a dead store in 4: the value is immediately overwritten. Also, tmp2 and tmp4 are really the same thing.
1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
And now everything to do with tmp1 is dead code: it's never used. And the re-read of i can be eliminated too:
1. store 1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
10. add tmp3 and tmp3, as tmp5
11. store tmp5 in x
Look, this code is much shorter. The optimizer is happy. The programmer is not, because i was only incremented once. Oops.
Let's look at something else the compiler can do instead: let's go back to the original version.
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
The compiler could reorder it like this:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x
and then notice again that i is read twice, so eliminate one of them:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
That's nice, but it can go further: it can reuse tmp1:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Then it can eliminate the re-read of i in 6:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Now 4 is a dead store:
1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
and now 3 and 7 can be merged into one instruction:
1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x
Eliminate the last temporary:
1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
10. add tmp1 and tmp1, as tmp5
11. store tmp5 in x
And now you get the result that Visual C++ is giving you.
Note that in both optimization paths, the important ordering dependencies were preserved, except where instructions were eliminated entirely because they had no observable effect.
Answer 2:
While this is UB (as the OP implied), here are hypothetical ways a compiler could arrive at each of the three results. All three would give the same, correct x if used with two different variables int i = 1, j = 1; instead of one and the same i.
- both ++i return 2, resulting in x=4.
int i = 1;
int i1 = i, i2 = i; // i1 = i2 = 1
++i1; // i1 = 2
++i2; // i2 = 2
int x = i1 + i2; // x = 4
- one ++i returns 2 and the other returns 3, resulting in x=5.
int i = 1;
int i1 = ++i; // i1 = 2
int i2 = ++i; // i2 = 3
int x = i1 + i2; // x = 5
- both ++i return 3, resulting in x=6.
int i = 1;
int &i1 = i, &i2 = i;
++i1; // i = 2
++i2; // i = 3
int x = i1 + i2; // x = 6
Answer 3:
To me, the second seems most likely.
I am going for option #4: both ++i happen concurrently.
Newer processors are moving toward some interesting optimizations, and parallel code evaluation, where allowed (as it is here), is another way compilers keep making code faster. As a practical implementation, I see compilers moving toward parallelism.

I could readily see a race condition causing non-deterministic behavior, or a bus fault due to contention for the same memory - all allowed because the coder violated the C++ contract - hence UB.
My question is: what (reasonable) things could a C++ compiler do that would lead to a result of 4 or a result of 6?

It could, but don't count on it.

Don't use ++i + ++i, nor expect sensible results.
Answer 4:
I think that a simple and straightforward interpretation (without any nod to compiler optimizations or multithreading) would be just:

- Increment i
- Increment i
- Add i + i

With i incremented twice, its value is 3, and when added to itself, the sum is 6.
For inspection, consider this as a C++ function:
int dblInc ()
{
int i = 1;
int x = ++i + ++i;
return x;
}
Now, here's the assembly code I get from compiling that function, using an old version of the GNU C++ compiler (win32, gcc version 3.4.2 (mingw-special)). There are no fancy optimizations or multithreading happening here:
__Z6dblIncv:
push ebp
mov ebp, esp
sub esp, 8
mov DWORD PTR [ebp-4], 1
lea eax, [ebp-4]
inc DWORD PTR [eax]
lea eax, [ebp-4]
inc DWORD PTR [eax]
mov eax, DWORD PTR [ebp-4]
add eax, DWORD PTR [ebp-4]
mov DWORD PTR [ebp-8], eax
mov eax, DWORD PTR [ebp-8]
leave
ret
Note that local variable i is sitting on the stack in just one single place: address [ebp-4]. That location is incremented twice (in the 5th-8th lines of the assembly function, including apparently redundant loads of that address into eax). Then on the 9th-10th lines, that value is loaded into eax and added to eax again (that is, this computes the current i + i). Then it's redundantly copied to the stack and back to eax as the return value (which will obviously be 6).
It may be of interest to look at the C++ standard (here, an old one: ISO/IEC 14882:1998(E)) which says for Expressions, section 5.4:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
With the footnote:
The precedence of operators is not directly specified, but it can be derived from the syntax.
Two examples of unspecified behavior are given at that point, both involving the increment operator (one of which is i = ++i + 1).
Now, if one wished, one could: make an integer wrapper class (like a Java Integer); overload operator+ and operator++ such that they return the intermediate value objects; and thereby write ++iObj + ++iObj and get it to return an object holding 5. (I haven't included full code here for the sake of brevity.)
Personally, I'd be intrigued if there were an example of a well-known compiler that did the job any way other than the sequence seen above. It seems to me that the most straightforward implementation would be to just do two assembly-code incs on the primitive type before the addition operation is performed.
Answer 5:
The reasonable thing a compiler can do is Common Subexpression Elimination. This is already a common optimisation in compilers: if a subexpression like (x+1) occurs more than once in a larger expression, it only needs to be calculated once. E.g. in a/(x+1) + b*(x+1), the x+1 sub-expression can be calculated once.
Of course, the compiler has to know which sub-expressions can be optimised that way. Calling rand() twice should give two random numbers, so non-inlined function calls must be exempt from CSE. As you note, there is no rule that says how two occurrences of ++i should be handled, so there's no reason to exempt them from CSE.
The result may indeed be that int x = ++i + ++i; is optimised to int __cse = ++i; int x = __cse << 1; (CSE, followed by strength reduction).
Answer 6:
In practice, you are invoking undefined behaviour. Anything can happen, not just things that you consider "reasonable", and often things do happen that you don't consider reasonable. Everything is by definition "reasonable".
A very reasonable compilation is that the compiler observes that executing the statement would invoke undefined behaviour, therefore the statement can never be executed, therefore it is translated to an instruction that intentionally crashes your application. That is very reasonable.
Downvoter: gcc strongly disagrees with you.
Answer 7:
There is no reasonable thing a compiler could do to get a result of 6, but it's possible and legitimate. A result of 4 is entirely reasonable, and I'd consider a result of 5 borderline reasonable. All of them are perfectly legal.
Hey, wait! Isn't it clear what must happen? The addition needs the results of the two increments, so obviously these must happen first. And we go left to right, so... argh! If only it was so simple. Unluckily, that's not the case. We do not go left to right, and that's the problem.
Reading the memory location into two registers (or initializing both registers from the same literal, optimizing out the round trip to memory) is a very reasonable thing for the compiler to do. In effect there are covertly two different variables, each with a value of 2, which are finally added to give a result of 4. This is "reasonable" because it's fast and efficient, and it is in accordance with both the standard and the code.
Similarly, the memory location could be read once (or the variable initialized from the literal) and incremented once, and a shadow copy in another register could be incremented after that, which would result in 2 and 3 being added together. This is, I would say, borderline reasonable, although perfectly legal. I deem it borderline reasonable because it isn't one or the other. It's neither the "reasonable" optimized way, nor is it the "reasonable" exactly-pedantic way. It's somewhat in the middle.
Incrementing the memory location twice (resulting in a value of 3) and then adding that value to itself for a final result of 6 is legitimate, but not quite reasonable as doing memory round trips isn't precisely efficient. Although on a processor with good store forwarding, it might as well be "reasonable" to do it, since the store should be mostly invisible...
As the compiler "knows" that it's the same location, it might as well choose to increment the value twice within a register, and then add it to itself, too. Either approach would give you the result of 6.
The compiler is, by the wording of the standard, allowed to give you any such result, although I would personally consider 6 pretty much a "fuck you" memo from the Obnoxious Department, as it is a rather unexpected thing (legal or not, trying to always give the least amount of surprises is a good thing to do!). Though, seeing how Undefined Behavior is involved, sadly one cannot really argue about "unexpected", eh.
So, actually, what is the code that you have there, to the compiler? Let's ask clang, which will show us if we ask nicely (invoking with -ast-dump -fsyntax-only):
ast.cpp:4:9: warning: multiple unsequenced modifications to 'i' [-Wunsequenced]
int x = ++i + ++i;
^ ~~
(some lines omitted)
`-CompoundStmt 0x2b3e628 <line:2:1, line:5:1>
|-DeclStmt 0x2b3e4b8 <line:3:1, col:10>
| `-VarDecl 0x2b3e430 <col:1, col:9> col:5 used i 'int' cinit
| `-IntegerLiteral 0x2b3e498 <col:9> 'int' 1
`-DeclStmt 0x2b3e610 <line:4:1, col:18>
`-VarDecl 0x2b3e4e8 <col:1, col:17> col:5 x 'int' cinit
`-BinaryOperator 0x2b3e5f0 <col:9, col:17> 'int' '+'
|-ImplicitCastExpr 0x2b3e5c0 <col:9, col:11> 'int' <LValueToRValue>
| `-UnaryOperator 0x2b3e570 <col:9, col:11> 'int' lvalue prefix '++'
| `-DeclRefExpr 0x2b3e550 <col:11> 'int' lvalue Var 0x2b3e430 'i' 'int'
`-ImplicitCastExpr 0x2b3e5d8 <col:15, col:17> 'int' <LValueToRValue>
`-UnaryOperator 0x2b3e5a8 <col:15, col:17> 'int' lvalue prefix '++'
`-DeclRefExpr 0x2b3e588 <col:17> 'int' lvalue Var 0x2b3e430 'i' 'int'
As you can see, the same lvalue Var 0x2b3e430 has prefix ++ applied at two locations, and these two are below the same node in the tree, which happens to be a very non-special operator (+) that has nothing special said about sequencing or such. Why is this important? Well, read on.
Note the warning: "multiple unsequenced modifications to 'i'". Uh oh, that doesn't sound good. What does it mean? [basic.exec] tells us about side effects and sequencing, and it tells us (paragraph 10) that by default, unless explicitly said otherwise, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. Well, darn, that's the case with operator+ -- nothing is said otherwise, so...
But do we care about sequenced-before, indeterminately-sequenced, or unsequenced? Who wants to know, anyway?
That same paragraph also tells us that unsequenced evaluations may overlap, and that if a side effect on a memory location is unsequenced relative to another side effect on the same memory location (that's our case!), the behavior is undefined. This is where it really gets ugly, because it means you know nothing and have no guarantee of anything "reasonable" whatsoever. The unreasonable thing is actually perfectly allowable and "reasonable".
Answer 8:
There is a rule:
Between the previous and next sequence point a scalar object must have its stored value modified at most once by the evaluation of an expression, otherwise the behavior is undefined.
Thus even x = 100 is a possible valid result.
To me the most logical result in the example is 6, because we increase the value of i twice and then add it to itself. It is difficult to do the addition before computing the values on both sides of "+".
But compiler developers can implement any other logic.
Answer 9:
It looks like ++i returns an lvalue but i++ returns an rvalue.
So this code is ok:
int i = 1;
++i = 10;
cout << i << endl;
This one is not:
int i = 1;
i++ = 10;
cout << i << endl;
The above two statements behave this way consistently in Visual C++, GCC 7.1.1, Clang and Embarcadero.

That is why your code in Visual C++ and GCC 7.1.1 is similar to the following:
int i = 1;
... do something there for instance: ++i; ++i; ...
int x = i + i;
Looking at the disassembly: it first increments i and writes it back. When doing the addition it does the same thing again, incrementing i and writing it back, and then adds i to i.

I've noticed that Clang and Embarcadero act differently: after the first ++i they store the result in a temporary rvalue and then add it to the second ++i.
Answer 10:
#include <stdio.h>
void a1(void)
{
int i = 1;
int x = ++i;
printf("i=%d\n",i);
printf("x=%d\n",x);
x = x + ++i; // Here
printf("i=%d\n",i);
printf("x=%d\n",x);
}
void b2(void)
{
int i = 1;
int x = ++i;
printf("i=%d\n",i);
printf("x=%d\n",x);
x = i + ++i; // Here
printf("i=%d\n",i);
printf("x=%d\n",x);
}
int main(void)
{
a1();
// b2();
return 0;
}
Answer 11:
I personally would never have expected a compiler to output 6 in your example. There are already good and detailed answers to your question. I will try a short version.
Basically, ++i is a 2-step process in this context:

- Increment the value of i
- Read the value of i

In the context of ++i + ++i, the two sides of the addition may be evaluated in any order according to the standard. This means the two increments are considered independent. Also, there is no dependency between the two terms. The increment and read of i may therefore be interleaved. This gives the potential order:

- Increment i for the left operand
- Increment i for the right operand
- Read back i for the left operand
- Read back i for the right operand
- Sum the two: yields 6
Now that I think about it, 6 makes the most sense according to the standard. For a result of 4 we need a CPU which first reads i twice independently, then increments and writes the value back into the same location; basically a race condition. For a value of 5 we need a compiler which introduces temporaries.

But the standard says that ++i increments the variable before returning it, i.e. before the rest of the expression is evaluated. The sum operator + then needs to compute i + i after both increments have been applied. On that reading, C++ works on the variables themselves rather than on value semantics. Hence, to me, 6 now makes the most sense, as it follows from the semantics of the language and not from the execution model of CPUs.
Answer 12:

Well, it depends on the design of the compiler, so the answer will depend on the way the compiler decodes the statements. Using two different variables, ++x and ++y, to build this logic would be a better choice.

Note: the output may depend on the language version used in MS Visual Studio; if the rules have changed, so will the output.
Source: https://stackoverflow.com/questions/62185373/in-practice-why-would-different-compilers-compute-different-values-of-int-x