Consider the classical sequence point example:
i = i++;
The C and C++ standards state that the behavior of the above expression is undefined be
All operators produce a result. In addition, some operators, such as the assignment operator =, the compound assignment operators (+=, >>=, etc.) and the increment operator ++, produce side effects. The distinction between results and side effects is at the heart of this question.
Operator precedence governs the order in which operators are applied to produce their results. For instance, precedence rules require that * goes before +, + goes before &, and so on.
However, operator precedence says nothing about applying side effects. This is where sequence points (sequenced before, sequenced after, etc.) come into play. They say that in order for an expression to be well-defined, the application of side effects to the same location in memory must be separated by a sequence point.
This rule is broken by i = i++, because both ++ and = apply their side effects to the same variable i. First, ++ goes, because it has higher precedence. It computes its value by taking i's original value prior to the increment. Then = goes, because it has lower precedence. Its result is also the original value of i.
The crucial thing that is missing here is a sequence point separating the side effects of the two operators. This is what makes the behavior undefined.
Operator precedence and order of evaluation are two different things. Let's have a look at them one by one:
Operator precedence rule: In an expression, operands bind more tightly to the operators that have higher precedence.
For example:
int a = 5;
int b = 10;
int c = 2;
int d;
d = a + b * c;
In the expression a + b * c, the precedence of * is higher than that of +, and therefore b and c will bind to * and the expression will be parsed as a + (b * c).
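To make the binding concrete, here is a minimal sketch in C. The function names are hypothetical and exist only for this illustration; the point is that the unparenthesized form and the explicitly grouped form compute the same value, while overriding the grouping changes it.

```c
/* Written without parentheses: * binds tighter than +. */
int implicit_grouping(int a, int b, int c) {
    return a + b * c;
}

/* The same expression with the grouping spelled out. */
int explicit_grouping(int a, int b, int c) {
    return a + (b * c);
}

/* Parentheses that override precedence give a different expression. */
int overridden_grouping(int a, int b, int c) {
    return (a + b) * c;
}
```

With a = 5, b = 10 and c = 2 as above, the first two functions return 25 and the third returns 30.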
Order of evaluation rule: It describes how operands will be evaluated in an expression. In the statement
d = a>5 ? ++b : c;
the condition a>5 is guaranteed to be evaluated before ++b or c, and only one of those two is then evaluated.
But for the expression a + (b * c), though * has higher precedence than +, it is not guaranteed that a will be evaluated either before or after b or c, and b and c are not ordered with respect to each other either. The operands a, b and c can be evaluated in any order.
The simple rule is that operator precedence is independent of order of evaluation, and vice versa.
In the expression i = i++, the higher precedence of ++ just tells the compiler to bind i to the ++ operator, and that's it. It says nothing about the order of evaluation of the operands or about which side effect (the one by the = operator or the one by ++) should take place first. The compiler is free to do anything.
Let's rename the i on the left of the assignment to il and the one on the right (in the expression i++) to ir. The expression then becomes
il = ir++ // Note that the suffixes l and r are used for the sake of clarity.
// Both il and ir represent the same object.
Now the compiler is free to evaluate the expression il = ir++ either as
temp = ir; // i = 0
ir = ir + 1; // i = 1 side effect by ++ before assignment
il = temp; // i = 0 result is 0
or
temp = ir; // i = 0
il = temp; // i = 0 side effect by assignment before ++
ir = ir + 1; // i = 1 result is 1
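resulting in two different results, 0 and 1, depending on the order in which the side effects of the assignment and of ++ are applied. Nothing sequences these two side effects relative to each other, and hence the expression invokes UB.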
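The two orderings can be written out as ordinary, well-defined C to check the arithmetic. This is only a sketch: the function names are made up for the illustration, each function fixes one ordering by hand, and a real compiler facing i = i++ is not limited to these two outcomes, since the behavior is undefined.

```c
/* Ordering 1: the side effect of ++ is applied before the side effect of =. */
int increment_then_assign(int i) {
    int temp = i;   /* value of i++ is the old value of i */
    i = i + 1;      /* side effect of ++ */
    i = temp;       /* side effect of = overwrites the increment */
    return i;
}

/* Ordering 2: the side effect of = is applied before the side effect of ++. */
int assign_then_increment(int i) {
    int temp = i;   /* value of i++ is the old value of i */
    i = temp;       /* side effect of = */
    i = i + 1;      /* side effect of ++ lands last */
    return i;
}
```

Starting from 0, increment_then_assign returns 0 and assign_then_increment returns 1, matching the two traces above.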
Operator precedence (and associativity) determine how an expression is parsed, that is, which operands belong to which operator. However, this says nothing about the order of evaluation of the operands, which is a different matter. Example:
a() + b() * c()
Operator precedence dictates that the result of b() and the result of c() must be multiplied before being added to the result of a().
However, it says nothing about the order in which these functions should be executed. The order of evaluation of each operator specifies this. Most often, the order of evaluation is unspecified (unspecified behavior), meaning that the standard lets the compiler do it in any order it likes. The compiler need not document this order nor does it need to behave consistently. The reason for this is to give compilers more freedom in expression parsing, meaning faster compilation and possibly also faster code.
For the above example, I wrote a simple test program and my compiler executed the functions in the order a(), b(), c(). The fact that the program needs to execute both b() and c() before it can multiply the results doesn't mean that it must evaluate those operands in any given order.
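A test along these lines could look like the sketch below. It is not the program mentioned above, just one possible way to observe the call order; the logging helpers (record, evaluate, calls_made, calls) and the return values 1, 2 and 3 are assumptions made for the illustration.

```c
#include <stddef.h>

/* Records the order in which the operand functions are called. */
static char call_log[4];
static size_t call_count;

static int record(char name, int value) {
    if (call_count < sizeof call_log - 1) {
        call_log[call_count++] = name;
        call_log[call_count] = '\0';
    }
    return value;
}

static int a(void) { return record('a', 1); }
static int b(void) { return record('b', 2); }
static int c(void) { return record('c', 3); }

/* Resets the log, evaluates the expression, and returns its value. */
int evaluate(void) {
    call_count = 0;
    call_log[0] = '\0';
    return a() + b() * c();
}

/* Number of operand functions called by the last evaluate(). */
int calls_made(void) { return (int)call_count; }

/* Call order observed by the last evaluate(), e.g. "abc" or "bca". */
const char *calls(void) { return call_log; }
```

The value returned by evaluate() is always 7, because precedence fixes the grouping. The string returned by calls() may legitimately differ between compilers, optimization levels, or even call sites, so nothing should depend on it.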
This is where sequence points come in. A sequence point is a given point in the program where all previous evaluations (and operations) must be done. So sequence points are mostly related to order of evaluation and not so much to operator precedence.
In the example above, the three operands are unsequenced in relation to each other, meaning that no sequence point dictates the order of evaluation.
Therefore it becomes problematic when side effects are introduced in such unsequenced expressions. If we write i++ + i++ * i++, then we still don't know the order in which these operands are evaluated, so we can't determine what the result will be. This is because both + and * have unspecified/unsequenced order of evaluation.
Had we written i++ || i++ && i++, then the behavior would be well-defined, because && and || specify the order of evaluation to be left-to-right and there is a sequence point between the evaluation of the left and the right operand. Thus if(i++ || i++ && i++) is perfectly portable and safe (although unreadable) code.
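Because the logical operators fix the order, the outcome can be traced by hand. The sketch below wraps the expression in two hypothetical helper functions; the extra parentheses around the && part match the default parse and are there only to keep compilers from warning about mixing && and ||.

```c
/* Value of the whole expression for a given starting i. */
int logical_value(int start) {
    int i = start;
    return i++ || (i++ && i++);
}

/* Value of i after the expression has been evaluated. */
int logical_final_i(int start) {
    int i = start;
    (void)(i++ || (i++ && i++));
    return i;
}
```

Starting from 0, the first i++ yields 0, so the right-hand side runs: the second i++ yields 1 and the third yields 2, giving a result of 1 and leaving i at 3. Starting from 1, the first i++ already yields a nonzero value, || short-circuits, and i ends at 2.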
As for the expression i = i++;, the problem here is that = is defined as (6.5.16):
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
This expression is actually close to being well-defined, because the text says that the left operand should not be updated before the right operand is computed. The problem is the very last sentence: the order of evaluation of the operands is unspecified/unsequenced.
And since the expression contains the side effect of i++, it invokes undefined behavior, since we can't know if the operand i or the operand i++ is evaluated first.
(There's more to it, since the standard also says that an operand should not be used twice in an expression for unrelated purposes, but that's another story.)