Why is i = v[i++] undefined?

生来不讨喜 2021-02-01 13:41

The C++11 standard, in §1.9/15 on the ordering of evaluation, gives the following code example:

void g(int i, int* v) {
    i = v[i++]; // the beha         


        
8 answers
  • 2021-02-01 14:11

    You refer specifically to the C++11 standard, so I'll give the C++11 answer. It is very similar to the C++03 answer, although the definition of sequencing differs.

    C++11 defines a sequenced before relation between evaluations on a single thread. It is asymmetric, transitive and pair-wise. If some evaluation A is not sequenced before some evaluation B and B is also not sequenced before A, then the two evaluations are unsequenced.

    Evaluating an expression includes both value computations (working out the value of some expression) and side effects. One instance of a side effect is the modification of an object, which is the most important one for answering the question. Other things also count as side effects. If a side effect on a scalar object is unsequenced relative to another side effect on the same object, or to a value computation that uses the value of that object, then your program has undefined behaviour.

    So that's the set up. The first important rule is:

    Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

    So any full expression is fully evaluated before the next full expression. In your question, we're only dealing with one full expression, namely i = v[i++], so we don't need to worry about this. The next important rule is:

    Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.

    That means that in a + b, for example, the evaluations of a and b are unsequenced (they may be performed in any order). Now for our final important rule:

    The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.

    So for a + b, the sequenced before relationships can be represented by a tree where a directed arrow represents the sequenced before relationship:

    a + b (value computation)
    ^   ^
    |   |
    a   b (value computation)
    

    If two evaluations occur in separate branches of the tree, they are unsequenced, so this tree shows that the evaluations of a and b are unsequenced relative to each other.
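The freedom the tree describes can be observed with a small sketch. The helpers a() and b() and the call_log string are hypothetical names introduced here for illustration. One caveat: function calls are never interleaved with each other, so the two calls below are indeterminately sequenced rather than strictly unsequenced, but the standard still does not say which one runs first.

```cpp
#include <string>

// Hypothetical helpers: each records that it ran, then returns a value.
static std::string call_log;

int a() { call_log += 'a'; return 1; }
int b() { call_log += 'b'; return 2; }

int sum_operands() {
    call_log.clear();
    // The order of the two calls is unspecified, so call_log may end up
    // as "ab" or as "ba". The sum is 3 either way.
    return a() + b();
}
```

Code that only depends on the sum is fine; code that depends on the contents of call_log is relying on an order the standard does not promise.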

    Now, let's do the same thing to your i = v[i++] example. We make use of the fact that v[i++] is defined to be equivalent to *(v + (i++)). We also use some extra knowledge about the sequencing of postfix increment:

    The value computation of the ++ expression is sequenced before the modification of the operand object.

    So here we go (a node of the tree is a value computation unless specified as a side effect):

    i = v[i++]
    ^     ^
    |     |
    i★  v[i++] = *(v + (i++))
                      ^
                      |
                   v + (i++)
                   ^     ^
                   |     |
                   v     ++ (side effect on i)★
                         ^
                         |
                         i
    

    Here you can see that the side effect on i, i++, is in a separate branch to the usage of i in front of the assignment operator (I marked each of these evaluations with a ★). So we definitely have undefined behaviour! I highly recommend drawing these diagrams if you ever wonder if your sequencing of evaluations is going to cause you trouble.
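To make the two ★ evaluations sequenced, the statement can be split into separate full-expressions. The sketch below shows two readings a programmer might have had in mind; the function names are made up for illustration, and because the original statement is undefined behaviour, a compiler is not limited to either of these outcomes.

```cpp
// Reading 1: the increment is stored first, then the assignment overwrites it.
int assignment_last(int i, const int* v) {
    const int old = i;
    ++i;          // the side effect of i++
    i = v[old];   // the assignment is the last write to i
    return i;
}

// Reading 2: the assignment is stored first, then the increment's store lands.
int increment_last(int i, const int* v) {
    const int old = i;
    i = v[old];   // the assignment
    i = old + 1;  // the side effect of i++ is the last write to i
    return i;
}
```

Each statement here is its own full-expression, so every write to i is sequenced relative to every other access to i.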

    That raises the objection that the value of i before the assignment doesn't matter, because we overwrite it anyway. In the general case, though, that's not true: we can overload the assignment operator and make use of the object's value before the assignment. The standard doesn't care that we don't use that value here; the rules are defined such that any value computation that is unsequenced relative to a side effect on the same scalar object is undefined behaviour. No buts. This undefined behaviour exists to let the compiler emit more optimized code. If sequencing were added for the assignment operator, that optimization could not be employed.
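As a minimal sketch of an assignment operator that does use the old value, consider the hypothetical type Tracked below (the type and the helper function are illustrative, not from the standard). Note that the assignment itself is kept free of any other side effect on the object.

```cpp
// Hypothetical type whose assignment operator reads the value it replaces.
struct Tracked {
    int value;
    int previous;

    Tracked& operator=(int next) {
        previous = value;  // the value before the assignment is observed here
        value = next;
        return *this;
    }
};

// Assigns `next` to a Tracked that starts at `initial`, then reports
// what the operator saw as the old value.
int previous_after_assign(int initial, int next) {
    Tracked t{initial, 0};
    t = next;
    return t.previous;
}
```

For a type like this, the value on the left of = clearly matters, which is why "it gets overwritten anyway" does not hold in general.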

  • 2021-02-01 14:15

    Think about the sequences of machine operations necessary for each of the following assignment statements, assuming the given declarations are in effect:

    extern int *foo(void);
    extern int *p;
    
    *p = *foo();
    *foo() = *p;
    

    If the evaluation of the address on the left side and the value on the right side are unsequenced, the most efficient ways to process the two assignments would likely be something like:

    [For *p = *foo()]
    call foo (which yields result in r0 and trashes r1)
    load r0 from address held in r0
    load r1 from address held in p
    store r0 to address held in r1
    
    [For *foo() = *p]
    call foo (which yields result in r0 and trashes r1)
    load r1 from address held in p
    load r1 from address held in r1
    store r1 to address held in r0
    

    In either case, if p or *p were read into a register before the call to foo, then unless "foo" promises not to disturb that register, the compiler would need to add an extra step to save its value before calling "foo", and another extra step to restore the value afterward. That extra step might be avoided by using a register that "foo" won't disturb, but that would only help if there were such a register which didn't hold a value needed by the surrounding code.

    Letting the compiler read the value of "p" before or after the function call, at its leisure, will allow both patterns above to be handled efficiently. Requiring that the address of the left-hand operand of "=" always be evaluated before the right hand side would likely make the first assignment above less efficient than it otherwise could be, and requiring that the address of the left-hand operand be evaluated after the right-hand side would make the second assignment less efficient.
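When the order does matter to the program, it can be pinned down with explicit temporaries instead of being left to the compiler. The sketch below assumes a hypothetical foo that redirects p as a side effect; the definitions of foo, p, and the three int variables are invented here for illustration and are not part of the declarations above.

```cpp
static int first_target = 1;
static int second_target = 2;
static int source = 42;
static int* p = &first_target;

// Hypothetical foo: redirects p, then returns the address to read from.
int* foo() { p = &second_target; return &source; }

void reset() { first_target = 1; second_target = 2; p = &first_target; }

// Reads p before calling foo, so the store goes to the old target.
void store_reading_p_first() {
    int* dst = p;
    int val = *foo();
    *dst = val;
}

// Reads p after calling foo, so the store goes to the new target.
void store_reading_p_last() {
    int val = *foo();
    int* dst = p;
    *dst = val;
}
```

After reset(), store_reading_p_first() writes 42 to first_target, while store_reading_p_last() writes it to second_target. Writing `*p = *foo();` directly would leave that choice to the implementation.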
