At what point in the loop does integer overflow become undefined behavior?

南方客 2020-12-07 18:12

This is an example to illustrate my question which involves some much more complicated code that I can't post here.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int value = INT_MAX - 2;
    for (int i = 0; i < 5; i++) {
        printf("Hello\n");
        value++;   /* signed overflow (undefined behavior) on the third iteration */
    }
    return 0;
}
12 answers
  • 2020-12-07 18:45

    Since this question is dual tagged C and C++ I will try and address both. C and C++ take different approaches here.

    In C the implementation must be able to prove the undefined behavior will be invoked in order to treat the whole program as-if it had undefined behavior. In the OP's example it would seem trivial for the compiler to prove that, and therefore it is as-if the whole program were undefined.

    We can see this from Defect Report 109 which at its crux asks:

    If however the C Standard recognizes the separate existence of "undefined values" (whose mere creation does not involve wholly "undefined behavior") then a person doing compiler testing could write a test case such as the following, and he/she could also expect (or possibly demand) that a conforming implementation should, at the very least, compile this code (and possibly also allow it to execute) without "failure."

    int array1[5];
    int array2[5];
    int *p1 = &array1[0];
    int *p2 = &array2[0];
    
    int foo()
    {
    int i;
    i = (p1 > p2); /* Must this be "successfully translated"? */
    1/0; /* Must this be "successfully translated"? */
    return 0;
    }
    

    So the bottom line question is this: Must the above code be "successfully translated" (whatever that means)? (See the footnote attached to subclause 5.1.1.3.)

    and the response was:

    The C Standard uses the term "indeterminately valued" not "undefined value." Use of an indeterminate valued object results in undefined behavior. The footnote to subclause 5.1.1.3 points out that an implementation is free to produce any number of diagnostics as long as a valid program is still correctly translated. If an expression whose evaluation would result in undefined behavior appears in a context where a constant expression is required, the containing program is not strictly conforming. Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.

    In C++ the approach seems more relaxed and would suggest a program has undefined behavior regardless of whether the implementation can prove it statically or not.

    We have [intro.abstract]p5 which says:

    A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

  • 2020-12-07 18:49

    Assuming int is 32-bit, undefined behavior happens at the third iteration. So if, for example, the loop was only conditionally reachable, or could conditionally be terminated before the third iteration, there would be no undefined behavior unless the third iteration is actually reached. However, in the event of undefined behavior, all output of the program is undefined, including output which is "in the past" relative to the invocation of undefined behavior. For example, in your case, this means there is no guarantee of seeing 3 "Hello" messages in the output.

  • 2020-12-07 18:54

    To understand why undefined behavior can 'time travel' as @TartanLlama adequately put it, let's take a look at the 'as-if' rule:

    1.9 Program execution

    1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

    With this, we could view the program as a 'black box' with an input and an output. The input could be user-input, files, and many other things. The output is the 'observable behavior' mentioned in the standard.

    The standard only defines a mapping between the input and the output, nothing else. It does this by describing an 'example black box', but explicitly says any other black box with the same mapping is equally valid. This means the content of the black box is irrelevant.

    With this in mind, it would not make sense to say that undefined behavior occurs at a certain moment. In the sample implementation of the black box, we could say where and when it happens, but the actual black box could be something completely different, so we can't say where and when it happens anymore. Theoretically, a compiler could for example decide to enumerate all the possible inputs, and pre-compute the resulting outputs. Then the undefined behavior would have happened during compilation.

    Undefined behavior is the inexistence of a mapping between input and output. A program can have undefined behavior for some input, but defined behavior for other. Then the mapping between input and output is simply incomplete; there is input for which no mapping to output exists.
    The program in the question has undefined behavior for any input, so the mapping is empty.

  • 2020-12-07 18:56

    Technically, under the C++ standard, if a program contains undefined behavior, the behavior of the entire program, even at compile time (before the program is even executed), is undefined.

    In practice, because the compiler may assume (as part of an optimization) that the overflow will not occur, at least the behavior of the program on the third iteration of the loop (assuming a 32-bit machine) will be undefined, though it is likely that you will get correct results before the third iteration. However, since the behavior of the entire program is technically undefined, there's nothing stopping the program from generating completely incorrect output (including no output), crashing at runtime at any point during execution, or even failing to compile altogether (as undefined behavior extends to compile time).

    Undefined behavior gives the compiler more room to optimize because it removes certain requirements on what the code must do. In exchange, programs that rely on assumptions involving undefined behavior are not guaranteed to work as expected. As such, you should not rely on any particular behavior that is considered undefined by the C++ standard.

  • 2020-12-07 19:00

    Beyond the theoretical answers, a practical observation would be that for a long time compilers have applied various transforms upon loops to reduce the amount of work done within them. For example, given:

    for (int i=0; i<n; i++)
      foo[i] = i*scale;
    

    a compiler might transform that into:

    int temp = 0;
    for (int i=0; i<n; i++)
    {
      foo[i] = temp;
      temp+=scale;
    }
    

    This saves a multiplication on every loop iteration. An additional optimization, which compilers have adopted with varying degrees of aggressiveness, would turn that into:

    if (n > 0)
    {
      int temp1 = n*scale;
      int *temp2 = foo + n;
      do
      {
        temp1 -= scale;
        *--temp2 = temp1;   /* fill backward: foo[n-1] down to foo[0] */
      } while(temp1);
    }
    

    Even on machines with silent wraparound on overflow, that could malfunction if there were some number less than n which, when multiplied by scale, would yield 0: the countdown would hit zero early and exit before filling the whole array. It could also turn into an endless loop if scale were read from memory more than once and something changed its value unexpectedly (in any case where scale could change mid-loop without invoking UB, a compiler would not be allowed to perform the optimization).

    While most such optimizations would not have any trouble in cases where two short unsigned types are multiplied to yield a value between INT_MAX+1 and UINT_MAX, gcc has some cases where such a multiplication within a loop may cause the loop to exit early. I haven't noticed such behaviors stemming from comparison instructions in generated code, but they are observable in cases where the compiler uses the overflow to infer that a loop can execute at most four times; by default it does not generate warnings in cases where some inputs would cause UB and others would not, even when its inferences cause the loop's upper bound to be ignored.

  • 2020-12-07 19:01

    If you're interested in a purely theoretical answer, the C++ standard allows undefined behaviour to "time travel":

    [intro.execution]/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

    As such, if your program contains undefined behaviour, then the behaviour of your whole program is undefined.
