(Why) is using an uninitialized variable undefined behavior?

后端未结

关注

 7  2386

暗喜 2020-11-21 05:01

If I have:

unsigned int x;
x -= x;

it\'s clear that x should be zero after this expression, but everywhere I look, th

7条回答

臣服心动 (楼主)

2020-11-21 05:16
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^n-1-1) and 2^n-1-1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
```
unsigned x, y, z;   /* 0 */
y = 0;              /* 1 */
z = 4;              /* 2 */
x = - x;            /* 3 */
y = y + z;          /* 4 */
x = y + 1;          /* 5 */
```
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
```
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1;
```
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.

For a more elaborate example, consider the following code fragment:
```
unsigned i, x;
for (i = 0; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}
```
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
```
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}
```
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
- during the first loop iteration, x is uninitialized by the time -x is evaluated.
- -x has undefined behavior, so its value is whatever-is-convenient.
- The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
0 讨论(0)

查看其它7个回答
发布评论:

提交评论
- 加载中...