How can *i
and u.i
print different numbers in this code, even though i
is defined as int *i = &u.i;
? I can only assuming
This code indeed invokes UB, because you do not respect the strict aliasing rule. n1256 draft of C99 states in 6.5 Expressions §7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
Between the *i = 2;
and the printf(" *i = %d\n", *i);
only a short object is modified. With the help of the strict aliasing rule, the compiler is free to assume that the int object pointed by i
has not been changed, and it can directly use a cached value without reloading it from main memory.
It is blatantly not what a normal human being would expect, but the strict aliasing rule was precisely written to allow optimizing compilers to use cached values.
For the second print, unions are referenced in same standard in 6.2.6.1 Representations of types / General §7:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
So as u.s
has been stored, u.i
have taken a value unspecified by standard
But we can read later in 6.5.2.3 Structure and union members §3 note 82:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
Although notes are not normative, they do allow better understanding of the standard. When u.s
have been stored through the *s
pointer, the bytes corresponding to a short have been changed to the 2 value. Assuming a little endian system, as 100 is smaller that the value of a short, the representation as an int should now be 2 as high order bytes were 0.
TL/DR: even if not normative, the note 82 should require that on a little endian system of the x86 or x64 families, printf("u.i = %d\n", u.i);
prints 2. But per the strict aliasing rule, the compiler is still allowed to assumed that the value pointed by i
has not changed and may print 100