As an example, consider the following structure:
struct S {
    int a[4];
    int b[4];
} s;
Would it be legal to write s.a[6]
and
No, since accessing an array out of bounds invokes Undefined Behavior, both in C and C++.
The Standard does not impose any restrictions upon what implementations must do when a program tries to use an out-of-bounds array subscript in one structure field to access a member of another. Out-of-bounds accesses are thus "illegal" in strictly conforming programs, and programs which make use of such accesses cannot simultaneously be 100% portable and free of errors. On the other hand, many implementations do define the behavior of such code, and programs which are targeted solely at such implementations may exploit such behavior.
There are three issues with such code:
1. While many implementations lay out structures in predictable fashion, the Standard allows implementations to add arbitrary padding before any structure member other than the first. Code could use sizeof or offsetof to ensure that structure members are placed as expected, but the other two issues would remain.
2. Given something like:
    if (structPtr->array1[x])
        structPtr->array2[y]++;
    return structPtr->array1[x];
   it would normally be useful for a compiler to assume that the use of structPtr->array1[x] in the return statement will yield the same value as the preceding use in the "if" condition, even though that assumption would change the behavior of code that relies upon aliasing between the two arrays.
3. If array1[] has e.g. 4 elements, a compiler given something like:
    if (x < 4) foo(x);
    structPtr->array1[x]=1;
   might conclude that since there would be no defined cases where x isn't less than 4, it could call foo(x) unconditionally.
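The layout check mentioned in the first issue can be sketched roughly as follows. This is a minimal illustration assuming a C11 compiler (for static_assert) and the struct S from the question; the assertion messages are made up for the example. It only turns a layout surprise into a build failure, and does nothing about the other two issues, so s.a[6] remains out of bounds even when the build succeeds.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct S {
    int a[4];
    int b[4];
};

/* Fail the build if the implementation puts padding between a and b. */
static_assert(offsetof(struct S, b) == offsetof(struct S, a) + sizeof(int[4]),
              "unexpected padding between a and b");

/* Fail the build if the implementation adds padding after b. */
static_assert(sizeof(struct S) == 2 * sizeof(int[4]),
              "unexpected trailing padding");
```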
Unfortunately, while programs can use sizeof or offsetof to ensure that there aren't any surprises with struct layout, there's no way by which they can test whether compilers promise to refrain from the optimizations of types #2 or #3. Further, the Standard is a little vague about what would be meant in a case like:
struct foo { char array1[4], array2[4]; };
int test(struct foo *p, int i, int x, int y, int z)
{
    if (p->array2[x])
    {
        ((char*)p)[x]++;
        ((char*)(p->array1))[y]++;
        p->array1[z]++;
    }
    return p->array2[x];
}
The Standard is pretty clear that behavior would only be defined if z is in the range 0..3, but since the type of p->array1 in that expression is char* (due to decay), it's not clear that the cast in the access using y would have any effect. On the other hand, since converting a pointer to the first member of a struct to char* should yield the same result as converting a struct pointer to char*, and the converted struct pointer should be usable to access all bytes therein, it would seem the access using x should be defined for (at minimum) x=0..7 [if the offset of array2 is greater than 4, it would affect the value of x needed to hit members of array2, but some value of x could do so with defined behavior].
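Under the reading above, where a struct pointer converted to a character pointer may be used to reach every byte of the struct, the value of x needed to hit a member of array2 can be computed with offsetof instead of being guessed. The sketch below assumes that reading; bump_array2_byte is a hypothetical helper name, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

struct foo { char array1[4], array2[4]; };

/* Hypothetical helper: increment element i of array2 by indexing from the
   start of the whole struct rather than past the end of array1. */
void bump_array2_byte(struct foo *p, size_t i)
{
    unsigned char *bytes = (unsigned char *)p;

    /* Stay inside array2, and therefore inside the struct object. */
    assert(i < sizeof p->array2);

    /* offsetof accounts for any padding before array2. */
    bytes[offsetof(struct foo, array2) + i]++;
}
```

With this, the index arithmetic never starts from array1, so it does not depend on how a compiler treats subscripts that run past the end of that member.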
IMHO, a good remedy would be to define the subscript operator on array types in a fashion that does not involve pointer decay. In that case, the expressions p->array1[x] and &(p->array1[x]) could invite a compiler to assume that x is 0..3, but p->array1+x and *(p->array1+x) would require a compiler to allow for the possibility of other values. I don't know if any compilers do that, but the Standard doesn't require it.
Apart from the answer of @rsp (Undefined behavior for an array subscript that is out of range), I can add that it is not legal to access b via a because the C language does not specify how much padding there may be between the end of the area allocated for a and the start of b, so even if it runs on a particular implementation, it is not portable.
Layout of an instance of the struct:
+-----------+----------------+-----------+---------------+
| array a | maybe padding | array b | maybe padding |
+-----------+----------------+-----------+---------------+
The second padding may be absent, since the alignment of the struct object is the alignment of a, which is the same as the alignment of b, but the C language does not require the second padding to be absent either.
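To see how a given implementation fills in the two "maybe padding" boxes, something like the following could be used. It is only a sketch: MEMBER_SIZE, padding_between_a_and_b and trailing_padding are names invented here, and the results describe one compiler and target, not the language.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct S {
    int a[4];
    int b[4];
};

/* Size of a member without needing an object (sizeof does not evaluate its operand). */
#define MEMBER_SIZE(type, member) sizeof(((type *)0)->member)

/* The Standard does not allow padding before the first member. */
static_assert(offsetof(struct S, a) == 0, "a must start the struct");

/* Bytes of padding between the end of a and the start of b. */
size_t padding_between_a_and_b(void)
{
    return offsetof(struct S, b) - MEMBER_SIZE(struct S, a);
}

/* Bytes of padding after the end of b. */
size_t trailing_padding(void)
{
    return sizeof(struct S) - (offsetof(struct S, b) + MEMBER_SIZE(struct S, b));
}
```

Both functions commonly return 0 for this struct, but that is an observation about a particular implementation rather than a guarantee.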