What I am asking about is the well-known "last member of a struct has variable length" trick. It goes something like this:

struct T {
    int len;
    char s[1];
};

which is then allocated with extra space, e.g. malloc(sizeof(struct T) + 100), and s is indexed as if it had more than one element.
Yes, it is undefined behavior.
C Language Defect Report #051 gives a definitive answer to this question:
The idiom, while common, is not strictly conforming
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_051.html
In the C99 Rationale document the C Committee adds:
The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists.
If a compiler accepts something like
typedef struct { int len; char dat[]; } T;
I think it's pretty clear that it must be ready to accept a subscript on 'dat' beyond its length. On the other hand, if someone codes something like:
typedef struct { int whatever; char dat[1]; } MY_STRUCT;
and then later accesses somestruct->dat[x]; I would not think the compiler is under any obligation to use address-computation code which will work with large values of x. I think if one wanted to be really safe, the proper paradigm would be more like:
#define LARGEST_DAT_SIZE 0xF000

typedef struct { int whatever; char dat[LARGEST_DAT_SIZE]; } MY_STRUCT;
and then do a malloc of (sizeof(MY_STRUCT) - LARGEST_DAT_SIZE + desired_array_length) bytes (bearing in mind that if desired_array_length is larger than LARGEST_DAT_SIZE, the results may be undefined).
Incidentally, I think the decision to forbid zero-length arrays was an unfortunate one (some older dialects, such as Turbo C, supported them), since a zero-length array could be regarded as a sign that the compiler must generate code that will work with larger indices.