If I declare a Union as:
union TestUnion
{
    struct
    {
        unsigned int Num;
        unsigned char Name[5];
    }TestStruct;
    unsigned char Total[7];
};
Here is another example that pairs the union with an enum to record what is stored. I found it clearer and more to the point.
from: https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/Structures.html
author: Dr. John T. Bell
In order to know which union field is actually stored, unions are often nested inside of structs, with an enumerated type indicating what is actually stored there. For example:
typedef struct Flight {
    enum { PASSENGER, CARGO } type;
    union {
        int npassengers;
        double tonnages; // Units are not necessarily tons.
    } cargo;
} Flight;

Flight flights[ 1000 ];
flights[ 42 ].type = PASSENGER;
flights[ 42 ].cargo.npassengers = 150;
flights[ 20 ].type = CARGO;
flights[ 20 ].cargo.tonnages = 356.78;
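Reading a record back follows the same pattern in reverse: check the tag first, then touch only the member it names. The following is a minimal sketch of that idea; the helper `describe_flight` and its output format are assumptions made for illustration and are not part of the quoted course notes.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Flight {
    enum { PASSENGER, CARGO } type;   /* tag: says which union member is valid */
    union {
        int npassengers;
        double tonnages;              /* Units are not necessarily tons. */
    } cargo;
} Flight;

/* Hypothetical helper: writes a description of *f into buf, reading only the
   union member that the tag says was stored. Returns what snprintf returns,
   or -1 if the tag holds an unexpected value. */
int describe_flight(const Flight *f, char *buf, size_t size)
{
    assert(f != NULL && buf != NULL);
    switch (f->type) {
    case PASSENGER:
        return snprintf(buf, size, "%d passengers", f->cargo.npassengers);
    case CARGO:
        return snprintf(buf, size, "%.2f units of cargo", f->cargo.tonnages);
    }
    return -1;
}
```

A caller would fill in `type` and the matching member as in the snippet above, then pass the record and a buffer to `describe_flight`.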
The member to use is the one you last wrote to; the other(s) are off limits. You know which member you last wrote to, don't you? After all, it was you who wrote the program :-)
As for your secondary question: the compiler is allowed to insert 'padding bytes' into the structure to avoid unaligned accesses, which makes access faster.
Example of a possible distribution of bytes inside your structure:
Num    |Name     |pad
- - - -|- - - - -|x x x
0 1 2 3|4 5 6 7 8|9 a b
There is no way to tell. You should have some additional flags (or other means external to your union) saying which of the union parts is really used.
You can't. That's part of the point of unions.
If you need to be able to tell, you can use something called a tagged union. Some languages have built-in support for these, but in C, you have to do it yourself. The idea is to include a tag along with the union which you can use to tell which version it is. Like:
enum TestUnionTag {NUM_NAME, TOTAL};

struct {
    enum TestUnionTag tag;
    union {
        struct {
            unsigned int Num;
            unsigned char Name[5];
        } TestStruct;
        unsigned char Total[7];
    } value;
} TestUnion;
Then in your code, you make sure you always set the tag to say how the union is being used.
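One way to keep the tag and the data from drifting apart is to route every write through a small setter and every read through a checked getter. This is only a sketch: the struct tag `Tagged` and the functions `set_num_name`, `set_total`, `get_num` and `get_total` are hypothetical names introduced here, not something from the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum TestUnionTag { NUM_NAME, TOTAL };

struct Tagged {                      /* hypothetical name for the wrapper struct */
    enum TestUnionTag tag;
    union {
        struct {
            unsigned int Num;
            unsigned char Name[5];
        } TestStruct;
        unsigned char Total[7];
    } value;
};

/* Store a Num/Name pair and record that choice in the tag. */
void set_num_name(struct Tagged *t, unsigned int num, const unsigned char name[5])
{
    t->tag = NUM_NAME;
    t->value.TestStruct.Num = num;
    memcpy(t->value.TestStruct.Name, name, sizeof t->value.TestStruct.Name);
}

/* Store the 7-byte Total and record that choice in the tag. */
void set_total(struct Tagged *t, const unsigned char total[7])
{
    t->tag = TOTAL;
    memcpy(t->value.Total, total, sizeof t->value.Total);
}

/* Read Num; aborts in debug builds if the union currently holds Total. */
unsigned int get_num(const struct Tagged *t)
{
    assert(t->tag == NUM_NAME);
    return t->value.TestStruct.Num;
}

/* Read Total; returns NULL if the union currently holds the Num/Name pair. */
const unsigned char *get_total(const struct Tagged *t)
{
    return t->tag == TOTAL ? t->value.Total : NULL;
}
```

Whether a mismatched read should assert, return NULL, or report an error code is a design choice; the sketch shows two of the options side by side.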
About the sizeof: the struct is 12 bytes because there are 4 bytes for the int (most modern compilers have a 4-byte int; a long int is 4 or 8 bytes depending on the platform), then five bytes for the chars, then three bytes of padding. In practice the padding comes after the chars, because a char array has no alignment requirement and can start right after Num. The padding is there so that the struct's size is a multiple of its alignment (here that of the int), which keeps every element of an array of these structs aligned. Because the struct is 12 bytes long, the union has to be 12 bytes long to hold it; the union doesn't change size according to what's in it.
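Where the padding actually sits can be measured instead of guessed, using `offsetof` from `<stddef.h>`. The sketch below gives the inner struct a hypothetical tag, `NumName`, so it can be named; the helper functions are likewise made up for illustration. On a typical platform with a 4-byte int one would expect 0 bytes of padding before `Name` and 3 after it, but the exact figures are up to the implementation.

```c
#include <assert.h>
#include <stddef.h>

struct NumName {                 /* hypothetical tag; same members as in the question */
    unsigned int Num;
    unsigned char Name[5];
};

/* Bytes of padding between the end of Num and the start of Name. */
size_t padding_before_name(void)
{
    return offsetof(struct NumName, Name) - sizeof(unsigned int);
}

/* Bytes of padding between the end of Name and the end of the struct. */
size_t padding_after_name(void)
{
    size_t name_end = offsetof(struct NumName, Name) + sizeof(unsigned char[5]);
    assert(name_end <= sizeof(struct NumName));
    return sizeof(struct NumName) - name_end;
}
```

Printing both values with `printf("%zu %zu\n", ...)` shows the layout your own compiler chose.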
First, sizeof(int) on most architectures nowadays is going to be 4. If you want 2 you should look at short, or int16_t in the stdint.h header in C99 if you want to be specific.
Second, C uses padding bytes to make sure each struct is aligned to the boundary its most strictly aligned member needs (here the unsigned int, typically 4 bytes). So your struct looks like this:
+---+---+---+---+---+---+---+---+---+---+---+---+
|      Num      |      N a m e      |   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+---+
There are 3 bytes of padding at the end. Otherwise, the next struct in an array would have its Num field in an awkwardly-aligned place, which would make it less efficient to access.
Third, the sizeof a union is going to be the sizeof its largest member (rounded up if alignment requires it). Even if all that space isn't used, sizeof is going to return the largest result.
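The third point can be checked directly. This is a small sketch using the union from the question; the helper functions `largest_member_size` and `union_slack` are hypothetical names added for the illustration.

```c
#include <assert.h>
#include <stddef.h>

union TestUnion {
    struct {
        unsigned int Num;
        unsigned char Name[5];
    } TestStruct;
    unsigned char Total[7];
};

/* Size of the larger of the two members. */
size_t largest_member_size(void)
{
    union TestUnion u;
    size_t a = sizeof u.TestStruct;   /* sizeof does not evaluate or read u */
    size_t b = sizeof u.Total;
    return a > b ? a : b;
}

/* Extra bytes the union occupies beyond its largest member. */
size_t union_slack(void)
{
    size_t largest = largest_member_size();
    assert(sizeof(union TestUnion) >= largest);
    return sizeof(union TestUnion) - largest;
}
```

For this particular union the slack should come out as zero on common platforms, because the inner struct is already padded to a multiple of its own alignment.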
You need, as other answers have mentioned, some other way (like an enum) to determine which field of your union is used.
Short answer: there is no way except by adding an enum somewhere in your struct outside the union.
enum TestUnionPart
{
    TUP_STRUCT,
    TUP_TOTAL
};

struct TestUnionStruct
{
    enum TestUnionPart Part;
    union
    {
        struct
        {
            unsigned int Num;
            unsigned char Name[5];
        } TestStruct;
        unsigned char Total[7];
    } TestUnion;
};
Now you'll need to control creation of your union to make sure the enum is correctly set, for example with functions similar to:
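#include <string.h> /* memcpy */

/* The inner struct has no tag of its own, so its fields are passed individually. */
void init_with_struct(struct TestUnionStruct* tus, unsigned int num, unsigned char const name[5])
{
    tus->Part = TUP_STRUCT;
    tus->TestUnion.TestStruct.Num = num;
    memcpy(tus->TestUnion.TestStruct.Name, name, sizeof(tus->TestUnion.TestStruct.Name));
}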
Dispatch on the correct values is now a single switch:
#include <assert.h>
#include <stdio.h>

void print(struct TestUnionStruct const * tus)
{
    switch (tus->Part)
    {
    case TUP_STRUCT:
        /* The arrays need not be NUL-terminated, so bound %s with a precision. */
        printf("Num = %u, Name = %.5s\n",
               tus->TestUnion.TestStruct.Num,
               (char const *)tus->TestUnion.TestStruct.Name);
        break;
    case TUP_TOTAL:
        printf("Total = %.7s\n", (char const *)tus->TestUnion.Total);
        break;
    default:
        /* Compiler can't make sure you'll never reach this case */
        assert(0);
    }
}
As a side note, I'd like to mention that these constructs are best handled in languages of the ML family.
type test_struct = { num: int; name: string }
type test_union = Struct of test_struct | Total of string