union test
{
    int i;
    char ch;
} t;

int main(void)
{
    t.ch = 20;
    return 0;
}
Suppose sizeof(int)==2
and let the memory allocated for t start at address 2000.
The C99 standard (§6.7.2.1.14) says:
The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
The last sentence of that paragraph says that each member of the union has the same address as the union itself, so they all "begin" at the same address. t, like t.ch and t.i, should be at address 2000, so t.ch overlaps with the first byte (in address order) of t.i.
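To check the "same address" claim on your own machine, here is a minimal sketch. The helper names members_share_address and print_addresses are mine, not part of the question, and the addresses printed will of course not be 2000.

```c
#include <stdio.h>

/* Same union as in the question. */
union test {
    int i;
    char ch;
};

/* Returns 1 when the union object and both of its members
   start at the same address, 0 otherwise. */
int members_share_address(const union test *u)
{
    const void *base = u;
    const void *addr_i = &u->i;
    const void *addr_ch = &u->ch;
    return base == addr_i && base == addr_ch;
}

/* Prints the three addresses so they can be compared by eye. */
void print_addresses(union test *u)
{
    printf("&t    = %p\n", (void *)u);
    printf("&t.i  = %p\n", (void *)&u->i);
    printf("&t.ch = %p\n", (void *)&u->ch);
}
```

Calling print_addresses(&t) from main should print the same pointer value three times, and members_share_address(&t) should return 1 on any conforming implementation.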
What you actually get in the real world if you read t.i after setting t.ch depends on the platform's endianness, and in fact reading a member of a union after writing to a different one is unspecified behavior according to the C standard (§6.2.6.1.6/7, restated at §J.1.1).
What helps more to understand the endianness of the machine (at least, I think it's more straightforward to understand) is to have a union like this:
union
{
int i;
unsigned char ch[sizeof(int)];
} t;
doing
t.i=20;
and then looking at what's inside the two chars at t.ch. If you are on a little-endian machine you'll get t.ch[0]==20 and t.ch[1]==0, and the opposite if you're on a big-endian machine (if sizeof(int)==2). Notice that, as already said, this is an implementation-specific detail; the standard does not even mention endianness.
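The union trick above can be wrapped in a couple of small functions. This is only a sketch: the names int_bytes, byte_at and is_little_endian are made up for illustration, and it assumes sizeof(int) > 1 and a plain little- or big-endian byte order (no mixed layouts).

```c
#include <stddef.h>

/* An int overlaid with an array covering each of its bytes. */
union int_bytes {
    int i;
    unsigned char ch[sizeof(int)];
};

/* Stores value in the int member and returns the byte found at
   position index (counted in address order) of its representation. */
unsigned char byte_at(int value, size_t index)
{
    union int_bytes t;
    t.i = value;
    return t.ch[index];
}

/* Returns 1 if the lowest-addressed byte holds the least
   significant bits of the int, 0 otherwise. */
int is_little_endian(void)
{
    return byte_at(1, 0) == 1;
}
```

On a little-endian machine byte_at(20, 0) gives 20; on a big-endian one it gives 0 and the 20 shows up at index sizeof(int) - 1 instead.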
To make it even clearer: if you have a 2-byte int var set to 20 on a little-endian machine, dumping the memory associated with it in address order gives (in hexadecimal, bytes separated by spaces):
14 00
while on a big-endian machine you'll get
00 14
The big-endian representation looks "more right" from our point of view, because in the little-endian representation the bytes that make up the whole int are stored in reverse order.
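If you want to produce such a dump yourself, something along these lines should do. hex_dump is a hypothetical helper written for this answer, and the caller is assumed to supply a buffer of at least 3 * n + 1 chars.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats the n bytes starting at p, in address order, as two-digit
   hexadecimal values separated by single spaces.
   out must have room for at least 3 * n + 1 chars. Returns out. */
char *hex_dump(const void *p, size_t n, char *out)
{
    const unsigned char *bytes = p;
    char *cursor = out;

    out[0] = '\0';
    for (size_t k = 0; k < n; k++) {
        /* No separator before the first byte. */
        cursor += sprintf(cursor, k == 0 ? "%02x" : " %02x",
                          (unsigned)bytes[k]);
    }
    return out;
}
```

With int var = 20; and char buf[3 * sizeof var + 1];, calling hex_dump(&var, sizeof var, buf) fills buf with a string that starts with 14 on a little-endian machine and ends with 14 on a big-endian one. On most current platforms int is 4 bytes, so expect four bytes in the output rather than two.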
Moreover I am saying that if I do this:
int a=20;
printf("%d",* (char*)&a);
Then doesn't the output depend on endianness, i.e. whether 20 is stored at 2000 or 2001?
Yes, here it does, but in your question you're asking something else; this looks more like my example.
An instance of test would take two bytes, so successive instances would be allocated at addresses 2000, 2002, etc. Any value stored in an instance of the union is stored starting at that instance's base address.
Each member of the union is stored at the same address for that instance of the union. That's why you can only store one type of value in a union at a time, and why a union occupies at least the number of bytes required for its largest member.
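The size claim is easy to check too. A small sketch, where union_fits_largest_member is a name invented for this example:

```c
#include <stddef.h>

/* Same union as in the question. */
union test {
    int i;
    char ch;
};

/* Returns 1 if the union is at least as large as its largest member. */
int union_fits_largest_member(void)
{
    size_t largest = sizeof(int) > sizeof(char) ? sizeof(int) : sizeof(char);
    return sizeof(union test) >= largest;
}
```

This should return 1 everywhere. Typically sizeof(union test) is exactly sizeof(int), though the standard only requires the union to be large enough.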