Why the output of this program is 4
?
#include
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
std::cout << *(short*)((c
This is arguably a bug in GCC.
First, it is to be noted that your code is invoking undefined behavior, due to violation of the rules of strict aliasing.
With that said, here's why I consider it a bug:
The same expression, when first assigned to an intermediate short
or short *
, causes the expected behavior. It's only when passing the expression directly as a function argument, does the unexpected behavior manifest.
It occurs even when compiled with -O0 -fno-strict-aliasing
.
I re-wrote your code in C to eliminate the possibility of any C++ craziness. Your question is was tagged c
after all! I added the pshort
function to ensure that the variadic nature printf
wasn't involved.
#include <stdio.h>
static void pshort(short val)
{
printf("0x%hx ", val);
}
int main(void)
{
short A[] = {1, 2, 3, 4, 5, 6};
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
short q = *EXP;
pshort(*p);
pshort(q);
pshort(*EXP);
printf("\n");
return 0;
}
After compiling with gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
:
gcc -O0 -fno-strict-aliasing -g -Wall -Werror endian.c
Output:
0x500 0x500 0x4
It appears that GCC is actually generating different code when the expression is used directly as an argument, even though I'm clearly using the same expression (EXP
).
Dumping with objdump -Mintel -S --no-show-raw-insn endian
:
int main(void)
{
40054d: push rbp
40054e: mov rbp,rsp
400551: sub rsp,0x20
short A[] = {1, 2, 3, 4, 5, 6};
400555: mov WORD PTR [rbp-0x16],0x1
40055b: mov WORD PTR [rbp-0x14],0x2
400561: mov WORD PTR [rbp-0x12],0x3
400567: mov WORD PTR [rbp-0x10],0x4
40056d: mov WORD PTR [rbp-0xe],0x5
400573: mov WORD PTR [rbp-0xc],0x6
#define EXP ((short*)((char*)A + 7))
short *p = EXP;
400579: lea rax,[rbp-0x16] ; [rbp-0x16] is A
40057d: add rax,0x7
400581: mov QWORD PTR [rbp-0x8],rax ; [rbp-0x08] is p
short q = *EXP;
400585: movzx eax,WORD PTR [rbp-0xf] ; [rbp-0xf] is A plus 7 bytes
400589: mov WORD PTR [rbp-0xa],ax ; [rbp-0xa] is q
pshort(*p);
40058d: mov rax,QWORD PTR [rbp-0x8] ; [rbp-0x08] is p
400591: movzx eax,WORD PTR [rax] ; *p
400594: cwde
400595: mov edi,eax
400597: call 400527 <pshort>
pshort(q);
40059c: movsx eax,WORD PTR [rbp-0xa] ; [rbp-0xa] is q
4005a0: mov edi,eax
4005a2: call 400527 <pshort>
pshort(*EXP);
4005a7: movzx eax,WORD PTR [rbp-0x10] ; [rbp-0x10] is A plus 6 bytes ********
4005ab: cwde
4005ac: mov edi,eax
4005ae: call 400527 <pshort>
printf("\n");
4005b3: mov edi,0xa
4005b8: call 400430 <putchar@plt>
return 0;
4005bd: mov eax,0x0
}
4005c2: leave
4005c3: ret
You are violating strict aliasing rules here. You can't just read half-way into an object and pretend it's an object all on its own. You can't invent hypothetical objects using byte offsets like this. GCC is perfectly within its rights to do crazy sh!t like going back in time and murdering Elvis Presley, when you hand it your program.
What you are allowed to do is inspect and manipulate the bytes that make up an arbitrary object, using a char*
. Using that privilege:
#include <iostream>
#include <algorithm>
int main()
{
short A[] = {1, 2, 3, 4, 5, 6};
short B;
std::copy(
(char*)A + 7,
(char*)A + 7 + sizeof(short),
(char*)&B
);
std::cout << std::showbase << std::hex << B << std::endl;
}
// Output: 0x500
But you can't just "make up" a non-existent object in the original collection.
Furthermore, even if you have a compiler that can be told to ignore this problem (e.g. with GCC's -fno-strict-aliasing
switch), the made-up object is not correctly aligned for any current mainstream architecture. A short
cannot legally live at that odd-numbered location in memory†, so you doubly can't pretend there is one there. There's just no way to get around how undefined the original code's behaviour is; in fact, if you pass GCC the -fsanitize=undefined
switch it will tell you as much.
† I'm simplifying a little.
The program has undefined behaviour due to casting an incorrectly aligned pointer to (short*)
. This breaks the rules in 6.3.2.3 p6 in C11, which is nothing to do with strict aliasing as claimed in other answers:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
In [expr.static.cast] p13 C++ says that converting the unaligned char*
to short*
gives an unspecified pointer value, which might be an invalid pointer, which can't be dereferenced.
The correct way to inspect the bytes is through the char*
not by casting back to short*
and pretending there is a short
at an address where a short
cannot live.