Undefined behavior from pointer math on a C++ array

逝去的感伤 2021-02-05 06:36

Why is the output of this program 4?

#include <iostream>

int main()
{
    short A[] = {1, 2, 3, 4, 5, 6};
    std::cout << *(short*)((c         


        
3 Answers
  • 2021-02-05 07:09

    This is arguably a bug in GCC.

    First, note that your code invokes undefined behavior by violating the strict aliasing rules.

    With that said, here's why I consider it a bug:

    1. The same expression, when first assigned to an intermediate short or short *, produces the expected behavior. Only when the expression is passed directly as a function argument does the unexpected behavior manifest.

    2. It occurs even when compiled with -O0 -fno-strict-aliasing.

    I rewrote your code in C to eliminate the possibility of any C++ craziness. Your question was tagged c, after all! I added the pshort function to ensure that the variadic nature of printf wasn't involved.

    #include <stdio.h>
    
    static void pshort(short val)
    {
        printf("0x%hx ", val);
    }
    
    int main(void)
    {
        short A[] = {1, 2, 3, 4, 5, 6};
    
    #define EXP ((short*)((char*)A + 7))
    
        short *p = EXP;
        short q = *EXP;
    
        pshort(*p);
        pshort(q);
        pshort(*EXP);
        printf("\n");
    
        return 0;
    }
    

    After compiling with gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2):

    gcc -O0 -fno-strict-aliasing -g -Wall -Werror  endian.c
    

    Output:

    0x500 0x500 0x4
    

    It appears that GCC is actually generating different code when the expression is used directly as an argument, even though I'm clearly using the same expression (EXP).

    Dumping with objdump -Mintel -S --no-show-raw-insn endian:

    int main(void)
    {
      40054d:   push   rbp
      40054e:   mov    rbp,rsp
      400551:   sub    rsp,0x20
        short A[] = {1, 2, 3, 4, 5, 6};
      400555:   mov    WORD PTR [rbp-0x16],0x1
      40055b:   mov    WORD PTR [rbp-0x14],0x2
      400561:   mov    WORD PTR [rbp-0x12],0x3
      400567:   mov    WORD PTR [rbp-0x10],0x4
      40056d:   mov    WORD PTR [rbp-0xe],0x5
      400573:   mov    WORD PTR [rbp-0xc],0x6
    
    #define EXP ((short*)((char*)A + 7))
    
        short *p = EXP;
      400579:   lea    rax,[rbp-0x16]             ; [rbp-0x16] is A
      40057d:   add    rax,0x7
      400581:   mov    QWORD PTR [rbp-0x8],rax    ; [rbp-0x08] is p
        short q = *EXP;
      400585:   movzx  eax,WORD PTR [rbp-0xf]     ; [rbp-0xf] is A plus 7 bytes
      400589:   mov    WORD PTR [rbp-0xa],ax      ; [rbp-0xa] is q
    
        pshort(*p);
      40058d:   mov    rax,QWORD PTR [rbp-0x8]    ; [rbp-0x08] is p
      400591:   movzx  eax,WORD PTR [rax]         ; *p
      400594:   cwde   
      400595:   mov    edi,eax
      400597:   call   400527 <pshort>
        pshort(q);
      40059c:   movsx  eax,WORD PTR [rbp-0xa]      ; [rbp-0xa] is q
      4005a0:   mov    edi,eax
      4005a2:   call   400527 <pshort>
        pshort(*EXP);
      4005a7:   movzx  eax,WORD PTR [rbp-0x10]    ; [rbp-0x10] is A plus 6 bytes ********
      4005ab:   cwde   
      4005ac:   mov    edi,eax
      4005ae:   call   400527 <pshort>
        printf("\n");
      4005b3:   mov    edi,0xa
      4005b8:   call   400430 <putchar@plt>
    
        return 0;
      4005bd:   mov    eax,0x0
    }
      4005c2:   leave  
      4005c3:   ret
    

    • I get the same result with GCC 4.9.4 and GCC 5.5.0 from Docker Hub.
  • 2021-02-05 07:20

    You are violating strict aliasing rules here. You can't just read half-way into an object and pretend it's an object all on its own. You can't invent hypothetical objects using byte offsets like this. GCC is perfectly within its rights to do crazy sh!t like going back in time and murdering Elvis Presley, when you hand it your program.

    What you are allowed to do is inspect and manipulate the bytes that make up an arbitrary object, using a char*. Using that privilege:

    #include <iostream>
    #include <algorithm>
    
    int main()
    {
        short A[] = {1, 2, 3, 4, 5, 6};
    
        short B;
        std::copy(
           (char*)A + 7,
           (char*)A + 7 + sizeof(short),
           (char*)&B
        );
        std::cout << std::showbase << std::hex << B << std::endl;
    }
    
    // Output: 0x500
    


    But you can't just "make up" a non-existent object in the original collection.

    Furthermore, even if you have a compiler that can be told to ignore this problem (e.g. with GCC's -fno-strict-aliasing switch), the made-up object is not correctly aligned for any current mainstream architecture. A short cannot legally live at that odd-numbered location in memory, so you doubly can't pretend there is one there. There's just no way to get around how undefined the original code's behaviour is; in fact, if you pass GCC the -fsanitize=undefined switch it will tell you as much.

    I'm simplifying a little.

  • 2021-02-05 07:23

    The program has undefined behaviour because it casts an incorrectly aligned pointer to (short*). This breaks the rule in C11 6.3.2.3 p6, which has nothing to do with strict aliasing, contrary to what the other answers claim:

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

    For C++, [expr.static.cast] p13 says that converting the unaligned char* to short* gives an unspecified pointer value, which might be an invalid pointer that cannot be dereferenced.

    The correct way to inspect the bytes is through the char*, not by casting back to short* and pretending there is a short at an address where a short cannot live.
