Is this use of unions strictly conforming?

攒了一身酷 2021-02-02 12:47

Given the code:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { re         


        
4 answers
  • 2021-02-02 13:15

    It is not about conforming or not conforming; it is one of the optimisation "traps". All of your data structures have been optimised out, and you pass the same pointer to the optimised-out data, so the execution tree is reduced to a simple printf of the value.

      sub rsp, 8
      mov esi, 4321
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      xor eax, eax
      add rsp, 8
      ret
    

    To change this, you need to give this "transfer" function side effects and force the real assignments. That forces the optimizer not to reduce those nodes in the execution tree:

    int test(union s1s2 *p1, union s1s2 *p2, volatile union s1s2 *p3)
    /* ....*/
    
    main:
      sub rsp, 8
      mov esi, 1234
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      xor eax, eax
      add rsp, 8
      ret
    

    It is quite a trivial test, just artificially made a bit more complicated.

  • 2021-02-02 13:22

    I believe that your code is conformant, and that there is a flaw in the -fstrict-aliasing mode of GCC and Clang.

    I cannot find the right part of the C standard, but the same problem happens for me when compiling your code in C++ mode, and I did find the relevant passages of the C++ Standard.

    In the C++ Standard, [class.union]/5 defines what happens when the built-in operator = is used on a union member access expression: when a union is involved in the member access expression on the left side of the built-in operator =, the active member of the union changes to the member named in that expression (provided the member's type has a trivial default constructor, which it does here because this is C code).

    Note that write_s2x cannot change the active member of the union, because a union is not involved in the assignment expression. Your code does not assume that this happens, so it's OK.

    Even if I use placement new to explicitly change which union member is active, which ought to be a hint to the compiler that the active member changed, GCC still generates code that outputs 4321.

    This looks like a bug in GCC and Clang: they assume that the active union member cannot switch here, because they fail to recognize that p1, p2 and p3 may all point to the same object.

    GCC and Clang (and pretty much every other compiler) support an extension to C/C++ that lets you read an inactive member of a union (getting a potentially garbage value as a result), but only if the access is done in a member access expression involving the union. If v1 were not the active member, read_s1x would not have defined behavior under this implementation-specific rule, because the union does not appear in its member access expression. But because v1 is the active member, that shouldn't matter.
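As a minimal sketch of the kind of access this rule covers, an inactive member can be read through a member access expression on the union object itself. The union `pun` and the function `float_bits` are hypothetical names used only for illustration, and the example assumes `float` is a 32-bit IEEE 754 type.

```c
#include <stdint.h>

/* Hypothetical union used only for this illustration. */
union pun {
    float f;
    uint32_t u;
};

/* Store through one member, then read the other member through the union
   object itself, so the union type appears in the member access expression.
   Assumes float is a 32-bit IEEE 754 binary32 type. */
uint32_t float_bits(float value)
{
    union pun p;
    p.f = value;
    return p.u;
}
```

Under that assumption, `float_bits(1.0f)` yields `0x3F800000`. The read in `float_bits` names the union object directly, unlike read_s1x, which only receives a `struct s1 *`.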

    This is a complicated case, and I hope that my analysis is correct, as someone who isn't a compiler maintainer or a member of one of the committees.

  • 2021-02-02 13:24

    With a strict interpretation of the standard, this code might not be conforming. Let's focus on the text of the well-known §6.5p7:

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    — a type compatible with the effective type of the object,
    — a qualified version of a type compatible with the effective type of the object,
    — a type that is the signed or unsigned type corresponding to the effective type of the object,
    — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    — a character type.

    (emphasis mine)
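A minimal sketch of the direction the aggregate/union item does permit follows. It reuses the question's type declarations, and the function `copy_after_write` is hypothetical. The whole union is read through an lvalue of type `union s1s2`, and that lvalue may alias the `struct s1` object stored inside it, so the compiler has to assume the write through `m` can affect `*u`.

```c
struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

/* Hypothetical function: m may point into *u. */
unsigned short copy_after_write(union s1s2 *u, struct s1 *m)
{
    m->x = 7;                /* access through an lvalue of type unsigned short */
    union s1s2 copy = *u;    /* read through an lvalue of union type, which
                                includes struct s1 among its members */
    return copy.v1.x;
}
```

Called as `copy_after_write(&q, &q.v1)`, the copy must observe the value 7 written through `m`.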

    In the context of your whole code, your functions read_s1x() and write_s2x() do the opposite of what the emphasized item above (an aggregate or union type that includes one of the aforementioned types among its members) allows. From this paragraph alone, you could conclude that it's not allowed: a pointer to union s1s2 may alias a pointer to struct s1, but not vice versa.

    Of course, this interpretation would mean that the code must work as intended if you "inline" these functions manually into your test(). This is indeed the case here with gcc 6.2 for i686-w64-mingw32.
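The following is a minimal sketch of what such manual inlining looks like. It uses the question's type declarations and a hypothetical function `read_then_write_inlined`, which is not the question's own test(). Every access names the union object, so the lvalues have the form `p->v2.x` and `p->v1.x` instead of going through a bare `struct s1 *` or `struct s2 *`.

```c
struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

/* Hypothetical example: p1 and p2 may point to the same union object. */
int read_then_write_inlined(union s1s2 *p1, union s1s2 *p2)
{
    p2->v2.x = 1234;   /* in place of a call like write_s2x(&p2->v2, 1234) */
    return p1->v1.x;   /* in place of read_s1x(&p1->v1) */
}
```

When both arguments point to the same union, the read of `v1.x` sees the value just stored through `v2.x`. Both members are `unsigned short` at offset 0.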


    Two more arguments in favor of the strict interpretation presented above:

    • While it's always allowed to alias any pointer with char *, a character array can't be aliased by any other type.

    • Consider the (here unrelated) §6.5.2.3p6:

      One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

      (again emphasis mine) The typical interpretation is that "visible" means directly in scope in the function in question, not "somewhere in the translation unit", so this guarantee doesn't cover a function that takes a pointer to one of the structs that is a member of the union.
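A minimal sketch of the guarantee in §6.5.2.3p6 under the typical interpretation follows. The types `circle`, `square` and `shape` and the function `shape_tag` are hypothetical. The two structs share a common initial sequence (`int tag`), the completed union type is visible, and the common member is inspected through the union object.

```c
/* Hypothetical structs sharing a common initial sequence: int tag. */
struct circle { int tag; double radius; };
struct square { int tag; double side; };

union shape {
    struct circle c;
    struct square s;
};

/* The completed union type is visible here, and the common initial member
   is read through the union object, even if c was the member stored. */
int shape_tag(const union shape *sh)
{
    return sh->s.tag;
}
```

By contrast, a function that received only a `struct square *` would fall outside this guarantee under the reading given above.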

  • 2021-02-02 13:25

    I didn't read the standard, but playing with pointers in strict-aliasing mode (i.e., using -fstrict-aliasing) is dangerous. See the GCC online documentation:

    Pay special attention to code like this:

    union a_union {
      int i;
      double d;
    };
    
    int f() {
      union a_union t;
      t.d = 3.0;
      return t.i;
    }
    

    The practice of reading from a different union member than the one most recently written to (called type-punning) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

    int f() {
       union a_union t;
       int* ip;
       t.d = 3.0;
       ip = &t.i;
       return *ip;
    }
    

    Similarly, access by taking the address, casting the resulting pointer and dereferencing the result has undefined behavior, even if the cast uses a union type, e.g.:

    int f() {
      double d = 3.0;
      return ((union a_union *) &d)->i;
    }
    

    The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.
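One commonly used alternative, which is not part of the quoted GCC text, avoids both the union and the pointer cast by copying the bytes with memcpy. This is a minimal sketch: the function `double_bits` is hypothetical, and it assumes `double` is 64 bits wide, which a static assertion checks at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Assumes double is 64 bits wide; fail the build otherwise. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Copy the object representation instead of reading the double through
   an lvalue of another type; memcpy copies it as characters. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With IEEE 754 doubles, `double_bits(3.0)` yields `0x4008000000000000`.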

    Notice anything similar to your code in the second example?
