How to type-pun in C

Question

Follow-up to extended discussion in Casting behavior in C

I'm trying to emulate a Z80 in C, where several 8-bit registers can be combined to create 16-bit registers.

This is the logic I'm trying to use:

struct {
    uint8_t b;
    uint8_t c;
    uint16_t *bc;
} regs[1];
...
regs->bc = (uint16_t *)&(regs->b);

Why is this incorrect, and how can I do it correctly (using type-punning if needed)?

I need to do this multiple times, preferably within the same structure.

For those of you that I haven't mentioned this to: I understand that this assumes a little-endian architecture. I have this handled completely.

Answer 1:

It's incorrect because b is of type uint8_t and a pointer to uint16_t cannot be used for accessing such a variable. It might not be correctly aligned and it is a strict aliasing violation.

You are, however, free to do (uint8_t *)&regs or (struct reg_t*)&regs->b, since (C11 6.7.2.1/15):

A pointer to a structure object, suitably converted, points to its initial member and vice versa.

When doing hardware-related programming, make sure to never use signed types. That means changing intn_t to uintn_t.

As for how to type pun properly, use a union:

typedef union
{
  struct                 /* standard C anonymous struct */
  {
    uint8_t b;
    uint8_t c;
  };
  uint16_t bc;
} reg_t;

You can then assign this to point at a 16 bit hardware register like this:

volatile reg_t* reg = (volatile reg_t*)0x1234;

where 0x1234 is the hardware register address.

NOTE: this union is endianness-dependent. b will access the MS byte of bc on big-endian systems, but the LS byte of bc on little-endian systems.
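The endianness note can be checked at run time. The following is a minimal sketch, assuming a C11 compiler; the names reg_pair_t, b_is_low_byte and pair_value are hypothetical and not part of the answer. pair_value builds the 16-bit value with shifts instead of reading bc, so it does not depend on the host byte order (it treats b as the high byte, as in the Z80 BC pair).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the union above; reg_pair_t is a hypothetical name. */
typedef union
{
  struct                 /* anonymous struct, needs C11 */
  {
    uint8_t b;
    uint8_t c;
  };
  uint16_t bc;
} reg_pair_t;

/* Assumption: no padding, so the two bytes exactly overlay bc. */
static_assert(sizeof(reg_pair_t) == sizeof(uint16_t), "unexpected padding");

/* Returns 1 if b aliases the least significant byte of bc (little-endian host). */
static int b_is_low_byte(void)
{
  reg_pair_t r;
  r.bc = 0x1234;
  return r.b == 0x34;
}

/* Byte-order-independent value of the pair, with b as the high byte. */
static uint16_t pair_value(const reg_pair_t *r)
{
  return (uint16_t)((r->b << 8) | r->c);
}
```

If b_is_low_byte() returns 1, reading bc directly gives the bytes in the opposite order from pair_value, which is why the member order matters when the union is used to model a specific register pair.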

Answer 2:

To emulate a hardware register that can be accessed as two eight-bit registers or one 16-bit register, you can use:

union
{
    struct { int8_t b, c; };
    int16_t bc;
} regs[1];

Then regs->bc will be the 16-bit register, and regs->b and regs->c will be 8-bit registers.

Note: This uses an anonymous struct so that b and c appear as if they were members of the union. If the struct had a name, like this:

union
{
    struct { int8_t b, c; } s;
    int16_t bc;
} regs[1];

then you would have to include its name when accessing b or c, as with regs->s.b. However, C has a feature that allows you to use a declaration without a name for this purpose.

Also note this requires a C compiler. C allows using unions to reinterpret data. C++ has different rules.
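Since the question asks for several register pairs inside one structure, the same idea can be repeated with anonymous unions nested in a struct. This is only a sketch, assuming C11 and a little-endian host; the type name z80_regs_t and the helper host_is_little_endian are hypothetical. It uses unsigned types, following the advice in the first answer, and lists the low register of each pair first so that it overlays the low byte on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: each pair is an anonymous union, so all
   members (b, c, bc, d, e, de, h, l, hl) are accessed directly. */
typedef struct
{
    union { struct { uint8_t c, b; }; uint16_t bc; };
    union { struct { uint8_t e, d; }; uint16_t de; };
    union { struct { uint8_t l, h; }; uint16_t hl; };
} z80_regs_t;

/* Assumption: the compiler inserts no padding between or inside the pairs. */
static_assert(sizeof(z80_regs_t) == 3 * sizeof(uint16_t),
              "unexpected padding in z80_regs_t");

/* Returns 1 when the host stores the low byte first, which is what the
   member order above assumes. */
static int host_is_little_endian(void)
{
    z80_regs_t r;
    r.bc = 0x0001;
    return r.c == 0x01;
}
```

With this layout, writing regs.hl = 0x1234 on a little-endian host makes regs.h read 0x12 and regs.l read 0x34. On a big-endian host the order of the two 8-bit members in each anonymous struct would have to be swapped.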

Answer 3:

The correct way is through anonymous unions in C, as already shown in the other answers. But since you want to process bytes, you can rely on the special handling of character types in the strict aliasing rule: whatever the type, it is always legal to use a char pointer to access the bytes of its representation (uint8_t, where it exists, is in practice a typedef for unsigned char). So this is conformant C:

struct {
    uint16_t bc;
    uint8_t *b;
    uint8_t *c;
} regs[1];

regs->b = (uint8_t *) &(regs->bc);   /* first byte of bc in memory */
regs->c = regs->b + 1;               /* second byte of bc in memory */

Interestingly enough, it is still valid for a C++ compiler...
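The character-type exception can also be used without storing pointers in the structure. The sketch below is an illustration under stated assumptions, not the answerer's code: byte_of and from_bytes are hypothetical helper names. byte_of reads the object representation through an unsigned char pointer, and from_bytes goes the other way with memcpy, which is likewise well defined in both C and C++.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns byte `index` (0 or 1) of the object representation of *value.
   Accessing any object through unsigned char is allowed by the aliasing rules. */
static uint8_t byte_of(const uint16_t *value, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)value;
    assert(index < sizeof *value);
    return bytes[index];
}

/* Rebuilds a 16-bit value from two bytes given in host memory order. */
static uint16_t from_bytes(const uint8_t bytes[2])
{
    uint16_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Which of the two bytes is the high one still depends on the host byte order, exactly as with the union.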

Answer 4:

The correct way to type-pun in C (or do almost anything, for that matter) is to use an implementation that is configured to be suitable for one's intended purpose. The Standard deliberately allows implementations that are intended for various purposes to behave in ways that would make them unsuitable for other purposes. According to the authors, it was never intended to suggest that programs whose behavior isn't mandated by the Standard (but would be defined on the implementations for which they were intended) should be viewed as "broken". Compilers whose authors seek to support the needs of their customers will recognize straightforward type-punning constructs whether or not the Standard requires them to do so, and optimizers whose authors view their customers' needs with contempt should not be trusted to reliably handle anything complicated.

Source: https://stackoverflow.com/questions/55419777/how-to-type-pun-in-c

Tags

emulation

type-punning

z80