Bit hack: Expanding bits

名媛妹妹 2021-02-13 21:27

I am trying to convert a uint16_t input to a uint32_t bit mask. One bit in the input toggles two bits in the output bit mask. Here is an example conver

8 answers
  •  -上瘾入骨i
    2021-02-13 22:00

    If your concern is performance and simplicity, you are likely best off with a big lookup table (64k entries of 4 bytes each). With that, you can use pretty much any algorithm you like to generate the table; the lookup itself will be just a single memory access.
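    As a minimal sketch of the full-table idea, the table could be filled once with a simple bit loop. The names `expandBits`, `fullTable`, `initFullTable` and `getMaskFull` are made up for illustration, and the sketch assumes that bit i of the input sets bits 2i and 2i+1 of the output, which is what the question describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: duplicates every input bit, so bit i of the input
   sets bits 2i and 2i+1 of the result. Speed does not matter here because
   it only runs while the table is being built. */
static uint32_t expandBits(uint16_t input) {
    uint32_t result = 0;
    for (int i = 0; i < 16; i++) {
        if (input & (1u << i)) {
            result |= (uint32_t)3 << (2 * i);
        }
    }
    return result;
}

static uint32_t fullTable[65536];   /* 64k entries of 4 bytes = 256 KiB */
static int fullTableReady = 0;

/* Fill the table once, e.g. at program start. */
static void initFullTable(void) {
    for (uint32_t i = 0; i < 65536; i++) {
        fullTable[i] = expandBits((uint16_t)i);
    }
    fullTableReady = 1;
}

/* After initialization, each conversion is a single array access. */
static uint32_t getMaskFull(uint16_t input) {
    assert(fullTableReady);
    return fullTable[input];
}
```

    Call `initFullTable()` once before the first `getMaskFull()` call; the table could just as well be generated offline and compiled in as a constant array.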

    If that table is too big for your liking, you can split it. For instance, you can use an 8-bit lookup table with 256 entries of 2 bytes each. With that, you can perform the entire operation with just two lookups. A bonus is that this approach allows for type-punning tricks to avoid the hassle of splitting the address with bit operations:

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    //Implementation defined behavior ahead:
    //Works correctly for both little and big endian machines,
    //however, results will be wrong on a PDP11...
    uint32_t getMask(uint16_t input) {
        assert(sizeof(uint16_t) == 2);
        assert(sizeof(uint32_t) == 4);
        static const uint16_t lookupTable[256] = { 0x0000, 0x0003, 0x000c, 0x000f, ... };

        unsigned char* inputBytes = (unsigned char*)&input;    //legal because we type-pun to char, but the order of the bytes is implementation defined
        uint16_t outputShorts[2];    //the order of the shorts within the result is implementation defined
        outputShorts[0] = lookupTable[inputBytes[0]];
        outputShorts[1] = lookupTable[inputBytes[1]];
        uint32_t output;
        memcpy(&output, outputShorts, sizeof(output));    //can't type-pun directly from uint16_t to uint32_t due to strict aliasing rules
        return output;
    }


    The code above stays within the strict aliasing rules: it reads the input through an unsigned char pointer, which the rules explicitly allow, and it assembles the result with memcpy rather than through a pointer cast. (Writing through a uint16_t* into a char array would not be covered by that exception, which only permits accessing other objects through character types, not the reverse.) It also works around the effects of little/big-endian byte order by building the result in the same order as the input was split. However, it still relies on implementation-defined behavior: a machine with a byte order of 1, 0, 3, 2, or another middle-endian order, will silently produce wrong results (such CPUs have actually existed, for example the PDP-11).
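    If the dependence on byte order is a concern, a variant that splits the input with a mask and a shift avoids it entirely, at the cost of the bit operations the type-punning trick was meant to save. The following is only a sketch: `byteTable`, `initByteTable` and `getMaskPortable` are hypothetical names, and the table is generated at run time here instead of being written out as a constant.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t byteTable[256];     /* 256 entries of 2 bytes each */
static int byteTableReady = 0;

/* Hypothetical initializer: entry b has bits 2i and 2i+1 set
   for every set bit i of b. */
static void initByteTable(void) {
    for (unsigned b = 0; b < 256; b++) {
        uint16_t expanded = 0;
        for (int i = 0; i < 8; i++) {
            if (b & (1u << i)) {
                expanded |= (uint16_t)(3u << (2 * i));
            }
        }
        byteTable[b] = expanded;
    }
    byteTableReady = 1;
}

/* Two lookups, combined by value rather than by memory layout,
   so the result does not depend on the machine's byte order. */
static uint32_t getMaskPortable(uint16_t input) {
    assert(byteTableReady);
    uint32_t low  = byteTable[input & 0xFFu];   /* expands input bits 0..7  */
    uint32_t high = byteTable[input >> 8];      /* expands input bits 8..15 */
    return low | (high << 16);
}
```

    For example, after `initByteTable()`, `getMaskPortable(0x0100)` yields `0x00030000`: input bit 8 sets output bits 16 and 17.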

    Of course, you can split the lookup table even further, but I doubt that would do you any good.
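    For illustration only, a further split could look like the sketch below: a 16-entry table indexed by 4-bit nibbles, which needs four lookups per conversion instead of two. The names `nibbleTable` and `getMaskNibbles` are hypothetical.

```c
#include <stdint.h>

/* Each entry doubles the bits of a 4-bit value into an 8-bit value. */
static const uint8_t nibbleTable[16] = {
    0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
    0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF
};

/* Four lookups: nibble n of the input becomes byte n of the output. */
static uint32_t getMaskNibbles(uint16_t input) {
    uint32_t result = 0;
    for (int n = 0; n < 4; n++) {
        uint32_t nibble = ((uint32_t)input >> (4 * n)) & 0xFu;
        result |= (uint32_t)nibbleTable[nibble] << (8 * n);
    }
    return result;
}
```

    The table shrinks to 16 bytes, but every call now pays for four shifts, masks and lookups.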
