Opinions on type-punning in C++?

后端 未结 5 430
野趣味
野趣味 2020-12-02 23:42

I\'m curious about conventions for type-punning pointers/arrays in C++. Here\'s the use case I have at the moment:

Compute a simple 32-bit checksum over a
相关标签:
5条回答
  • 2020-12-02 23:56

    As far as the C++ standard is concerned, litb's answer is completely correct and the most portable. Casting const char *data to a const uint3_t *, whether it be via a C-style cast, static_cast, or reinterpret_cast, breaks the strict aliasing rules (see Understanding Strict Aliasing). If you compile with full optimization, there's a good chance that the code will not do the right thing.

    Casting through a union (such as litb's my_reint) is probably the best solution, although it does technically violate the rule that if you write to a union through one member and read it through another, it results in undefined behavior. However, practically all compilers support this, and it results in the the expected result. If you absolutely desire to conform to the standard 100%, go with the bit-shifting method. Otherwise, I'd recommend going with casting through a union, which is likely to give you better performance.

    0 讨论(0)
  • 2020-12-03 00:04

    There are my fifty cents - different ways to do it.

    #include <iostream>
    #include <string>
    #include <cstring>
    
        uint32_t compute_checksum_memcpy(const char *data, size_t size)
        {
            uint32_t checksum = 0;
            for (size_t i = 0; i != size / 4; ++i)
            {
                // memcpy may be slow, unneeded allocation
                uint32_t dest; 
                memcpy(&dest,data+i,4);
                checksum += dest;
            }
            return checksum;
        }
    
        uint32_t compute_checksum_address_recast(const char *data, size_t size)
        {
            uint32_t checksum = 0;
            for (size_t i = 0; i != size / 4; ++i)
            {
                //classic old type punning
                checksum +=  *(uint32_t*)(data+i);
            }
            return checksum;
        }
    
        uint32_t compute_checksum_union(const char *data, size_t size)
        {
            uint32_t checksum = 0;
            for (size_t i = 0; i != size / 4; ++i)
            {
                //Syntax hell
                checksum +=  *((union{const char* c;uint32_t* i;}){.c=data+i}).i;
            }
            return checksum;
        }
    
        // Wrong!
        uint32_t compute_checksum_deref(const char *data, size_t size)
        {
            uint32_t checksum = 0;
            for (size_t i = 0; i != size / 4; ++i)
            {
                checksum +=  *&data[i];
            }
            return checksum;
        }
    
        // Wrong!
        uint32_t compute_checksum_cast(const char *data, size_t size)
        {
            uint32_t checksum = 0;
            for (size_t i = 0; i != size / 4; ++i)
            {
                checksum +=  *(data+i);
            }
            return checksum;
        }
    
    
    int main()
    {
        const char* data = "ABCDEFGH";
        std::cout << compute_checksum_memcpy(data, 8) << " OK\n";
        std::cout << compute_checksum_address_recast(data, 8) << " OK\n";
        std::cout << compute_checksum_union(data, 8) << " OK\n";
        std::cout << compute_checksum_deref(data, 8) << " Fail\n";
        std::cout << compute_checksum_cast(data, 8) << " Fail\n";
    }
    
    0 讨论(0)
  • 2020-12-03 00:09

    Ignoring efficiency, for simplicity of code I'd do:

    #include <numeric>
    #include <vector>
    #include <cstring>
    
    uint32_t compute_checksum(const char *data, size_t size) {
        std::vector<uint32_t> intdata(size/sizeof(uint32_t));
        std::memcpy(&intdata[0], data, size);
        return std::accumulate(intdata.begin(), intdata.end(), 0);
    }
    

    I also like litb's last answer, the one that shifts each char in turn, except that since char might be signed, I think it needs an extra mask:

    checksum += ((data[i] && 0xFF) << shift[i % 4]);
    

    When type punning is a potential issue, I prefer not to type pun rather than to try to do so safely. If you don't create any aliased pointers of distinct types in the first place, then you don't have to worry what the compiler might do with aliases, and neither does the maintenance programmer who sees your multiple static_casts through a union.

    If you don't want to allocate so much extra memory, then:

    uint32_t compute_checksum(const char *data, size_t size) {
        uint32_t total = 0;
        for (size_t i = 0; i < size; i += sizeof(uint32_t)) {
            uint32_t thisone;
            std::memcpy(&thisone, &data[i], sizeof(uint32_t));
            total += thisone;
        }
        return total;
    }
    

    Enough optimisation will get rid of the memcpy and the extra uint32_t variable entirely on gcc, and just read an integer value unaligned, in whatever the most efficient way to do that is on your platform, straight out of the source array. I'd hope the same is true of other "serious" compilers. But this code is now bigger than litb's, so there's not much to be said for it other than mine is easier to turn into a function template that will work just as well with uint64_t, and mine works as native endian-ness rather than picking little-endian.

    This is of course not completely portable. It assumes that the storage representation of sizeof(uint32_t) chars corresponds to the storage representation of a uin32_t in the way we want. This is implied by the question, since it states that one can be "treated as" the other. Endian-ness, whether a char is 8 bits, and whether uint32_t uses all bits in its storage representation can obviously intrude, but the question implies that they won't.

    0 讨论(0)
  • 2020-12-03 00:20

    I know this thread has been inactive for a while, but thought I'd post a simple generic casting routine for this kind of thing:

    // safely cast between types without breaking strict aliasing rules
    template<typename ReturnType, typename OriginalType>
    ReturnType Cast( OriginalType Variable )
    {
        union
        {
            OriginalType    In;
            ReturnType      Out;
        };
    
        In = Variable;
        return Out;
    }
    
    // example usage
    int i = 0x3f800000;
    float f = Cast<float>( i );
    

    Hope it helps someone!

    0 讨论(0)
  • 2020-12-03 00:20

    This looks like a case-book example of when to use reinterpret_cast, anything else will give you the same effect without the explicitness you get from using a language construct for its official use.

    0 讨论(0)
提交回复
热议问题