Why is memcpy slower than a reinterpret_cast when parsing binary data?

迷失自我 · 2020-12-31 12:25

TLDR: I forgot to enable compiler optimizations. With optimizations enabled, the performance is (nearly) identical.


Original post

When r

3 Answers
  • 2020-12-31 12:53

A reinterpret_cast is a compile-time reinterpretation of the pointer's type, while memcpy() is a run-time call (unless the optimizer elides it). That is why the cast by itself adds nothing to the running time.

  • 2020-12-31 12:54

You need to look at the emitted code. Obviously the optimizer "should" be able to turn the memcpy into a single, potentially-unaligned, int-sized load into the return value; if you see different times on x86, I reckon that means it hasn't.

On my machine, using gcc with -O2 I get 0.09 for all times. With -O3 I get 0 for all times (I haven't checked whether that's below the timer's granularity, or whether the optimizer has removed all your code).

    So fairly likely, the answer is just that you haven't used the right compiler flags (or ideone hasn't).

On an architecture where a potentially-unaligned read requires different instructions from an aligned read, the reinterpret_cast could emit an aligned read while the memcpy might have to emit an unaligned one (depending on how the function is called -- in this case the data is in fact aligned, but I don't know under what conditions the compiler can prove that). In that case I would expect the reinterpret_cast code to be faster than the memcpy, but of course it would be incorrect if someone passed in an unaligned pointer.

  • 2020-12-31 12:56

memcpy cannot copy into a register; it does a memory-to-memory copy. The reinterpret_cast in get_int_v1 just changes the type of the pointer held in a register, which doesn't even require a register-to-register copy.
