Casting is a compile-time operation, while memcpy() is a run-time operation. That is why the cast has no impact on the running time.
You need to look at the emitted code. The optimizer "should" be able to turn the memcpy into a single, potentially unaligned, int-sized read into the return value, but if you see different times then I reckon that, on x86, it hasn't.
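To make the comparison concrete, here is a minimal sketch of the two variants under discussion. The name get_int_v1 comes from the question; its signature, and the whole of get_int_v2, are assumptions made for illustration and may not match the original code.

```cpp
#include <cstring>

// Assumed shape of the question's function: reinterpret the pointer and
// dereference it. This is only valid when the pointer is suitably aligned
// and really refers to an int object.
int get_int_v1(const char* data) {
    return *reinterpret_cast<const int*>(data);
}

// Hypothetical memcpy counterpart: copy sizeof(int) bytes into a local
// and return it. This is valid for any alignment of data.
int get_int_v2(const char* data) {
    int result;
    std::memcpy(&result, data, sizeof(int));
    return result;
}
```

Compiling both with something like g++ -O2 -S and comparing the generated assembly is the direct way to check whether the memcpy call was reduced to a single load.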
On my machine, using gcc with -O2, I get 0.09 for all times. With -O3 I get 0 for all times (I haven't checked whether that's because the run is faster than the timer granularity or because the optimizer has removed all your code).
So fairly likely, the answer is just that you haven't used the right compiler flags (or ideone hasn't).
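If you want to rule out the possibility that the optimizer deleted the work being timed, one common approach is to make the result of every read observable. The sketch below is an assumption about how such a harness might look, not the question's benchmark; read_int, Timing and time_reads are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the function under test.
int read_int(const char* data) {
    int result;
    std::memcpy(&result, data, sizeof result);
    return result;
}

// Elapsed wall-clock time plus a checksum of everything that was read.
struct Timing {
    double seconds;
    long long checksum;
};

// Reads every int in values, repeats times, and accumulates the results.
// Returning the checksum gives the caller something to print or compare,
// which makes it harder for the compiler to discard the loop as dead code.
Timing time_reads(const std::vector<int>& values, int repeats) {
    const char* bytes = reinterpret_cast<const char*>(values.data());
    long long checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (std::size_t i = 0; i < values.size(); ++i) {
            checksum += read_int(bytes + i * sizeof(int));
        }
    }
    auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

The compiler may still simplify the loop when the input is known at compile time, so reading the input from a file or the command line and printing the checksum is the safer setup.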
On an architecture where a potentially unaligned read requires different instructions from an aligned read, the reinterpret_cast could emit an aligned read while the memcpy might have to emit an unaligned read (depending on how the function is called; in this case the data is in fact aligned, but I don't know under what conditions the compiler can prove that). In that case I would expect the reinterpret_cast code to be faster than the memcpy, but of course it would be incorrect if someone passes in an unaligned pointer.
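The unaligned case can be illustrated with a small sketch. The function name load_int_unaligned and the buffer layout in the usage note are assumptions for illustration only.

```cpp
#include <cstring>

// Reads an int starting at an arbitrary byte address. memcpy has no
// alignment requirement on its source, so this stays well-defined even
// when p is not a multiple of alignof(int). Dereferencing
// reinterpret_cast<const int*>(p) here would not be.
int load_int_unaligned(const unsigned char* p) {
    int value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

For example, if an int is stored at offset 1 of a buffer declared with alignas(int), load_int_unaligned(buffer + 1) returns it correctly, whereas the cast-and-dereference version has undefined behavior and may fault on hardware that requires aligned loads.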
memcpy cannot copy to a register; it does a memory-to-memory copy. The reinterpret_cast in get_int_v1 can change the type of the pointer held in a register, and that doesn't even require a register-to-register copy.