Why is memcpy slower than a reinterpret_cast when parsing binary data?


迷失自我

TLDR: I forgot to enable compiler optimizations. With the optimizations enabled the performance is (nearly) identical.

Original post

When r


3 Answers

说谎

2020-12-31 12:53

A cast is resolved at compile time, while memcpy() is, unless the compiler optimizes the call away, a run-time operation. That is why the cast itself has no impact on the running time.

旧巷少年郎

2020-12-31 12:54

You need to look at the emitted code. The optimizer "should" be able to turn the memcpy into a single, potentially unaligned, int-sized load straight into the return value. If you see different timings, then on x86 I reckon that means it hasn't done so.
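To make "the emitted code" concrete, here is a minimal sketch of the two readers under discussion. Only the name get_int_v1 appears in this thread; get_int_v2, the signatures, and the use of std::uint32_t are assumptions made for illustration, not the asker's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed shape of the cast-based reader. It is only valid when p is
// suitably aligned and really points at a std::uint32_t object.
inline std::uint32_t get_int_v1(const char* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0);
    return *reinterpret_cast<const std::uint32_t*>(p);
}

// Assumed shape of the memcpy-based reader (the name is hypothetical).
// It copies the bytes, so it is valid for any alignment.
inline std::uint32_t get_int_v2(const char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

Compiling something like this with `g++ -O2 -S` and comparing the assembly for the two functions is one way to check whether the memcpy call was actually folded into a plain load on your toolchain, rather than assuming it was.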

On my machine, using gcc with -O2, I get 0.09 for all times. With -O3 I get 0 for all times (I haven't checked whether that is because the run is shorter than the timer granularity or because the optimizer has removed all your code).

So fairly likely, the answer is just that you haven't used the right compiler flags (or ideone hasn't).

On an architecture where a potentially unaligned read requires different instructions from an aligned read, the reinterpret_cast could emit an aligned read while the memcpy might have to emit an unaligned one. That depends on how the function is called: in this case the data is in fact aligned, but I don't know under what conditions the compiler can prove that. On such an architecture I would expect the reinterpret_cast code to be faster than the memcpy, but of course it would be incorrect if someone passed in an unaligned pointer.
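The alignment point can be illustrated with a small sketch. is_aligned_for and read_u32_any are hypothetical helpers, and the choice of std::uint32_t is an assumption. The memcpy-based read is well defined at any address, whereas dereferencing a cast pointer at the odd address used below would not be.

```cpp
#include <cstdint>
#include <cstring>

// True when p satisfies the alignment requirement of T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Safe for any address: memcpy places no alignment requirement on its source.
inline std::uint32_t read_u32_any(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

A parser that wants the cast on a fast path could check is_aligned_for first and fall back to read_u32_any otherwise, although with optimizations enabled the memcpy version alone is often compiled to the same load.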

情歌与酒

2020-12-31 12:56

memcpy cannot copy to a register; as an actual function call it does a memory-to-memory copy. The reinterpret_cast in get_int_v1 only changes the type of a pointer already held in a register, and that does not even require a register-to-register copy.
