This is a follow-up to this question where I posted this program:
#include
#include
#include
#include
According to the assembler output of G++ 4.8.1, test_memcpy:
movl (%r15), %r15d
test_std_copy:
movl $4, %edx
movq %r15, %rsi
leaq 16(%rsp), %rdi
call memcpy
As you can see, std::copy successfully recognized that it can copy the data with memcpy, but for some reason the resulting memcpy call was not inlined any further, and that is the cause of the performance difference.
By the way, Clang 3.4 produces identical code for both cases:
movl (%r14,%rbx), %ebp
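For reference, here is a minimal sketch of the kind of function pair being compared. The original program is not reproduced above, so the names read_with_memcpy, read_with_std_copy and round_trips are hypothetical, and the bodies are an assumption about what test_memcpy and test_std_copy do: read one int out of a char buffer. Compiling something like this with -O2 -S is one way to check whether your compiler emits a single load or a call to memcpy.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical stand-in for test_memcpy: copy sizeof(int) bytes with memcpy.
int read_with_memcpy(const char* data) {
    int result;
    std::memcpy(&result, data, sizeof(result));
    return result;
}

// Hypothetical stand-in for test_std_copy: copy the same bytes with std::copy.
// Both iterators are char pointers, so the library may forward to a memory-copy routine.
int read_with_std_copy(const char* data) {
    int result;
    std::copy(data, data + sizeof(result), reinterpret_cast<char*>(&result));
    return result;
}

// Helper for illustration only: both readers should return the stored value.
bool round_trips(int value) {
    char buffer[sizeof(value)];
    std::memcpy(buffer, &value, sizeof(value));
    return read_with_memcpy(buffer) == value &&
           read_with_std_copy(buffer) == value;
}
```

Both functions return the same value; the only difference discussed here is the code the compiler generates for them.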