This is a follow-up to this question where I posted this program:
#include
#include
#include
#include
Looks to me like the answer is that gcc can optimize these particular calls to memmove and memcpy, but not std::copy. gcc is aware of the semantics of memmove and memcpy, and in this case can take advantage of the fact that the size is known (sizeof(int)) to turn the call into a single mov instruction.
std::copy is implemented in terms of memcpy, but apparently the gcc optimizer doesn't manage to figure out that data + sizeof(int) - data is exactly sizeof(int). So the size is not a known constant at that point, and the benchmark makes a real call to memcpy instead of getting a single mov.
I got all that by invoking gcc with -S and flipping quickly through the output; I could easily have gotten it wrong, but what I saw seems consistent with your measurements.
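If you want to repeat that check yourself, here is a minimal sketch. The function names are made up for illustration and are not from the original program; each one copies exactly sizeof(int) bytes, so each can be looked at separately in the assembly.

```cpp
#include <algorithm>  // std::copy
#include <cstring>    // std::memcpy, std::memmove

// Hypothetical stand-ins for the three variants in the benchmark.
// Each copies exactly sizeof(int) bytes from src to dst.

void copy_memcpy(char* dst, const char* src) {
    std::memcpy(dst, src, sizeof(int));   // size is a compile-time constant
}

void copy_memmove(char* dst, const char* src) {
    std::memmove(dst, src, sizeof(int));  // size is a compile-time constant
}

void copy_std(char* dst, const char* src) {
    // The length here is (src + sizeof(int)) - src, which the optimizer
    // has to fold back to sizeof(int) on its own.
    std::copy(src, src + sizeof(int), dst);
}
```

Compiling with something like g++ -O2 -S file.cpp writes file.s, where you can see whether each function body is a plain load/store or a call. What you actually get depends on the gcc version and flags, so other versions may well treat std::copy differently from what I saw.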
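By the way, I think the test is more or less meaningless. A more plausible real-world test might be creating an actual std::vector<int> src and an int dst[N], and then comparing memcpy(dst, src.data(), sizeof(int)*src.size()) with std::copy(src.begin(), src.end(), dst). (Note that the destination is dst, not &dst: dst decays to int*, whereas &dst is a pointer to the whole array and would not compile as an output iterator.)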
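A rough sketch of what that more realistic test could look like is below. The value of N and the helper names copy_with_memcpy, copy_with_std_copy and time_ns are my own choices for illustration, and the timing helper is deliberately crude (single run, no warm-up), so treat it as a starting point rather than a proper benchmark.

```cpp
#include <algorithm>  // std::copy
#include <chrono>     // std::chrono::steady_clock
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy
#include <vector>

// Arbitrary element count, chosen only for illustration.
constexpr std::size_t N = 1 << 20;

// Copies all of src into dst; dst must have room for src.size() ints.
void copy_with_memcpy(const std::vector<int>& src, int* dst) {
    std::memcpy(dst, src.data(), sizeof(int) * src.size());
}

// Same contract as above, but through std::copy.
void copy_with_std_copy(const std::vector<int>& src, int* dst) {
    std::copy(src.begin(), src.end(), dst);
}

// Runs f once and returns the elapsed wall-clock time in nanoseconds.
template <class F>
long long time_ns(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To use it, fill a std::vector<int> src(N), declare a static int dst[N], and time each copy with a lambda, for example time_ns([&] { copy_with_memcpy(src, dst); }). Read something out of dst afterwards so the optimizer can't discard the copy, and repeat each measurement several times before comparing.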