I understand that the difference between memmove and memcpy is that memmove handles the case where the source and destination regions overlap. I have checked the implementation in libgcc and found this article, [memcpy performance], on the Intel website.
In libgcc, memmove is similar to memcpy: both simply copy one byte at a time, so the performance should be almost the same even after optimization.
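To make the comparison concrete, here is a minimal sketch of what such byte-by-byte versions look like. The names naive_memcpy and naive_memmove are hypothetical, and this is an illustration of the idea rather than the actual libgcc source. The only extra work in the memmove variant is one pointer comparison to pick the copy direction.

```c
#include <stddef.h>

/* Hypothetical name: plain forward byte copy.
   Only correct when the two regions do not overlap. */
void *naive_memcpy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (len--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical name: byte copy that picks a safe direction.
   Note: comparing pointers into unrelated objects is not strictly
   portable ISO C; library implementations rely on the platform here. */
void *naive_memmove(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    if (d < s) {
        /* Destination is below the source: copy forward. */
        while (len--)
            *d++ = *s++;
    } else {
        /* Destination is above (or equal to) the source: copy backward
           so source bytes are read before they are overwritten. */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dest;
}
```

With loops like these, the two functions differ by a single branch per call, which is why I would expect nearly identical performance.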
Someone has measured this and written the article memcopy, memmove, and Speed over Safety. I don't think memmove can be faster than memcpy, but there should be no big difference, at least on Intel platforms.
So on what platform, and how, can memcpy be significantly faster than memmove? If there is none, why provide two similar functions instead of just memmove, a choice that leads to a lot of bugs?
Edit: I'm not asking about the difference between memmove and memcpy; I know memmove can handle overlap. The question is: is there really any platform where memcpy is faster than memmove?
There is at least one recent case where the constraint of non-overlapping memory is used to generate faster code:
In Visual Studio, memcpy can be compiled using intrinsics, while memmove cannot. This makes memcpy much faster for small regions of a known size, because the function call and setup overhead are removed. The implementation using movsd/movsw/movsb is not suitable for overlapping blocks, as it starts copying at the lowest address and increments edi/esi during the copy.
See also Make compiler copy characters using movsd.
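The overlap problem with a copy that starts at the lowest address can be sketched in portable C. The helper forward_copy below is hypothetical and only imitates the low-to-high order of the movs instructions; it is not what the compiler actually emits. It is used instead of memcpy because calling memcpy on overlapping regions is undefined behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies from the lowest address upward,
   the same order as the movs-based copy described above. */
static void forward_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Shift the first 5 bytes of buf two positions to the right
   using the forward copy. The regions overlap, with the
   destination starting inside the source. */
void shift_with_forward_copy(char *buf)
{
    forward_copy((unsigned char *)buf + 2, (const unsigned char *)buf, 5);
}

/* The same shift done with memmove, which handles the overlap. */
void shift_with_memmove(char *buf)
{
    memmove(buf + 2, buf, 5);
}
```

Starting from a buffer holding "abcdefg", shift_with_forward_copy produces "abababa", because bytes it has already written are read again as source, while shift_with_memmove produces the intended "ababcde". A forward copy only goes wrong when the destination starts inside the source; when the destination is below the source it happens to work.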
GCC also lists memcpy among the functions it implements as built-ins; the implementation and motivation are likely similar to those of Visual Studio.
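As an illustration of the "small region of a known size" case, here is a minimal sketch. The helper load_u32 is a hypothetical name. With optimization enabled, compilers that treat memcpy as a built-in will typically replace this call with a single 32-bit load, but that is an expectation rather than a guarantee, so check the generated assembly for your compiler and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a buffer that
   may not be suitably aligned for a direct uint32_t access. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    /* The size is a compile-time constant and v cannot overlap *p,
       so the compiler is free to expand this inline. */
    memcpy(&v, p, sizeof v);
    return v;
}
```

Because the local variable cannot alias the source, the non-overlap requirement of memcpy costs nothing here, and there is no function call left to pay for.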
Good practice: in general, use memmove only if you have to, that is, when there is a reasonable chance that the source and destination regions overlap.
Otherwise use memcpy, which is more efficient.
Reference: https://www.youtube.com/watch?v=Yr1YnOVG-4g Dr. Jerry Cain, (Stanford Intro Systems Lecture - 7) Time: 36:00