Does rewriting memcpy/memcmp/... with SIMD instructions make sense in large-scale software?
If so, why doesn't GCC generate SIMD instructions for these library functions?
It probably doesn't matter. The CPU can process data much faster than main memory can supply it, so a large copy is limited by memory bandwidth rather than by the instructions doing the copying, and the implementations of memcpy etc. provided by the compiler's runtime library are probably good enough. In "large-scale" software, your performance is not going to be dominated by copying memory anyway (it's more likely to be dominated by I/O).
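For illustration, a hand-written SIMD copy of the kind the question describes might look like the minimal sketch below. It assumes an x86-64 target where the compiler defines `__SSE2__`; the name `simd_copy` is hypothetical, and on other targets it falls back to a plain byte loop. It moves 16 bytes per iteration with unaligned SSE2 loads and stores, then copies the remaining tail one byte at a time.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical hand-written copy routine (not a library API).
   Like memcpy, it assumes dst and src do not overlap. */
void *simd_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

#if defined(__SSE2__)
    /* Copy 16-byte blocks using unaligned vector loads/stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#endif

    /* Copy whatever is left (or everything, without SSE2) byte by byte. */
    for (; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

Whether something like this beats the library memcpy has to be measured on the actual workload: on common platforms (for example glibc on x86-64) the library memcpy already uses vector instructions, and for large copies both versions end up waiting on memory. Note also that GCC at higher optimisation levels may vectorize the plain byte loop itself, or recognise it and replace it with a call to memcpy.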
To get a real step up in memory copying performance, some systems have a specialised DMA engine that can copy from memory to memory without tying up the CPU. If a substantial performance increase is needed, hardware is the way to get it.