Does rewriting memcpy/memcmp/... with SIMD instructions make sense in large-scale software?
If so, why doesn't GCC generate SIMD instructions for these library functions?
On x86 hardware it should not matter much, because of out-of-order execution. The processor will extract the necessary ILP and try to issue the maximum number of load/store operations per cycle for memcpy, whether the code uses the SIMD or the scalar instruction set.
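To make the comparison concrete, here is a minimal sketch of the two styles of copy loop. The helper names copy_scalar and copy_simd are hypothetical, made up for this illustration; this is not how glibc or GCC implement memcpy. The SIMD path assumes SSE2 is available and falls back to a byte loop otherwise. Both loops are a stream of independent loads and stores, which is what an out-of-order core can overlap; they differ only in how many bytes each instruction moves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: copies 8 bytes per iteration with scalar loads/stores. */
void copy_scalar(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        /* Fixed-size memcpy avoids unaligned-access and aliasing problems;
           compilers typically lower it to a single 64-bit load and store. */
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    for (; i < n; i++)       /* remaining tail bytes */
        d[i] = s[i];
}

/* Hypothetical helper: copies 16 bytes per iteration when SSE2 is available. */
void copy_simd(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned load/store, so no alignment requirement on src or dst. */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#endif
    for (; i < n; i++)       /* tail bytes, or everything without SSE2 */
        d[i] = s[i];
}
```

Neither helper handles overlapping buffers, same as memcpy. If you want to check the claim on your own machine, time both over buffer sizes that fit in L1, L2, and main memory, and look at the generated assembly first, since an optimizing compiler may rewrite either loop.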