Does rewriting memcpy/memcmp/... with SIMD instructions make sense in large-scale software?
If so, why doesn't GCC generate SIMD instructions for these library functions?
Yes, these functions can be much faster with SSE instructions. It would be nice if your runtime library or compiler intrinsics included optimized versions, but that doesn't seem to be pervasive.
I have a custom SIMD `memchr` which is a hell of a lot faster than the library version, especially when I'm finding the first of 2 or 3 characters (for example, to know whether there's an equation in this line of text, I search for the first of `=`, `\n`, `\r`).
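To give an idea of what such a multi-character search might look like, here is a minimal sketch using SSE2 intrinsics. It is not my production code: the name `find_first_of3` is made up for illustration, `__builtin_ctz` assumes GCC or Clang, and the function falls back to a plain byte loop when SSE2 isn't available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (name is an assumption, not a library function):
   returns a pointer to the first byte in buf[0..len) equal to a, b, or c,
   or NULL if none of them occurs. */
static const char *find_first_of3(const char *buf, size_t len,
                                  char a, char b, char c)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast each needle byte into all 16 lanes of a vector. */
    const __m128i va = _mm_set1_epi8(a);
    const __m128i vb = _mm_set1_epi8(b);
    const __m128i vc = _mm_set1_epi8(c);

    /* Compare 16 bytes at a time against all three needles. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i hit = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(chunk, va),
                         _mm_cmpeq_epi8(chunk, vb)),
            _mm_cmpeq_epi8(chunk, vc));
        /* One bit per byte; the lowest set bit is the first match. */
        int mask = _mm_movemask_epi8(hit);
        if (mask != 0)
            return buf + i + __builtin_ctz((unsigned)mask);
    }
#endif
    /* Scalar loop: handles the tail, or everything without SSE2. */
    for (; i < len; i++)
        if (buf[i] == a || buf[i] == b || buf[i] == c)
            return buf + i;
    return NULL;
}
```

Calling `find_first_of3(line, len, '=', '\n', '\r')` and checking whether the result points at `=` answers the "is there an equation in this line" question in a single pass, instead of three separate `memchr` calls.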
On the other hand, the library functions are well tested, so it's only worth writing your own if you call them a lot and a profiler shows they're a significant fraction of your CPU time.