I have been trying to experiment with improving the performance of strcmp
under certain conditions. However, I unfortunately cannot even get an implementation of plain van
I don't know which standard library you are using, but just to give you an idea of how serious C library maintainers are about optimizing the string primitives: the default strcmp used by GNU libc on x86-64 is two thousand lines of hand-optimized assembly language, as of version 2.24. There are separate, also hand-optimized, versions for when the SSSE3 and SSE4.2 instruction set extensions are available. (A fair bit of the complexity in that file appears to come from the same source code being used to generate several other functions; the machine code winds up being "only" 1120 instructions.) 2.24 was released roughly a year ago, and even more work has gone into it since.
They go to this much trouble because it's common for one of the string primitives to be the single hottest function in a profile.
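
For contrast, a plain byte-at-a-time strcmp fits in a few lines of C. The following is only a minimal sketch of a baseline you could measure against, not what glibc or any other library actually does; the name `naive_strcmp` is made up for illustration.

```c
#include <string.h>

/* Hypothetical baseline: compares one byte per iteration.
   Bytes are compared as unsigned char, as the C standard requires
   for strcmp, so a byte like 0xFF sorts after 'a'. */
int naive_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Advance while the bytes match and the end of a has not been reached.
       If b ends first, *p != *q and the loop stops there. */
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }

    /* Negative, zero, or positive, like strcmp. */
    return (int)*p - (int)*q;
}
```

Only the sign of the result is meaningful, so when checking it against the library's strcmp, compare signs rather than exact values.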