Background: I\'m trying to create a pure D language implementation of functionality that\'s roughly equivalent to C\'s memchr but uses arrays and indices i
I would suggest taking a look at GNU libc's source. As for most functions, it will contain both a generic optimized C version of the function, and optimized assembly language versions for as many supported architectures as possible, taking advantage of machine specific tricks.
The x86-64 SSE2 version combines the results from pcmpeqb on a whole cache-line of data at once (four 16B vectors), to amortize the overhead of the early-exit pmovmskb
/test
/jcc
.
gcc and clang are currently incapable of auto-vectorizing loops with if() break
early-exit conditions, so they make naive byte-at-a-time asm from the obvious C implementation.