I am learning SIMD and was curious whether it is possible to beat strchr at finding a character. strchr appears to use the same intrinsics, but I assume it also has to check for the null terminator, whereas I know the character is in the array and plan to skip that check.
My code is:
    size_t N = 1e9;
    bool found = false; //Not really used
    ...
    size_t char_index1 = 0;
    size_t char_index2 = 0;
    char * str = malloc(N);
    memset(str,'a',N);
    __m256i char_match;
    __m256i str_simd;
    __m256i result;
    __m256i* pSrc1;
    int simd_mask;

    str[(size_t)5e8] = 'b';

    char_match = _mm256_set1_epi8('b');
    result = _mm256_set1_epi32(0);
    simd_mask = 0;
    pSrc1 = (__m256i *)str;

    while (1){
        str_simd = _mm256_lddqu_si256(pSrc1);
        result = _mm256_cmpeq_epi8(str_simd, char_match);
        simd_mask = _mm256_movemask_epi8(result);
        if (simd_mask != 0){
            break;
        }
        pSrc1++;
    }
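The loop only locates the 32-byte block that contains the match. For completeness, here is a minimal sketch of how the mask could then be turned into a byte index. The helper name `match_index` is hypothetical and `__builtin_ctz` assumes GCC or Clang; this is an illustration, not necessarily what the linked code does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the snippet above).
 * blocks_skipped: number of 32-byte blocks scanned without a match.
 * mask: nonzero result of _mm256_movemask_epi8 for the matching block.
 * Bit i of the mask corresponds to byte i of the block, so the lowest
 * set bit marks the first matching byte. */
static inline size_t match_index(size_t blocks_skipped, uint32_t mask)
{
    /* __builtin_ctz is undefined for 0, so the caller must pass a nonzero mask. */
    return blocks_skipped * 32 + (size_t)__builtin_ctz(mask);
}
```

After the loop this could be called as `char_index1 = match_index((size_t)(pSrc1 - (__m256i *)str), (uint32_t)simd_mask);`.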
Full (not yet finished) code at: https://gist.github.com/JimHokanson/433e185ba53b41e49ce3ac804568ac1e
strchr is twice as fast as this code (with both gcc and Xcode). I'm hoping to understand why.
Update: compiling using: gcc -std=c11 -mavx2 -mlzcnt