Why is strchr twice as fast as my SIMD code?

Anonymous (unverified), submitted 2019-12-03 02:38:01

Question:

I am learning SIMD and was curious whether it is possible to beat strchr at finding a character. It appears that strchr uses the same intrinsics, but I assume it also checks for the null terminator, whereas I know the character is in the array and plan to skip that check.

My code is:

    size_t N = 1e9;
    bool found = false; //Not really used ...
    size_t char_index1 = 0;
    size_t char_index2 = 0;
    char * str = malloc(N);
    memset(str,'a',N);

    __m256i char_match;
    __m256i str_simd;
    __m256i result;
    __m256i* pSrc1;

    int simd_mask;

    str[(size_t)5e8] = 'b';

    char_match = _mm256_set1_epi8('b');
    result = _mm256_set1_epi32(0);
    simd_mask = 0;
    pSrc1 = (__m256i *)str;

    while (1){
        str_simd  = _mm256_lddqu_si256(pSrc1);
        result = _mm256_cmpeq_epi8(str_simd, char_match);
        simd_mask = _mm256_movemask_epi8(result);
        if (simd_mask != 0){
            break;
        }
        pSrc1++;
    }
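The loop stops at the first 32-byte block that contains a match, but it does not yet turn that into a character position. A minimal sketch of that last step follows, assuming GCC or Clang (for `__builtin_ctz`). The helper name `match_index` is hypothetical and is not taken from the original code or the gist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert the loop's final state into a byte index.
 *   blocks_advanced: number of 32-byte blocks the pointer moved past,
 *                    i.e. pSrc1 - (__m256i *)str in the question's code.
 *   simd_mask:       nonzero result of _mm256_movemask_epi8; bit i is set
 *                    when byte i of the current block matched. */
size_t match_index(size_t blocks_advanced, int simd_mask)
{
    assert(simd_mask != 0); /* __builtin_ctz(0) is undefined */
    /* The lowest set bit marks the first matching byte in the block. */
    return blocks_advanced * 32 + (size_t)__builtin_ctz((unsigned)simd_mask);
}
```

With the question's setup (`str[(size_t)5e8] = 'b'`), the match lands in block 15625000 at bit 0, so `match_index(15625000, 1)` gives 500000000.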

Full (not yet finished code) at: https://gist.github.com/JimHokanson/433e185ba53b41e49ce3ac804568ac1e

strchr is twice as fast as this code (using gcc and xcode). I'm hoping to understand why.

Update: compiling using: gcc -std=c11 -mavx2 -mlzcnt

Answer 1:

I had not set an optimization flag in the compiler. With -O3, the SIMD code takes only 75% of the time that strchr does.

Update: I should also clarify that this is not a final working version of the code. It still needs additional checks, and I think there are ways to optimize the calls further. At this point, though, the code is at least in the ballpark of strchr. As pointed out in the question comments, this version could read past the end of a page and fault. Finally, this is mostly a SIMD learning exercise for myself, and memchr is probably your best bet (although I suspect you could beat memchr by a small margin if you have a sentinel buffer).
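One common way to avoid the page-fault risk is to use only 32-byte-aligned loads, since an aligned 32-byte load cannot straddle a page boundary. The following is a rough sketch of that idea and not the code from the gist. It assumes GCC or Clang, assumes the character is known to be present at or after the start pointer (as in the question), and relies on hardware page granularity, not on anything the C standard guarantees. The names `find_present` and `find_present_avx2` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Assumes c occurs at or after s, so no length or NUL check is done. */
__attribute__((target("avx2")))
static const char *find_present_avx2(const char *s, char c)
{
    const __m256i needle = _mm256_set1_epi8(c);

    /* Round the start down to a 32-byte boundary. */
    uintptr_t addr = (uintptr_t)s;
    const char *block = (const char *)(addr & ~(uintptr_t)31);
    unsigned skip = (unsigned)(addr & 31); /* bytes in the block before s */

    __m256i v = _mm256_load_si256((const __m256i *)block);
    unsigned mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, needle));
    mask &= ~0u << skip; /* ignore matches that lie before s */

    while (mask == 0) {
        block += 32;
        v = _mm256_load_si256((const __m256i *)block);
        mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, needle));
    }
    /* The lowest set bit is the first matching byte in this block. */
    return block + __builtin_ctz(mask);
}
#endif

/* Hypothetical entry point: uses AVX2 when the CPU has it, otherwise a
 * plain byte loop. */
const char *find_present(const char *s, char c)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return find_present_avx2(s, c);
#endif
    while (*s != c)
        s++;
    return s;
}
```

Because the first aligned block may begin before `s`, tools that track object bounds (for example AddressSanitizer) may flag the load even though it stays within the same page. With the `target("avx2")` attribute, recent GCC and Clang versions should compile this without `-mavx2`, but -O3 (or at least -O2) still matters for a fair comparison against strchr or memchr.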


