Find min/max value from a __m128i

后端 未结 2 786
抹茶落季
抹茶落季 2021-02-10 05:03

I want to find the minimum/maximum value into an array of byte using SIMD operations. So far I was able to go through the array and store the minimum/maximum value into a __m128

相关标签:
2条回答
  • 2021-02-10 05:16

    Here is an example for horizontal max for uint8_t:

    #include "tmmintrin.h" // requires SSSE3
    
    __m128i _mm_hmax_epu8(const __m128i v)
    {
        __m128i vmax = v;
    
        vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 1));
        vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 2));
        vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 4));
        vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 8));
    
        return vmax;
    }
    

    The max value will be returned in all elements. If you need the value as a scalar then use _mm_extract_epi8.

    It should be fairly obvious how to adapt this for min, and for signed min/max.

    0 讨论(0)
  • 2021-02-10 05:25

    Alternatively, convert to words and use phminposuw (not tested)

    int hminu8(__m128i x)
    {
      __m128i l = _mm_unpacklo_epi8(x, _mm_setzero_si128());
      __m128i h = _mm_unpackhi_epi8(x, _mm_setzero_si128());
      l = _mm_minpos_epu16(l);
      h = _mm_minpos_epu16(h);
      return _mm_extract_epi16(_mm_min_epu16(l, h), 0);
    }
    

    By my quick count, the latency is a bit worse than a min/shuffle cascade, but the throughput a bit better. The linked answer with phminposuw is probably better though. Adapted for unsigned bytes (but not tested)

    uint8_t hminu8(__m128i x)
    {
      x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
      x = _mm_minpos_epu16(x);
      return _mm_cvtsi128_si32(x);
    }
    

    You could use it for max too, but with a bit of overhead: complement the input and result.

    0 讨论(0)
提交回复
热议问题