Find min/max value from a __m128i

假如想象 提交于 2021-01-20 20:18:26

问题


I want to find the minimum/maximum value into an array of byte using SIMD operations. So far I was able to go through the array and store the minimum/maximum value into a __m128i variable, but it means that the value I am looking for is mixed among others (15 others to be exact).

I've found these discussions here and here for integer, and this page for float, but I don't understand how works _mm_shuffle*. So my questions are:

  1. What SIMD operations do I have to perform in order to extract the minimum / maximum byte (or unsigned byte) value from the __m128i variable?
  2. How does _mm_shuffle* work? I don't get it when I look to the "minimal" documentation online. I know it is related to the _MM_SHUFFLE macro, but I don't get the example.

回答1:


Here is an example for horizontal max for uint8_t:

#include "tmmintrin.h" // requires SSSE3

__m128i _mm_hmax_epu8(const __m128i v)
{
    __m128i vmax = v;

    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 1));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 2));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 4));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 8));

    return vmax;
}

The max value will be returned in all elements. If you need the value as a scalar then use _mm_extract_epi8.

It should be fairly obvious how to adapt this for min, and for signed min/max.




回答2:


Alternatively, convert to words and use phminposuw (not tested)

int hminu8(__m128i x)
{
  __m128i l = _mm_unpacklo_epi8(x, _mm_setzero_si128());
  __m128i h = _mm_unpackhi_epi8(x, _mm_setzero_si128());
  l = _mm_minpos_epu16(l);
  h = _mm_minpos_epu16(h);
  return _mm_extract_epi16(_mm_min_epu16(l, h), 0);
}

By my quick count, the latency is a bit worse than a min/shuffle cascade, but the throughput a bit better. The linked answer with phminposuw is probably better though. Adapted for unsigned bytes (but not tested)

uint8_t hminu8(__m128i x)
{
  x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
  x = _mm_minpos_epu16(x);
  return _mm_cvtsi128_si32(x);
}

You could use it for max too, but with a bit of overhead: complement the input and result.



来源:https://stackoverflow.com/questions/40985572/find-min-max-value-from-a-m128i

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!