I want to find the minimum/maximum value in an array of bytes using SIMD operations. So far I was able to go through the array and store the minimum/maximum value into a __m128.
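The question does not show that loop, so here is only a minimal sketch of what the accumulation step might look like. The name vmin_u8 is hypothetical, and the sketch assumes unsigned bytes and a length that is a non-zero multiple of 16. It needs SSE2 only.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_min_epu8 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lane-wise minimum over an array of unsigned bytes.
   Assumes n is a non-zero multiple of 16; a real version needs a tail loop. */
static __m128i vmin_u8(const uint8_t *data, size_t n)
{
    __m128i acc = _mm_loadu_si128((const __m128i *)data);
    for (size_t i = 16; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        /* lane j keeps the minimum of data[j], data[j + 16], data[j + 32], ... */
        acc = _mm_min_epu8(acc, v);
    }
    return acc;
}
```

The result still holds 16 candidate minima, one per lane, which is why a horizontal reduction is needed afterwards. For the maximum, _mm_max_epu8 would take the place of _mm_min_epu8.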
Alternatively, convert to words and use phminposuw (not tested):
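#include <smmintrin.h>  /* SSE4.1, for _mm_minpos_epu16 */

int hminu8(__m128i x)
{
    /* zero-extend the 16 bytes into two vectors of eight 16-bit words */
    __m128i l = _mm_unpacklo_epi8(x, _mm_setzero_si128());
    __m128i h = _mm_unpackhi_epi8(x, _mm_setzero_si128());
    /* phminposuw puts each half's minimum into word 0 */
    l = _mm_minpos_epu16(l);
    h = _mm_minpos_epu16(h);
    return _mm_extract_epi16(_mm_min_epu16(l, h), 0);
}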
By my quick count, the latency is a bit worse than a min/shuffle cascade, but the throughput is a bit better.
The linked answer with phminposuw is probably better though. Adapted for unsigned bytes (but not tested):
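#include <smmintrin.h>  /* SSE4.1, for _mm_minpos_epu16 */
#include <stdint.h>

uint8_t hminu8(__m128i x)
{
    /* each 16-bit word becomes min(low byte, high byte), zero-extended */
    x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
    /* phminposuw: minimum word in bits 15:0, its index in bits 18:16 */
    x = _mm_minpos_epu16(x);
    /* truncating to uint8_t drops the index and keeps the minimum */
    return (uint8_t)_mm_cvtsi128_si32(x);
}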
You could use it for max too, but with a bit of overhead: complement the input and the result, since max(x) = ~min(~x) for unsigned values.
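A sketch of that complement trick, not benchmarked, with an SSE2-only max/shuffle-style cascade next to it for comparison (byte shifts stand in for the shuffles). The names hmaxu8, hmaxu8_cascade and the SSE41_FN macro are made up for this example. The macro is an assumption so that GCC/Clang can build the snippet without -msse4.1; the CPU still has to support SSE4.1 at run time.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_minpos_epu16 (also pulls in the SSE2 intrinsics) */
#include <stdint.h>

/* Assumption: per-function target attribute, GCC/Clang only; empty elsewhere. */
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* Horizontal max of 16 unsigned bytes via phminposuw: max(x) == ~min(~x). */
SSE41_FN static uint8_t hmaxu8(__m128i x)
{
    x = _mm_xor_si128(x, _mm_set1_epi8(-1));     /* complement the input */
    x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));   /* pairwise byte min, zero-extended to words */
    x = _mm_minpos_epu16(x);                     /* minimum word lands in bits 15:0 */
    return (uint8_t)~_mm_cvtsi128_si32(x);       /* complement the result, keep the low byte */
}

/* Same result with a max/shift cascade; no complement needed here. */
static uint8_t hmaxu8_cascade(__m128i x)
{
    x = _mm_max_epu8(x, _mm_srli_si128(x, 8));   /* 16 candidates -> 8 */
    x = _mm_max_epu8(x, _mm_srli_si128(x, 4));   /* 8 -> 4 */
    x = _mm_max_epu8(x, _mm_srli_si128(x, 2));   /* 4 -> 2 */
    x = _mm_max_epu8(x, _mm_srli_si128(x, 1));   /* 2 -> 1, in byte 0 */
    return (uint8_t)_mm_cvtsi128_si32(x);
}
```

Both take the vector left over from the accumulation loop, e.g. hmaxu8(acc). The cascade is four dependent max/shift pairs, while the phminposuw version trades that for two extra complements around a shorter chain; which one wins probably depends on the CPU, so measure before picking.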