How to find the horizontal maximum in a 256-bit AVX vector

再見小時候 2020-12-06 10:03

I have a __m256d vector packed with four 64-bit floating-point values.
I need to find the horizontal maximum of the vector's elements and store the result in a double-p

3 answers
  • 2020-12-06 10:48
    // Horizontal maximum of eight floats. Note: this handles __m256 (float), not the __m256d (double) from the question.
    // The comments list elements from the highest index down to the lowest.
    __m256 v1 = initial_vector; // example: v1 = [1 2 3 4 5 6 7 8]
    __m256 v2 = _mm256_permute_ps(v1, 147); // 147 (0x93) rotates by one element within each 128-bit lane: v2 = [2 3 4 1 6 7 8 5]
    __m256 v3 = _mm256_max_ps(v1, v2); // v3 = [2 3 4 4 6 7 8 8]
    __m256 v4 = _mm256_permute_ps(v3, 147); // v4 = [3 4 4 2 7 8 8 6]
    __m256 v5 = _mm256_max_ps(v3, v4); // v5 = [3 4 4 4 7 8 8 8]
    __m256 v6 = _mm256_permute_ps(v5, 147); // v6 = [4 4 4 3 8 8 8 7]
    __m256 v7 = _mm256_max_ps(v5, v6); // every element now holds the max of its own 128-bit lane: v7 = [4 4 4 4 8 8 8 8]

    // The overall maximum is the larger of the two lane maxima, so compare one element from each lane.
    float max_array[8];
    float horizontal_max;
    _mm256_storeu_ps(max_array, v7); // unaligned store, so max_array needs no alignment attribute
    if(max_array[3] > max_array[7])
    {
        horizontal_max = max_array[3];
    }
    else
    {
        horizontal_max = max_array[7];
    }
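For comparison, the round trip through memory can be avoided by reducing the two 128-bit lanes against each other first. The following is only a sketch of that idea for eight floats: the helper name `hmax_ps` is hypothetical, and the `target("avx")` attribute assumes GCC or Clang (with other compilers, remove it and enable AVX through the build flags).

```c
#include <immintrin.h>

/* Hypothetical helper: returns the largest of p[0..7]. */
__attribute__((target("avx")))
static float hmax_ps(const float *p)
{
    __m256 x  = _mm256_loadu_ps(p);                       /* unaligned load of eight floats */
    __m128 lo = _mm256_castps256_ps128(x);                /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(x, 1);              /* elements 4..7 */
    __m128 m  = _mm_max_ps(lo, hi);                       /* four pairwise maxima */
    m = _mm_max_ps(m, _mm_movehl_ps(m, m));               /* fold upper two onto lower two */
    m = _mm_max_ss(m, _mm_shuffle_ps(m, m, 1));           /* fold element 1 onto element 0 */
    return _mm_cvtss_f32(m);                              /* element 0 holds the result */
}
```

Calling `hmax_ps` on an array holding 1 through 8 should give 8.0f, matching the `horizontal_max` computed above.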
    
  • 2020-12-06 10:53

    I don't think you can do much better than 4 instructions: 2 shuffles and 2 max operations.

    __m256d x = ...; // input
    
    __m128d y = _mm256_extractf128_pd(x, 1); // extract x[2] and x[3]
    __m128d m1 = _mm_max_pd(_mm256_castpd256_pd128(x), y); // the cast takes the low half of x; m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3])
    __m128d m2 = _mm_permute_pd(m1, 1); // set m2[0] = m1[1], m2[1] = m1[0]
    __m128d m = _mm_max_pd(m1, m2); // both m[0] and m[1] contain the horizontal max(x[0], x[1], x[2], x[3])
    

    A trivial modification that works only with 256-bit vectors:

    __m256d x = ...; // input
    
    __m256d y = _mm256_permute2f128_pd(x, x, 1); // swap the two 128-bit halves: y = [x[2], x[3], x[0], x[1]]
    __m256d m1 = _mm256_max_pd(x, y); // m1[0] = max(x[0], x[2]), m1[1] = max(x[1], x[3]), etc.
    __m256d m2 = _mm256_permute_pd(m1, 5); // set m2[0] = m1[1], m2[1] = m1[0], etc.
    __m256d m = _mm256_max_pd(m1, m2); // all m[0] ... m[3] contain the horizontal max(x[0], x[1], x[2], x[3])
    

    (untested)
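Since the question asks for the result in a scalar double, the 128-bit variant can be wrapped in a small function. This is a minimal sketch under a few assumptions: the name `hmax_pd` is hypothetical, the input is loaded from memory rather than passed as a `__m256d`, and the `target("avx")` attribute is a GCC/Clang extension (with other compilers, remove it and enable AVX through the build flags).

```c
#include <immintrin.h>

/* Hypothetical helper: returns the largest of p[0..3]. */
__attribute__((target("avx")))
static double hmax_pd(const double *p)
{
    __m256d x  = _mm256_loadu_pd(p);              /* unaligned load of four doubles */
    __m128d lo = _mm256_castpd256_pd128(x);       /* x[0], x[1] */
    __m128d hi = _mm256_extractf128_pd(x, 1);     /* x[2], x[3] */
    __m128d m1 = _mm_max_pd(lo, hi);              /* max(x[0], x[2]), max(x[1], x[3]) */
    __m128d m2 = _mm_permute_pd(m1, 1);           /* swap the two elements of m1 */
    __m128d m  = _mm_max_pd(m1, m2);              /* both elements hold the overall max */
    return _mm_cvtsd_f64(m);                      /* read element 0 as a scalar double */
}
```

For example, `hmax_pd` applied to the array {1.0, 4.0, 2.0, 3.0} should return 4.0.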

  • 2020-12-06 10:54

    The general way of doing this for a vector v1 = [A, B, C, D] is

    1. Permute v1 to v2 = [C, D, A, B] (swap 0th and 2nd elements, and 1st and 3rd ones)
    2. Take the max; i.e. v3 = max(v1,v2). You now have [max(A,C), max(B,D), max(A,C), max(B,D)]
    3. Permute v3 to v4, swapping the 0th and 1st elements, and the 2nd and 3rd ones.
    4. Take the max again, i.e. v5 = max(v3,v4). Now v5 contains the horizontal max in all of its components.

    Specifically for AVX, the maximums can be done with _mm256_max_pd. The swap in step 3 stays within each 128-bit lane, so _mm256_permute_pd(v3, 5) handles it. The swap in step 1 crosses the two lanes, which _mm256_permute_pd cannot do, so it needs _mm256_permute2f128_pd(v1, v1, 1) instead.

    Hope that helps.
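The four steps above can be sketched as follows, keeping the variable names v1 to v5 from the list. The function name `hmax_broadcast_pd` and the array-based interface are assumptions made for illustration, and the `target("avx")` attribute is a GCC/Clang extension.

```c
#include <immintrin.h>

/* Hypothetical helper: writes the max of in[0..3] into all four slots of out. */
__attribute__((target("avx")))
static void hmax_broadcast_pd(const double *in, double *out)
{
    __m256d v1 = _mm256_loadu_pd(in);                /* [A, B, C, D] */
    __m256d v2 = _mm256_permute2f128_pd(v1, v1, 1);  /* step 1: [C, D, A, B] */
    __m256d v3 = _mm256_max_pd(v1, v2);              /* step 2: [max(A,C), max(B,D), max(A,C), max(B,D)] */
    __m256d v4 = _mm256_permute_pd(v3, 5);           /* step 3: swap neighbours within each 128-bit half */
    __m256d v5 = _mm256_max_pd(v3, v4);              /* step 4: every element holds the horizontal max */
    _mm256_storeu_pd(out, v5);                       /* unaligned store of the four results */
}
```

With the input {1.0, 4.0, 2.0, 3.0}, every element of `out` should end up as 4.0.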
