ternary operator for clang's extended vectors

孤街浪徒 2021-01-24 02:38

I've tried playing with Clang's extended vectors. The ternary operator is supposed to work, but it is not working for me. Example:

int main()
{
  using int4 =          


        
3 Answers
  • 2021-01-24 03:07

    In the end I went with this:

    #include <type_traits>  // std::enable_if_t, std::is_arithmetic

    #if defined(__clang__)
    template <typename U, typename V>
    constexpr inline std::enable_if_t<
      !std::is_arithmetic<V>{},
      V
    >
    select(V const a, V const b, U const c) noexcept
    {
      return V((c & U(a)) | (~c & U(b)));
    }
    #else
    template <typename U, typename V>
    constexpr inline std::enable_if_t<
      !std::is_arithmetic<V>{},
      V
    >
    select(V const a, V const b, U const c) noexcept
    {
      return c ? a : b;
    }
    #endif
    

    The same could have been accomplished in other ways, using the indices trick, for example, but it might not optimize very well (I didn't want any conditionals in there).
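For reference, the "indices trick" mentioned above might look roughly like the sketch below. It is not the code this answer settled on: the names `select_lanes` and `select_lanes_impl` are made up for illustration, the typedefs use the GCC-style `vector_size` attribute (which Clang also accepts), and C++14 is assumed for `std::index_sequence`. Each lane is picked with a scalar conditional, so whether this turns into a blend instruction is up to the optimizer.

```cpp
#include <cstddef>
#include <utility>

// Assumed typedefs for illustration: four 32-bit lanes each.
typedef int   int4   __attribute__((vector_size(16)));
typedef float float4 __attribute__((vector_size(16)));

// Expands to {c[0] ? a[0] : b[0], c[1] ? a[1] : b[1], ...}.
template <typename V, typename M, std::size_t... I>
inline V select_lanes_impl(V const a, V const b, M const c,
                           std::index_sequence<I...>) noexcept
{
  V const r = {(c[I] ? a[I] : b[I])...};
  return r;
}

// Hypothetical helper: lane count is derived from the vector size.
template <typename V, typename M>
inline V select_lanes(V const a, V const b, M const c) noexcept
{
  return select_lanes_impl(a, b, c,
      std::make_index_sequence<sizeof(V) / sizeof(a[0])>{});
}
```

A call such as `select_lanes(a, b, mask)` takes the lane from `a` wherever the corresponding `mask` lane is non-zero and from `b` otherwise.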

  • 2021-01-24 03:15

    You can loop over the elements directly in Clang. Here is a solution for GCC and Clang.

    #include <inttypes.h>
    #include <x86intrin.h>
    
    #if defined(__clang__)
    typedef float float4 __attribute__ ((ext_vector_type(4)));
    typedef   int   int4 __attribute__ ((ext_vector_type(4)));
    #else
    typedef float float4 __attribute__ ((vector_size (sizeof(float)*4)));
    typedef   int   int4 __attribute__ ((vector_size (sizeof(int)*4)));
    #endif
    
    float4 select(int4 s, float4 a, float4 b) {
      float4 c;
      #if defined(__GNUC__) && !defined(__INTEL_COMPILER) && !defined(__clang__)
      c = s ? a : b;
      #else
      for(int i=0; i<4; i++) c[i] = s[i] ? a[i] : b[i];
      #endif
      return c;
    }
    

    Both compilers generate:

    select(int __vector(4), float __vector(4), float __vector(4)):
      pxor xmm3, xmm3
      pcmpeqd xmm0, xmm3
      blendvps xmm1, xmm2, xmm0
      movaps xmm0, xmm1
      ret
    
    • Nehalem: https://godbolt.org/g/cVWYym
    • Skylake: https://godbolt.org/g/LhEpnN
    • KNL: https://godbolt.org/g/NFrFKg

    But with AVX512 it's better to use masks (e.g. __mmask16).

  • 2021-01-24 03:31

    This works in a pinch:

    auto const diff = a - b;
    auto const ra(-(diff != zero) * a - (diff == zero) * b);
    

    I guess this is a bug in the compiler, or in the documentation you linked.
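The arithmetic workaround relies on how vector comparisons behave in GCC and Clang: each lane of the result is -1 (all bits set) where the comparison holds and 0 where it does not, so multiplying by the negated mask keeps or discards a lane. A minimal sketch of that idea follows, assuming a GCC-style `vector_size` typedef and a made-up helper name `select_if_ne`. It is written for integer lanes only; float operands would need the mask converted first.

```cpp
// Assumed typedef for illustration: four 32-bit integer lanes.
typedef int int4 __attribute__((vector_size(16)));

// Hypothetical helper. Per lane: a where x != y, otherwise b.
inline int4 select_if_ne(int4 const x, int4 const y,
                         int4 const a, int4 const b) noexcept
{
  int4 const ne = (x != y);  // -1 in lanes where x != y, else 0
  int4 const eq = (x == y);  // -1 in lanes where x == y, else 0
  // (-ne) is 1 or 0, so (-ne) * a keeps a only in "not equal" lanes;
  // subtracting eq * b adds b back only in "equal" lanes.
  return -ne * a - eq * b;
}
```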
