How do I write a portable GNU C builtin vectors version of this, which doesn't depend on the x86 set1 intrinsic?
typedef uint16_t v8su __attribute__((vector
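A generic broadcast solution can be found for GCC and Clang using two observations: the vector extensions of both compilers support scalar - vector operations, and x - 0 = x for every x (x + 0 does not work because of signed zero: -0.0 + 0.0 evaluates to +0.0, so the sign of a negative zero would be lost). Here is a solution for a vector of four floats.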
#if defined (__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__ ((vector_size (16)));
#endif
v4sf broadcast4f(float x) {
    /* scalar - zero vector: x is converted to a vector with x in every lane */
    return x - (v4sf){};
}
https://godbolt.org/g/PXr3Xb
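To make the signed-zero point concrete, here is a minimal sketch that compares the subtraction form with an addition form. The names broadcast_sub, broadcast_add and lane0_is_negative are hypothetical helpers introduced only for this comparison, and the sketch assumes the default IEEE 754 behaviour (round to nearest, no -ffast-math).

```c
#include <math.h>   /* signbit */

#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__((vector_size(16)));
#endif

/* Broadcast by subtraction: x - 0.0f == x for every x, including -0.0f. */
v4sf broadcast_sub(float x) {
    return x - (v4sf){};
}

/* Hypothetical variant using addition, shown only for comparison:
   -0.0f + 0.0f is +0.0f under round-to-nearest, so the sign is lost. */
v4sf broadcast_add(float x) {
    return x + (v4sf){};
}

/* Returns 1 if the sign bit of lane 0 is set, 0 otherwise. */
int lane0_is_negative(v4sf v) {
    return signbit(v[0]) != 0;
}
```

Passing -0.0f through broadcast_sub should keep the sign bit in every lane, while broadcast_add should return +0.0f. For any other input the two forms agree.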
The same generic solution can be used for different vectors. Here is an example for a vector of eight unsigned shorts.
#if defined (__clang__)
typedef unsigned short v8su __attribute__((ext_vector_type(8)));
#else
typedef unsigned short v8su __attribute__((vector_size(16)));
#endif
v8su broadcast8us(unsigned short x) {
    /* parameter type matches the unsigned 16-bit lanes, so no sign conversion occurs */
    return x - (v8su){};
}
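As a usage illustration, the broadcast result can be combined with an ordinary vector using the element-wise operators. The following is a sketch only: add_offset is a hypothetical helper, and the parameter is declared unsigned short here so that it matches the lane type.

```c
#if defined(__clang__)
typedef unsigned short v8su __attribute__((ext_vector_type(8)));
#else
typedef unsigned short v8su __attribute__((vector_size(16)));
#endif

/* Same subtraction trick, for eight unsigned 16-bit lanes. */
v8su broadcast8us(unsigned short x) {
    return x - (v8su){};
}

/* Hypothetical helper: add the same offset to every lane of v. */
v8su add_offset(v8su v, unsigned short offset) {
    return v + broadcast8us(offset);
}
```

Because the lanes are unsigned, the addition wraps modulo 65536 in each lane rather than saturating.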
ICC (17) supports a subset of the GCC vector extensions but does not yet support either vector + scalar or vector * scalar, so intrinsics are still necessary for broadcasts. MSVC does not support any vector extensions.