I\'ve got some code, originally given to me by someone working with MSVC, and I\'m trying to get it to work on Clang. Here\'s the function that I\'m having trouble with:
A union is probably the most portable way to do this:
union {
__m128 v; // SSE 4 x float vector
float a[4]; // scalar array of 4 floats
} U;
float vectorGetByIndex(__m128 V, unsigned int i)
{
U u;
assert(i <= 3);
u.v = V;
return u.a[i];
}
As a modification to hirschhornsalz's solution, if i
is a compile-time constant, you could avoid the union path entirely by using a shuffle/store:
template<unsigned i>
float vectorGetByIndex( __m128 V)
{
#ifdef __SSE4_1__
return _mm_extract_epi32(V, i);
#else
float ret;
// shuffle V so that the element that you want is moved to the least-
// significant element of the vector (V[0])
V = _mm_shuffle_ps(V, V, _MM_SHUFFLE(i, i, i, i));
// return the value in V[0]
return _mm_cvtss_f32(V);
#endif
}
Use
template<unsigned i>
float vectorGetByIndex( __m128 V) {
union {
__m128 v;
float a[4];
} converter;
converter.v = V;
return converter.a[i];
}
which will work regardless of the available instruction set.
Note: Even if SSE4.1 is available and i
is a compile time constant, you can't use pextract
etc. this way, because these instructions extract a 32-bit integer, not a float
:
// broken code starts here
template<unsigned i>
float vectorGetByIndex( __m128 V) {
return _mm_extract_epi32(V, i);
}
// broken code ends here
I don't delete it because it is a useful reminder how to not do things.
The way I use is
union vec { __m128 sse, float f[4] };
float accessmember(__m128 v, int index)
{
vec v.sse = v;
return v.f[index];
}
Seems to work out pretty well for me.