I am trying to build an application that uses pthreads and the __m128 SSE type. According to the GCC manual, the default stack alignment is 16 bytes. In order to use __m128, the requirement is t
Allocate an array on the stack that is 15 bytes larger than sizeof(__m128), and use the first 16-byte-aligned address in that array. If you need several, allocate them in one array with a single 15-byte margin for alignment.
I do not remember whether allocating an unsigned char array makes you safe from strict-aliasing optimizations by the compiler, or whether that exemption only works the other way round.
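#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* __m128 */

void *f(void *x)
{
    /* 15 spare bytes guarantee a 16-byte-aligned address somewhere inside y */
    unsigned char y[sizeof(__m128) + 15];
    /* round the start of y up to the next multiple of 16 */
    __m128 *py = (__m128 *)(((uintptr_t)y + 15) & ~(uintptr_t)15);
    ...
}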
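For the "several values" case mentioned above, here is a minimal sketch of one buffer with a single 15-byte margin. The names align16, sum_demo and N_VECS are made up for illustration and are not from the original post. It assumes an x86 target with SSE enabled and a compiler that provides xmmintrin.h.

```c
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* __m128, _mm_set1_ps, _mm_add_ps, _mm_storeu_ps */

#define N_VECS 4  /* hypothetical count, chosen only for this example */

/* Round p up to the next 16-byte boundary (p itself if already aligned). */
static void *align16(void *p)
{
    return (void *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}

/* Hypothetical helper: fills N_VECS aligned vectors with 1, 2, ..., N_VECS
   in every lane, adds them up, and returns lane 0 of the sum. */
float sum_demo(void)
{
    /* One 15-byte margin is enough for the whole array: once the first
       element is 16-byte aligned, every following element is too,
       because sizeof(__m128) is 16. */
    unsigned char raw[N_VECS * sizeof(__m128) + 15];
    __m128 *v = (__m128 *)align16(raw);
    __m128 acc = _mm_setzero_ps();
    float out[4];

    for (size_t i = 0; i < N_VECS; i++)
        v[i] = _mm_set1_ps((float)(i + 1));  /* aligned store into the buffer */
    for (size_t i = 0; i < N_VECS; i++)
        acc = _mm_add_ps(acc, v[i]);         /* aligned load from the buffer */

    _mm_storeu_ps(out, acc);                 /* out has no alignment guarantee */
    return out[0];
}
```

The same function body could run inside a pthread start routine, since it does not depend on how the thread's stack happens to be aligned.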