In my program I have a function that does a simple vector addition c[0:15] = a[0:15] + b[0:15]. The function prototype is:
void vecadd(float * r
Alignment specifications usually only work for alignments that are smaller than the base type of a pointer, not larger.
I think the easiest way is to declare your whole array with an alignment specification, something like
typedef float myvector[16];
typedef myvector alignedVector __attribute__((aligned (8)));
(The syntax might not be correct; I always have difficulty knowing where to put these __attribute__s.)
And use that type throughout your code. For your function definition I'd try
void vecadd(alignedVector * restrict a, alignedVector * restrict b, alignedVector * restrict c);
This gives you an additional indirection, but it is only syntactic. Something like *a is just a no-op that reinterprets the pointer as a pointer to the first element.
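To make the idea concrete, here is a minimal sketch of the aligned-array typedef with a full function body. It assumes GCC accepts the attribute in this position and that the loop covers the 16 elements from the question; I have not checked what code the compiler generates for it.

```c
#include <stddef.h>

/* A 16-float array type; the attribute asks GCC to align objects of this type to 8 bytes. */
typedef float myvector[16];
typedef myvector alignedVector __attribute__((aligned(8)));

/* Each parameter points to a whole array, so elements are reached through (*a)[i]. */
void vecadd(alignedVector *restrict a, alignedVector *restrict b,
            alignedVector *restrict c)
{
    for (size_t i = 0; i < 16; ++i)
        (*c)[i] = (*a)[i] + (*b)[i];
}
```

A caller declares its operands as alignedVector a, b, c; and passes their addresses: vecadd(&a, &b, &c);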
If the attributes don't work, or aren't an option ....
I'm not sure, but try this:
void vecadd (float * restrict a, float * restrict b, float * restrict c)
{
a = __builtin_assume_aligned (a, 8);
b = __builtin_assume_aligned (b, 8);
c = __builtin_assume_aligned (c, 8);
for ....
That should tell GCC that the pointers are aligned. Whether that then does what you want depends on whether the compiler can use the information effectively; it might not be smart enough, since these optimizations aren't easy.
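For reference, a complete version of that function might look like the sketch below. The loop body is my assumption, taken from the 16-element addition described in the question. Note that the built-in is a promise, not a check: if a caller passes a pointer that is not 8-byte aligned, the behavior is undefined.

```c
#include <stddef.h>

void vecadd(float *restrict a, float *restrict b, float *restrict c)
{
    /* Tell the compiler it may assume 8-byte alignment for each pointer. */
    a = __builtin_assume_aligned(a, 8);
    b = __builtin_assume_aligned(b, 8);
    c = __builtin_assume_aligned(c, 8);

    for (size_t i = 0; i < 16; ++i)
        c[i] = a[i] + b[i];
}
```

The caller is responsible for the alignment, for example by declaring the arrays as float a[16] __attribute__((aligned(8)));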
Another option might be to wrap the float inside a union that must be 8-byte aligned:
typedef union {
float f;
long long dummy;
} aligned_float;
void vecadd (aligned_float * a, ......
I think that should enforce 8-byte alignment, but again, I don't know if the compiler is smart enough to use it.
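A small sketch of the union variant, with compile-time checks on what the union actually guarantees. The function body is an assumption based on the question. One thing to be aware of: the union is as large as long long, so each element typically takes 8 bytes and an array of 16 is twice the size of float[16], which means two floats no longer share one doubleword.

```c
#include <stddef.h>

typedef union {
    float f;
    long long dummy;   /* present only to raise the union's alignment */
} aligned_float;

/* The union is at least as strictly aligned, and at least as large, as long long. */
_Static_assert(_Alignof(aligned_float) >= _Alignof(long long), "alignment not raised");
_Static_assert(sizeof(aligned_float) >= sizeof(long long), "size not raised");

void vecadd(aligned_float *restrict a, aligned_float *restrict b,
            aligned_float *restrict c)
{
    for (size_t i = 0; i < 16; ++i)
        c[i].f = a[i].f + b[i].f;
}
```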
How to tell GCC that a pointer argument is always double-word-aligned?
It looks like newer versions of GCC have __builtin_assume_aligned:
Built-in Function:
void * __builtin_assume_aligned (const void *exp, size_t align, ...)
This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:
void *x = __builtin_assume_aligned (arg, 16);
means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:
void *x = __builtin_assume_aligned (arg, 32, 8);
means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.
Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.
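If the code has to build with compilers that may lack the built-in, one option is to hide it behind a macro. The sketch below assumes the built-in first shipped in GCC 4.7, which is my recollection of the release notes rather than something stated above; HAVE_ASSUME_ALIGNED, ASSUME_ALIGNED and sum16 are hypothetical names used only for illustration.

```c
/* Prefer __has_builtin where the compiler provides it (Clang, newer GCC). */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assume_aligned)
#    define HAVE_ASSUME_ALIGNED 1
#  endif
#endif

/* Otherwise fall back to a version test. Assumption: available from GCC 4.7 on. */
#if !defined(HAVE_ASSUME_ALIGNED) && defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
#    define HAVE_ASSUME_ALIGNED 1
#  endif
#endif

#ifdef HAVE_ASSUME_ALIGNED
#  define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#  define ASSUME_ALIGNED(p, n) (p)   /* no hint available; pass the pointer through */
#endif

/* Example user: sums 16 floats from a buffer the caller promises is 8-byte aligned. */
float sum16(const float *data)
{
    const float *p = ASSUME_ALIGNED(data, 8);
    float total = 0.0f;
    for (int i = 0; i < 16; ++i)
        total += p[i];
    return total;
}
```

On compilers without the built-in the macro does nothing, so the code stays correct and only loses the optimization hint.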
GCC versions have been dodgy about aligned() on simple typedefs and arrays. Typically, to do what you want, you would have to wrap the float in a struct and give the contained float the alignment restriction.
With operator overloading you can almost make this painless, but it does assume you can use C++ syntax.
#include <stdio.h>
#include <string.h>
#define restrict __restrict__
typedef float oldfloat8 __attribute__ ((aligned(8)));
struct float8
{
float f __attribute__ ((aligned(8)));
float8 &operator=(float _f) { f = _f; return *this; }
float8 &operator=(double _f) { f = _f; return *this; }
float8 &operator=(int _f) { f = _f; return *this; }
operator float() { return f; }
};
int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c);
int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c)
{
return *c = *a* *b;
}
int main(int argc, char **argv)
{
float8 a, b, c;
float8 p[4];
printf("sizeof(oldfloat8) == %d\n", (int)sizeof(oldfloat8));
printf("sizeof(float8) == %d\n", (int)sizeof(float8));
printf("addr p[0] == %p\n", (void *)&p[0] );
printf("addr p[1] == %p\n", (void *)&p[1] );
a = 2.0;
b = 7.0;
MyFunc( &a, &b, &c );
return 0;
}
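If C++ is not an option, the same wrapper can be written in plain C, at the cost of spelling out the member access. This is only a sketch: float8_c, float8_mul and float8_stride are hypothetical names, and the multiply mirrors what MyFunc does above.

```c
#include <stddef.h>

/* The aligned member forces the whole struct to 8-byte alignment, which pads it to 8 bytes. */
typedef struct {
    float f __attribute__((aligned(8)));
} float8_c;

/* Plain C stand-in for the overloaded operators: *c = *a * *b, returning the product. */
float float8_mul(const float8_c *a, const float8_c *b, float8_c *c)
{
    c->f = a->f * b->f;
    return c->f;
}

/* Distance in bytes between consecutive array elements. */
size_t float8_stride(void)
{
    float8_c p[2];
    return (size_t)((char *)&p[1] - (char *)&p[0]);
}
```

As with the C++ struct, consecutive elements sit 8 bytes apart, so an array of these wrappers is not laid out like a packed float array.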
I never used it, but there is __attribute__((aligned (8)))
If I read the documentation right, then it is used this way:
void vecadd(float * restrict a __attribute__((aligned (8))),
float * restrict b __attribute__((aligned (8))),
float * restrict c __attribute__((aligned (8))));
see http://ohse.de/uwe/articles/gcc-attributes.html#type-aligned
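Whatever the attribute does in that position, as far as I know nothing checks at the call site that the pointers really are aligned. A debug-time check is cheap to add; the sketch below uses a hypothetical helper is_aligned and assumes the 16-element loop from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when the address of p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

void vecadd(float *restrict a, float *restrict b, float *restrict c)
{
    /* Catches misaligned callers in debug builds; compiled out when NDEBUG is defined. */
    assert(is_aligned(a, 8) && is_aligned(b, 8) && is_aligned(c, 8));

    for (size_t i = 0; i < 16; ++i)
        c[i] = a[i] + b[i];
}
```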
Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from several of the earlier answers. Basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats) and call the function with a cast of the operand float arrays:
typedef float f2 __attribute__((vector_size(8)));
typedef union { f2 v; float f[2]; } simdfu;
void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);
float a[16] __attribute__((aligned(8)));
float b[16] __attribute__((aligned(8)));
float c[16] __attribute__((aligned(8)));
int main()
{
vecadd((f2 *) a, (f2 *) b, (f2 *) c);
return 0;
}
Now the compiler does not generate the 4-aligned branch.
However, __builtin_assume_aligned() would be the preferable solution, since it avoids the cast and its possible side effects, if only it worked...
EDIT: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, it also causes calculation errors later in the code).
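The vector-typed vecadd prototype above is shown without a body. A minimal sketch of one, assuming GCC's vector extensions, where + on two f2 values adds both lanes: 16 floats become 8 two-float vectors, so the loop runs 8 times.

```c
/* Two floats packed into one 8-byte vector type (GCC vector extension). */
typedef float f2 __attribute__((vector_size(8)));

void vecadd(f2 *restrict a, f2 *restrict b, f2 *restrict c)
{
    /* Each iteration adds one pair of floats. */
    for (int i = 0; i < 8; ++i)
        c[i] = a[i] + b[i];
}
```

If the operands can be declared as f2 a[8]; from the start, elements can be read and written with subscripts such as a[i][0] and a[i][1], which avoids the pointer cast altogether.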