How to tell GCC that a pointer argument is always double-word-aligned?

花落未央 2020-12-01 06:37

In my program I have a function that does a simple vector addition c[0:15] = a[0:15] + b[0:15]. The function prototype is:

void vecadd(float * r         


        
6 answers
  • 2020-12-01 06:58

    Alignment specifications usually only work for alignments that are smaller than the base type of a pointer, not larger.

    I think easiest is to declare your whole array with an alignment specification, something like

    typedef float myvector[16];
    typedef myvector alignedVector __attribute__((aligned (8)));
    

    (The syntax might not be correct; I always have trouble remembering where to put these __attribute__s.)

    And use that type throughout your code. For your function definition I'd try

    void vecadd(alignedVector * restrict a, alignedVector * restrict b, alignedVector * restrict c);
    

    This adds a level of indirection, but only syntactically. Something like *a generates no code; it just reinterprets the pointer as a pointer to the first element.
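A minimal sketch of the same idea, using a struct wrapper instead of an array typedef because the attribute placement on a struct type is easier to get right. The names vec16 and vecadd_vec16 are hypothetical, and the length 16 is taken from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: 16 contiguous floats whose start is 8-byte aligned. */
typedef struct {
    float v[16];
} __attribute__((aligned(8))) vec16;

/* c = a + b, element by element. The 8-byte alignment is part of the
   vec16 type, so the pointers carry that guarantee with them. */
void vecadd_vec16(const vec16 *restrict a, const vec16 *restrict b,
                  vec16 *restrict c)
{
    assert(a && b && c);
    for (size_t i = 0; i < 16; i++)
        c->v[i] = a->v[i] + b->v[i];
}
```

Callers declare their data as vec16 and pass &a, &b, &c. Whether GCC actually drops the misaligned code path still depends on the version and target.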

  • 2020-12-01 07:06

    If the attributes don't work, or aren't an option ....

    I'm not sure, but try this:

    void vecadd (float * restrict a, float * restrict b, float * restrict c)
    {
       a = __builtin_assume_aligned (a, 8);
       b = __builtin_assume_aligned (b, 8);
       c = __builtin_assume_aligned (c, 8);
    
       for ....
    

    That should tell GCC that the pointers are aligned. Whether it then does what you want depends on whether the compiler can use that information effectively; it might not be smart enough, since these optimizations aren't easy.
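For reference, a complete version of that function might look like the sketch below. The loop bound of 16 comes from the question. The const qualifiers, the local names pa, pb and pc, and the debug assertions are additions for illustration. The builtin is a promise to the compiler rather than a check, so the assertions catch a broken promise in debug builds.

```c
#include <assert.h>
#include <stdint.h>

/* c[i] = a[i] + b[i] for 16 floats.
   Assumption: all three pointers are 8-byte aligned. */
void vecadd(const float *restrict a, const float *restrict b,
            float *restrict c)
{
    /* Debug-build check; compiled out when NDEBUG is defined. */
    assert((uintptr_t)a % 8 == 0);
    assert((uintptr_t)b % 8 == 0);
    assert((uintptr_t)c % 8 == 0);

    /* The builtin returns void *, which converts implicitly in C. */
    const float *pa = __builtin_assume_aligned(a, 8);
    const float *pb = __builtin_assume_aligned(b, 8);
    float *pc = __builtin_assume_aligned(c, 8);

    for (int i = 0; i < 16; i++)
        pc[i] = pa[i] + pb[i];
}
```

The arrays passed in must really be aligned, for example by declaring them with __attribute__((aligned(8))).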

    Another option might be to wrap the float inside a union that must be 8-byte aligned:

    typedef union {
      float f;
      long long dummy;
    } aligned_float;
    
    void vecadd (aligned_float * a, ......
    

    I think that should enforce 8-byte alignment, but again, I don't know if the compiler is smart enough to use it.
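One cost of the union approach is worth spelling out: every element now occupies sizeof(long long) bytes instead of sizeof(float), so an array of aligned_float is not laid out like float[16] and existing float arrays cannot simply be cast to it. The alignment is also whatever the ABI gives long long, which may be less than 8 on some 32-bit targets. A small sketch, where aligned_float_stride and vecadd_union are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    float f;
    long long dummy;   /* gives the union long long's size and alignment */
} aligned_float;

/* Bytes from one array element to the next. */
size_t aligned_float_stride(void)
{
    return sizeof(aligned_float);
}

/* Element-wise add over 16 union-wrapped floats. */
void vecadd_union(const aligned_float *restrict a,
                  const aligned_float *restrict b,
                  aligned_float *restrict c)
{
    assert(a && b && c);
    for (size_t i = 0; i < 16; i++)
        c[i].f = a[i].f + b[i].f;
}
```

Because only one float sits in each slot, a vectorizer has less to work with than with densely packed floats.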

  • 2020-12-01 07:11

    How to tell GCC that a pointer argument is always double-word-aligned?

    It looks like newer versions of GCC have __builtin_assume_aligned:

    Built-in Function: void * __builtin_assume_aligned (const void *exp, size_t align, ...)

    This function returns its first argument, and allows the compiler to assume that the returned pointer is at least align bytes aligned. This built-in can have either two or three arguments, if it has three, the third argument should have integer type, and if it is nonzero means misalignment offset. For example:

    void *x = __builtin_assume_aligned (arg, 16);
    

    means that the compiler can assume x, set to arg, is at least 16-byte aligned, while:

    void *x = __builtin_assume_aligned (arg, 32, 8);
    

    means that the compiler can assume for x, set to arg, that (char *) x - 8 is 32-byte aligned.
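To make the three-argument form concrete, here is a small sketch that sums floats starting 8 bytes past a 32-byte boundary. The function name sum_offset is hypothetical, and the example assumes a 4-byte float, so that two elements equal 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Sums n floats starting at p.
   Assumption: p lies exactly 8 bytes past a 32-byte boundary. */
float sum_offset(const float *p, int n)
{
    /* Debug-build check of the promise made to the compiler below. */
    assert(((uintptr_t)p - 8) % 32 == 0);

    /* (char *)q - 8 may be assumed 32-byte aligned. */
    const float *q = __builtin_assume_aligned(p, 32, 8);

    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += q[i];
    return s;
}
```

With a buffer declared as float buf[18] __attribute__((aligned(32))), the call sum_offset(buf + 2, 16) satisfies the stated assumption.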

    Based on some other questions and answers on Stack Overflow circa 2010, it appears the built-in was not available in GCC 3 and early GCC 4. But I do not know where the cut-off point is.
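If the code has to build with compilers that may lack the builtin, one option is a guard macro. This is only a sketch: ASSUME_ALIGNED and sum16 are hypothetical names, and the 4.7 cut-off is my recollection of when the builtin first shipped, so treat it as an assumption and check your compiler's release notes.

```c
#include <assert.h>

#ifndef __has_builtin
#define __has_builtin(x) 0   /* for compilers without __has_builtin */
#endif

/* Use the builtin where it is known to exist; otherwise pass the
   pointer through unchanged (assumed cut-off: GCC 4.7). */
#if __has_builtin(__builtin_assume_aligned) || \
    (defined(__GNUC__) && \
     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
#define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#define ASSUME_ALIGNED(p, n) (p)
#endif

/* Sums 16 floats; the caller promises that p is 8-byte aligned. */
float sum16(const float *p)
{
    assert(p);
    const float *q = ASSUME_ALIGNED(p, 8);
    float s = 0.0f;
    for (int i = 0; i < 16; i++)
        s += q[i];
    return s;
}
```

On a compiler without the builtin the code still compiles and behaves the same; it just loses the optimization hint.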

  • 2020-12-01 07:17

    GCC versions have been unreliable about applying aligned() to simple typedefs and arrays. To do what you want, you typically have to wrap the float in a struct and put the alignment restriction on the contained float.

    With operator overloading you can make this almost painless, but that assumes you can use C++ syntax.

    #include <stdio.h>
    #include <string.h>
    
    #define restrict __restrict__
    
    typedef float oldfloat8 __attribute__ ((aligned(8)));
    
    struct float8
    {
        float f __attribute__ ((aligned(8)));
    
        float8 &operator=(float _f) { f = _f; return *this; }
        float8 &operator=(double _f) { f = _f; return *this; }
        float8 &operator=(int _f) { f = _f; return *this; }
    
        operator float() { return f; }
    };
    
    int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c);
    
    int MyFunc(float8 * restrict a, float8 * restrict b, float8 * restrict c)
    {
        return *c = *a* *b;
    }
    
    int main(int argc, char **argv)
    {
        float8 a, b, c;
    
        float8 p[4];
    
        printf("sizeof(oldfloat8) == %d\n", (int)sizeof(oldfloat8));
        printf("sizeof(float8) == %d\n", (int)sizeof(float8));
    
        printf("addr p[0] == %p\n", (void *)&p[0] );
        printf("addr p[1] == %p\n", (void *)&p[1] );
    
        a = 2.0;
        b = 7.0;
        MyFunc( &a, &b, &c );
        return 0;
    }
    
  • 2020-12-01 07:18

    I have never used it, but there is __attribute__((aligned (8)))

    If I read the documentation right, then it is used this way:

    void vecadd(float * restrict a __attribute__((aligned (8))), 
                float * restrict b __attribute__((aligned (8))), 
                float * restrict c __attribute__((aligned (8))));
    

    see http://ohse.de/uwe/articles/gcc-attributes.html#type-aligned

  • 2020-12-01 07:21

    Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from a few of the earlier answers. Basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats) and call the function with the operand float arrays cast to that type:

    typedef float f2 __attribute__((vector_size(8)));
    typedef union { f2 v; float f[2]; } simdfu;
    
    void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);
    
    float a[16] __attribute__((aligned(8)));
    float b[16] __attribute__((aligned(8)));
    float c[16] __attribute__((aligned(8)));
    
    int main()
    {
        vecadd((f2 *) a, (f2 *) b, (f2 *) c);
        return 0;
    }
    

    Now the compiler no longer generates the branch for the 4-byte-aligned case.

    However, __builtin_assume_aligned() would be the preferable solution, since it avoids the cast and its possible side effects, if only it worked...

    EDIT: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, it also causes calculation errors later in the code).
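The answer shows only the prototype, so here is one way the body of vecadd could look with GCC's vector extension. This is a sketch, not the poster's code. The may_alias attribute is an addition, intended to keep the float-to-f2 cast clear of strict-aliasing trouble. The loop runs 8 times because each f2 holds two floats.

```c
#include <assert.h>

/* Two packed floats, 8 bytes wide. may_alias is an added assumption
   so that f2 accesses may legally overlap plain float objects. */
typedef float f2 __attribute__((vector_size(8), may_alias));

/* Adds 16 floats as 8 two-element vectors. Pointers to f2 are
   8-byte aligned by virtue of the vector type. */
void vecadd(f2 *restrict a, f2 *restrict b, f2 *restrict c)
{
    assert(a && b && c);
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];   /* element-wise vector add */
}
```

It is called as in the main above, with float arrays declared __attribute__((aligned(8))) and cast to f2 *.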
