Data type compatibility with NEON intrinsics

后端 未结 5 978
轻奢々
轻奢々 2021-01-03 05:26

I am working on ARM optimizations using the NEON intrinsics, from C++ code. I understand and master most of the typing issues, but I am stuck on this one:

The instru

相关标签:
5条回答
  • 2021-01-03 05:31

    I was facing the same kind of problem, so I introduced a flexible data type.

    I can now therefore define the following:

    typedef NeonVectorType<uint8x16_t> uint_128bit_t; //suitable for uint8x16_t, uint8x8x2_t, uint32x4_t, etc.
    typedef NeonVectorType<uint8x8_t> uint_64bit_t; //suitable for uint8x8_t, uint32x2_t, etc.
    
    0 讨论(0)
  • 2021-01-03 05:43

    Its a bug in GCC(now fixed) on 4.5 and 4.6 series.

    Bugzilla link http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48252

    Please take the fix from this bug and apply to gcc source and rebuild it.

    0 讨论(0)
  • 2021-01-03 05:52

    You can construct a 128 bit vector from two 64 bit vectors using the vcombine_* intrinsics. Thus, you can achieve what you want like this.

    #include <arm_neon.h>
    
    uint8x16_t f(uint8x8_t a, uint8x8_t b)
    {
        uint8x8x2_t tmp = vzip_u8(a,b);
        uint8x16_t result;
        result = vcombine_u8(tmp.val[0], tmp.val[1]);
        return result;
    }
    
    0 讨论(0)
  • 2021-01-03 05:54

    I have found a workaround: given that the val member of the uint8x8x2_t type is an array, it is therefore seen as a pointer. Casting and deferencing the pointer works ! [Whereas taking the address of the data raises an "address of temporary" warning.]

    uint16x8_t Value= *(uint16x8_t*)vzip_u8(arg0, arg1).val;
    

    It turns out that this compiles and executes as should (at least in the case I have tried). I haven't looked at the assembly code so I cannot grant it is implemented properly (I mean just keeping the value in a register instead of writing/read to/from memory.)

    0 讨论(0)
  • 2021-01-03 05:56

    Some definitions to answer clearly...

    NEON has 32 registers, 64-bits wide (dual view as 16 registers, 128-bits wide).

    The NEON unit can view the same register bank as:

    • sixteen 128-bit quadword registers, Q0-Q15
    • thirty-two 64-bit doubleword registers, D0-D31.

    uint16x8_t is a type which requires 128-bit storage thus it needs to be in an quadword register.

    ARM NEON Intrinsics has a definition called vector array data type in ARM® C Language Extensions:

    ... for use in load and store operations, in table-lookup operations, and as the result type of operations that return a pair of vectors.

    vzip instruction

    ... interleaves the elements of two vectors.

    vzip Dd, Dm

    and has an intrinsic like

    uint8x8x2_t vzip_u8 (uint8x8_t, uint8x8_t) 
    

    from these we can conclude that uint8x8x2_t is actually a list of two random numbered doubleword registers, because vzip instructions doesn't have any requirement on order of input registers.

    Now the answer is...

    uint8x8x2_t can contain non-consecutive two dualword registers while uint16x8_t is a data structure consisting of two consecutive dualword registers which first one has an even index (D0-D31 -> Q0-Q15).

    Because of this you can't cast vector array data type with two double word registers to a quadword register... easily.

    Compiler may be smart enough to assist you, or you can just force conversion however I would check the resulting assembly for correctness as well as performance.

    0 讨论(0)
提交回复
热议问题