Data type compatibility with NEON intrinsics

轻奢々 2021-01-03 05:26

I am working on ARM optimizations using the NEON intrinsics, from C++ code. I understand and master most of the typing issues, but I am stuck on this one:

The instru

5 answers
  •  被撕碎了的回忆
    2021-01-03 05:56

    Some definitions to answer clearly...

    In the ARMv7 (AArch32) view, NEON has 32 registers, each 64 bits wide (with a dual view as 16 registers, each 128 bits wide).

    The NEON unit can view the same register bank as:

    • sixteen 128-bit quadword registers, Q0-Q15
    • thirty-two 64-bit doubleword registers, D0-D31.

    uint16x8_t is a type that requires 128 bits of storage, so it needs to live in a quadword register.
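
    As a rough illustration of that dual view (a sketch, not part of the original answer): vget_low_u16 and vget_high_u16 return the two 64-bit halves of a uint16x8_t. The helper name split_halves is hypothetical, and the scalar branch is there only so the snippet also builds on targets without NEON.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: splits eight 16-bit lanes into their low and high halves.
void split_halves(const uint16_t in[8], uint16_t lo[4], uint16_t hi[4]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint16x8_t q = vld1q_u16(in);      // one 128-bit (quadword) value
    vst1_u16(lo, vget_low_u16(q));     // lanes 0..3 as a 64-bit uint16x4_t
    vst1_u16(hi, vget_high_u16(q));    // lanes 4..7 as a 64-bit uint16x4_t
#else
    // Scalar stand-in so the sketch compiles without NEON.
    for (int i = 0; i < 4; ++i) {
        lo[i] = in[i];
        hi[i] = in[i + 4];
    }
#endif
}
```

    Which D registers the two halves end up in is the compiler's decision; the intrinsics only describe the data flow.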

    The ARM® C Language Extensions define a vector array data type for the NEON intrinsics:

    ... for use in load and store operations, in table-lookup operations, and as the result type of operations that return a pair of vectors.

    vzip instruction

    ... interleaves the elements of two vectors.

    vzip Dd, Dm

    and has an intrinsic like

    uint8x8x2_t vzip_u8 (uint8x8_t, uint8x8_t) 
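
    A minimal sketch of calling that intrinsic, assuming the usual vzip layout (val[0] holds the interleaved low halves, val[1] the interleaved high halves). The helper name zip_bytes is hypothetical, and the scalar branch is only a stand-in so the snippet builds without NEON.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: writes a0,b0,a1,b1,...,a7,b7 into out[0..15].
void zip_bytes(const uint8_t a[8], const uint8_t b[8], uint8_t out[16]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t va = vld1_u8(a);
    uint8x8_t vb = vld1_u8(b);
    uint8x8x2_t z = vzip_u8(va, vb);   // a pair of 64-bit vectors
    vst1_u8(out, z.val[0]);            // a0,b0,a1,b1,a2,b2,a3,b3
    vst1_u8(out + 8, z.val[1]);        // a4,b4,a5,b5,a6,b6,a7,b7
#else
    // Scalar stand-in with the same result.
    for (int i = 0; i < 8; ++i) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
#endif
}
```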
    

    From these we can conclude that uint8x8x2_t is really a list of two arbitrarily numbered doubleword registers, because the vzip instruction places no requirement on the order of its input registers.

    Now the answer is...

    A uint8x8x2_t can occupy two non-consecutive doubleword registers, whereas a uint16x8_t occupies two consecutive doubleword registers, the first of which has an even index (D0-D31 -> Q0-Q15).

    Because of this, you can't cast a vector array data type made of two doubleword registers to a quadword register... easily.

    The compiler may be smart enough to assist you, or you can force the conversion yourself; either way, I would check the resulting assembly for correctness as well as performance.
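
    One way to force the conversion, offered as a sketch rather than as the definitive approach: join the two halves with vcombine_u8 and relabel the lanes with vreinterpretq_u16_u8. The helper name pair_to_u16x8 is hypothetical, the lane values shown assume a little-endian target, and the scalar branch is only a stand-in for builds without NEON.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: treats two 8-byte halves as one vector of eight
// 16-bit lanes (little-endian byte order assumed).
void pair_to_u16x8(const uint8_t lo[8], const uint8_t hi[8], uint16_t out[8]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8x2_t pair;
    pair.val[0] = vld1_u8(lo);
    pair.val[1] = vld1_u8(hi);
    // vcombine_u8 builds a 128-bit vector from two 64-bit halves; the
    // compiler may need register moves to make them adjacent.
    uint8x16_t joined = vcombine_u8(pair.val[0], pair.val[1]);
    uint16x8_t q = vreinterpretq_u16_u8(joined);  // same bits, new lane type
    vst1q_u16(out, q);
#else
    // Scalar stand-in: pack byte pairs as little-endian 16-bit values.
    uint8_t bytes[16];
    for (int i = 0; i < 8; ++i) {
        bytes[i] = lo[i];
        bytes[i + 8] = hi[i];
    }
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
#endif
}
```

    Whether the combine costs extra moves depends on register allocation, which is why inspecting the generated assembly is worthwhile.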
