Translating SSE to Neon: How to pack and then extract 32bit result

北恋 2021-01-18 18:48

I have to translate the following instruction from SSE to Neon:

 uint32_t result = _mm_cvtsi128_si32(_mm_shuffle_epi8(a, SHUFFLE_MASK));

Where:

2 Answers
  •  鱼传尺愫
    2021-01-18 19:37

    I would write it like so:

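    #include <arm_neon.h>

    uint32_t extract (uint8x16_t x)
    {
      /* a.val[0] = even-indexed bytes of x, a.val[1] = odd-indexed bytes */
      uint8x8x2_t a = vuzp_u8 (vget_low_u8 (x), vget_high_u8 (x));
      /* b.val[0] = bytes 0, 4, 8, 12, 1, 5, 9, 13 of x */
      uint8x8x2_t b = vuzp_u8 (a.val[0], a.val[1]);
      /* 32-bit lane 0 of b.val[0] packs bytes 0, 4, 8, 12 of x */
      return vget_lane_u32 (vreinterpret_u32_u8 (b.val[0]), 0);
    }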
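
To see which source bytes end up in the returned word, the two unzip steps in extract can be modelled in portable C. This is only a sketch for checking the data movement on any machine: unzip8 and extract_model are hypothetical helper names, not NEON intrinsics, and the final byte assembly assumes a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for vuzp_u8 (hypothetical helper): de-interleave the
   16 bytes formed by lo followed by hi. even[] receives elements
   0, 2, 4, ...; odd[] receives elements 1, 3, 5, ... */
static void unzip8(const uint8_t lo[8], const uint8_t hi[8],
                   uint8_t even[8], uint8_t odd[8])
{
    uint8_t cat[16];
    memcpy(cat, lo, 8);
    memcpy(cat + 8, hi, 8);
    for (int i = 0; i < 8; i++) {
        even[i] = cat[2 * i];
        odd[i]  = cat[2 * i + 1];
    }
}

/* Mirrors extract(): two unzips, then the first four bytes of the first
   output, assembled the way a 32-bit lane read sees them on a
   little-endian target. */
static uint32_t extract_model(const uint8_t x[16])
{
    uint8_t a0[8], a1[8], b0[8], b1[8];
    unzip8(x, x + 8, a0, a1);  /* a0 = x[0,2,...,14], a1 = x[1,3,...,15] */
    unzip8(a0, a1, b0, b1);    /* b0 = x[0,4,8,12,1,5,9,13],
                                  b1 = x[2,6,10,14,3,7,11,15] */
    return (uint32_t)b0[0] | ((uint32_t)b0[1] << 8) |
           ((uint32_t)b0[2] << 16) | ((uint32_t)b0[3] << 24);
}
```

With x[i] = i, extract_model returns 0x0C080400, i.e. bytes 0, 4, 8 and 12 of the input. The other byte groups are visible in the comments: for instance, bytes 3, 7, 11 and 15 sit in the upper half of the second unzip output.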
    

    With a recent GCC version, this compiles to:

    extract:
        vuzp.8  d0, d1
        vuzp.8  d0, d1
        vmov.32 r0, d0[0]
        bx  lr
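
When porting, it can help to have a scalar reference for the original SSE expression to compare a Neon version against. The sketch below models _mm_shuffle_epi8 followed by _mm_cvtsi128_si32 in plain C. The function names are hypothetical, and the mask is passed in as a parameter because the question's actual SHUFFLE_MASK is not shown in this copy.

```c
#include <stdint.h>

/* Scalar model of _mm_shuffle_epi8 (PSHUFB): a mask byte with its top bit
   set produces 0; otherwise its low 4 bits select a byte of src.
   dst must not alias src. */
static void shuffle_epi8_model(const uint8_t src[16], const int8_t mask[16],
                               uint8_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Scalar model of _mm_cvtsi128_si32: the low 32 bits of the vector, i.e.
   bytes 0..3 read little-endian (returned here as unsigned). */
static uint32_t cvtsi128_si32_model(const uint8_t v[16])
{
    return (uint32_t)v[0] | ((uint32_t)v[1] << 8) |
           ((uint32_t)v[2] << 16) | ((uint32_t)v[3] << 24);
}

/* Reference for the whole SSE expression, with the mask supplied by the
   caller (the real SHUFFLE_MASK is an unknown here). */
static uint32_t sse_reference(const uint8_t src[16], const int8_t mask[16])
{
    uint8_t shuffled[16];
    shuffle_epi8_model(src, mask, shuffled);
    return cvtsi128_si32_model(shuffled);
}
```

As an illustration only, a made-up mask whose first four entries are 0, 4, 8, 12 and whose remaining entries are -1 makes sse_reference select the same four bytes that extract returns. A different mask would need a different output half and lane on the Neon side.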
    
