Question
I'm trying to figure out how to do a conditional store with ARM NEON. What I would like is the equivalent of this SSE intrinsic:
void _mm_maskmoveu_si128(__m128i d, __m128i n, char *p);
which conditionally stores the byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.
Any suggestion on how to do it with NEON intrinsics? Thank you
This is what I did:
uint8x16_t store_mask = {0,0,0,0,0,0,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff};
uint8x16_t tmp_dest = vld1q_u8((uint8_t*)p_dest);
tmp_dest = vbslq_u8(store_mask, source, tmp_dest); /* mask comes first; take source where mask bits are set */
vst1q_u8((uint8_t*)p_dest, tmp_dest);
Answer 1:
Assuming vectors of 16 x 1 byte elements, you would set up a mask vector where each element is either all 0s (0x00) or all 1s (0xff) to determine whether the element should be stored or not. Then you need to do the following (pseudo code):
init mask vector = 0x00/0xff in each element
init source vector = data to be selectively stored
load dest vector from dest location
apply `vbslq_u8(mask, source, dest)` (bitwise select, e.g. the `vbit` instruction) and keep the result as the new dest vector
store dest vector back to dest location
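The steps above can be sketched in C roughly as follows. This is only an illustration under some assumptions: the function name `masked_store16` and its pointer-based signature are made up for this example, the selector follows the SSE convention from the question (only the high bit of each byte matters), and a plain scalar loop is used when NEON is not available so that the snippet still compiles on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of any library): writes src[i] to dst[i]
 * for every i in 0..15 whose selector byte sel[i] has its high bit set. */
static void masked_store16(uint8_t *dst, const uint8_t *src, const uint8_t *sel)
{
    assert(dst != NULL && src != NULL && sel != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v_src = vld1q_u8(src);
    uint8x16_t v_sel = vld1q_u8(sel);
    uint8x16_t v_dst = vld1q_u8(dst);   /* load the current destination */

    /* Expand the high bit of each selector byte to a full 0x00/0xff mask:
     * a byte is negative as int8_t exactly when its high bit is set. */
    uint8x16_t mask = vcltq_s8(vreinterpretq_s8_u8(v_sel), vdupq_n_s8(0));

    /* Bitwise select: bits from v_src where mask is 1, from v_dst elsewhere. */
    v_dst = vbslq_u8(mask, v_src, v_dst);

    vst1q_u8(dst, v_dst);               /* store all 16 bytes back */
#else
    /* Scalar fallback with the same visible result. */
    for (int i = 0; i < 16; i++) {
        if (sel[i] & 0x80u) {
            dst[i] = src[i];
        }
    }
#endif
}
```

Note that the NEON path reads and rewrites all 16 destination bytes, including the ones that are not selected. So the whole 16-byte range has to be valid, writable memory, and another thread must not modify it between the load and the store.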
Source: https://stackoverflow.com/questions/18312814/arm-neon-conditional-store-suggestion