Replacing memcpy with NEON intrinsics
Question

I am trying to beat the "memcpy" function by writing the same copy with NEON intrinsics. Below is my logic:

    uint8_t* m_input;  // size 400 x 300
    uint8_t* m_output; // size 400 x 300
    // complete code for memory creation not shown
    memcpy(m_output, m_input, sizeof(m_output[0]) * 300 * 400);

NEON:

    int32_t ht_index, wd_index;
    uint8x16_t vector8x16_image;
    for (int32_t htI = 0; htI < m_roiHeight; htI++) {
        ht_index = htI * m_roiWidth;
        for (int32_t wdI = 0; wdI < m_roiWidth; wdI += 16) {
            wd_index = ht
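
For comparison, here is a minimal sketch of what a NEON copy of the whole buffer might look like. It is not the code from the question. It assumes the image is stored contiguously with no row padding (so the two nested loops can collapse into one pass over 400 * 300 bytes) and that the buffers do not overlap. The name `copy_bytes_neon` is hypothetical. The sketch moves 64 bytes per iteration with `vld1q_u8`/`vst1q_u8`, then 16 bytes at a time, and leaves any remaining tail to `memcpy`. On targets without NEON it compiles down to a plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: copy n bytes from src to dst.
 * Assumes the two buffers do not overlap. */
static void copy_bytes_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

#if HAVE_NEON
    /* Main loop: four 16-byte vectors (64 bytes) per iteration. */
    for (; i + 64 <= n; i += 64) {
        uint8x16_t a = vld1q_u8(src + i);
        uint8x16_t b = vld1q_u8(src + i + 16);
        uint8x16_t c = vld1q_u8(src + i + 32);
        uint8x16_t d = vld1q_u8(src + i + 48);
        vst1q_u8(dst + i,      a);
        vst1q_u8(dst + i + 16, b);
        vst1q_u8(dst + i + 32, c);
        vst1q_u8(dst + i + 48, d);
    }

    /* Remaining whole 16-byte vectors. */
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(dst + i, vld1q_u8(src + i));
    }
#endif

    /* Tail (or the entire copy when NEON is unavailable). */
    if (i < n) {
        memcpy(dst + i, src + i, n - i);
    }
}
```

For the buffers in the question, the call would be `copy_bytes_neon(m_output, m_input, 400 * 300);`. Library `memcpy` implementations on ARM are usually already tuned for the target core, so a hand-written loop like this is not guaranteed to be faster. Measuring both on the actual device is the only reliable way to tell.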