I'm trying to pack 16-bit data down to 8 bits using _mm256_shuffle_epi8, but the result I get is not what I'm expecting.
auto srcData = _mm256_setr_epi8(1, 2,
Yeah, that's to be expected. Look at the docs for _mm_shuffle_epi8. The 256-bit AVX2 version simply duplicates the behaviour of that 128-bit instruction for each of the two 16-byte halves (lanes) of the YMM register.
So you can shuffle within the first 16 bytes, or within the last 16 bytes; however, you cannot shuffle values across the 16-byte boundary. (You'll notice that every index of 16 or more behaves like the same index minus 16, e.g. 19->3, 31->15, etc., because only the low 4 bits of each index are used.)
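To make the in-lane behaviour concrete, here is a small scalar model of what _mm256_shuffle_epi8 computes. It is only a sketch for reasoning about results: the function name shuffle_epi8_model and the std::array representation are my own and not part of any intrinsic header.

```cpp
#include <array>
#include <cstdint>

// Scalar reference model of _mm256_shuffle_epi8 (hypothetical helper name).
// Each output byte can only read from its own 16-byte lane of src.
std::array<std::uint8_t, 32> shuffle_epi8_model(
    const std::array<std::uint8_t, 32>& src,
    const std::array<std::uint8_t, 32>& ctrl) {
    std::array<std::uint8_t, 32> dst{};
    for (int i = 0; i < 32; ++i) {
        const int lane_base = (i / 16) * 16;  // 0 for bytes 0..15, 16 for bytes 16..31
        if (ctrl[i] & 0x80) {
            dst[i] = 0;  // high bit set in the control byte: output zero
        } else {
            // Only the low 4 bits select a byte, always within the same lane.
            dst[i] = src[lane_base + (ctrl[i] & 0x0F)];
        }
    }
    return dst;
}
```

With this model, a control byte of 19 placed in the lower lane reads source byte 3, and the same control byte placed in the upper lane reads source byte 19; neither can reach into the other lane.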
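You'll need to do this with an additional step. First, shuffle within each 128-bit lane so the bytes you want end up in the low 8 bytes of that lane (a -1 index zeroes the output byte):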
__m256i vperm = _mm256_setr_epi8( 0, 2, 4, 6, 8, 10, 12, 14,
-1, -1, -1, -1, -1, -1, -1, -1,
0, 2, 4, 6, 8, 10, 12, 14,
-1, -1, -1, -1, -1, -1, -1, -1);
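and then use a cross-lane permute to pull the 0th and 2nd 64-bit elements into the first 128 bits. Note that _mm256_permute2f128_si256 only moves whole 128-bit lanes, so it cannot merge the two 8-byte halves; _mm256_permute4x64_epi64 (AVX2) works at 64-bit granularity and can.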
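Putting the two steps together, a minimal sketch might look like the following. It assumes the input is 16 little-endian uint16_t values and that you want the low byte of each; the function names (pack_low_bytes_avx2, pack_low_bytes_scalar, pack_low_bytes) are made up for illustration. The target attribute and __builtin_cpu_supports are GCC/Clang specific; with MSVC you would compile with /arch:AVX2 and do the CPU check differently.

```cpp
#include <immintrin.h>
#include <array>
#include <cstdint>

// Hypothetical helper: keep the low byte of each of 16 uint16_t values.
// target("avx2") lets GCC/Clang compile this without -mavx2; the caller
// must still make sure the CPU actually supports AVX2.
__attribute__((target("avx2")))
std::array<std::uint8_t, 16> pack_low_bytes_avx2(const std::uint16_t* src) {
    const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
    // Step 1: in each 128-bit lane, gather the even bytes into the low 8 bytes
    // and zero the high 8 bytes (index -1 has its top bit set).
    const __m256i vperm = _mm256_setr_epi8(
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 2, 4, 6, 8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1);
    const __m256i shuffled = _mm256_shuffle_epi8(v, vperm);
    // Step 2: 64-bit elements are now [A, 0, B, 0]; move elements 0 and 2
    // into the low 128 bits. The upper 128 bits are ignored afterwards.
    const __m256i packed =
        _mm256_permute4x64_epi64(shuffled, _MM_SHUFFLE(3, 3, 2, 0));
    std::array<std::uint8_t, 16> out{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()),
                     _mm256_castsi256_si128(packed));
    return out;
}

// Plain scalar version, used as a fallback and as a reference result.
std::array<std::uint8_t, 16> pack_low_bytes_scalar(const std::uint16_t* src) {
    std::array<std::uint8_t, 16> out{};
    for (int i = 0; i < 16; ++i) {
        out[i] = static_cast<std::uint8_t>(src[i] & 0xFF);
    }
    return out;
}

// Runtime dispatch: only take the AVX2 path when the CPU reports support.
std::array<std::uint8_t, 16> pack_low_bytes(const std::uint16_t* src) {
    if (__builtin_cpu_supports("avx2")) {
        return pack_low_bytes_avx2(src);
    }
    return pack_low_bytes_scalar(src);
}
```

If you need saturation rather than truncation, _mm256_packus_epi16 is a different tool with different semantics, and it is also in-lane, so it needs the same kind of 64-bit fix-up afterwards.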