altivec

Is vec_sld endian sensitive?

北慕城南 submitted on 2019-12-04 07:09:32
I'm working on a PowerPC machine with in-core crypto. I'm having trouble porting AES key expansion from big endian to little endian using built-ins. Big endian works, but little endian does not. The algorithm below is the snippet presented in an IBM blog article. I think I have the issue isolated to line 2 below:

    typedef __vector unsigned char uint8x16_p8;
    uint8x16_p8 r0 = {0};

    r3 = vec_perm(r1, r1, r5);   /* line 1 */
    r6 = vec_sld(r0, r1, 12);    /* line 2 */
    r3 = vcipherlast(r3, r4);    /* line 3 */
    r1 = vec_xor(r1, r6);        /* line 4 */
    r6 = vec_sld(r0, r6, 12);    /* line 5 */
    r1 = vec_xor(r1, r6);        /*
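For readers following the excerpt, here is a minimal portable sketch of what `vec_sld(a, b, n)` computes under big-endian byte numbering, where byte 0 is the leftmost byte: the two inputs are concatenated and 16 bytes are taken starting at offset `n`. The helper name `sld_model` is hypothetical and is not an AltiVec API. The sketch only models the big-endian definition and makes no claim about how a little-endian build behaves, which is exactly what the question asks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of AltiVec: scalar model of
   vec_sld(a, b, n) using big-endian byte numbering (byte 0 is leftmost). */
static void sld_model(const uint8_t a[16], const uint8_t b[16],
                      unsigned n, uint8_t out[16])
{
    uint8_t cat[32];

    assert(n < 16);            /* the shift count is a 4-bit literal */
    memcpy(cat, a, 16);        /* left half of the 32-byte value */
    memcpy(cat + 16, b, 16);   /* right half of the 32-byte value */
    memcpy(out, cat + n, 16);  /* shift left by n bytes, keep the top 16 */
}
```

Under this model, line 2 of the snippet (`a` all zero, `n` equal to 12) yields four zero bytes followed by the first twelve bytes of `r1`.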

efficient way to convert scatter indices into gather indices?

非 Y 不嫁゛ submitted on 2019-11-28 12:13:04
I'm trying to write a stream compaction (take an array and get rid of empty elements) with SIMD intrinsics. Each iteration of the loop processes 8 elements at a time (SIMD width). With SSE intrinsics, I can do this fairly efficiently with _mm_shuffle_epi8(), which does a 16-entry table lookup (gather in parallel computing terminology). The shuffle indices are precomputed, and looked up with a bit mask.

    for (i = 0; i < n; i += 8)
    {
        v8n_Data = _mm_load_si128(&data[i]);
        mask = _mm_movemask_epi8(&is_valid[i]) & 0xff; // is_valid is byte array
        v8n_Compacted = _mm_shuffle_epi8(v16n_ShuffleIndices
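The mask-to-shuffle-index step described above can be sketched in portable scalar C. This is an illustration under simplifying assumptions, not the asker's code: each element is one byte wide, and the helper names `gather_indices_from_mask` and `gather8` are hypothetical. Unused slots are filled with 0x80 because `_mm_shuffle_epi8` writes zero for any control byte with its high bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for an 8-bit validity mask, store the positions of
   the set bits in ascending order and return how many there are.
   Remaining slots get 0x80, the "write zero" control byte. */
static int gather_indices_from_mask(unsigned mask, uint8_t idx[8])
{
    int count = 0;

    assert(mask < 256);
    for (int bit = 0; bit < 8; bit++)
        if (mask & (1u << bit))
            idx[count++] = (uint8_t)bit;
    for (int k = count; k < 8; k++)
        idx[k] = 0x80;
    return count;
}

/* Scalar stand-in for the table-lookup shuffle (a gather):
   out[k] = in[idx[k]], or 0 when the high bit of idx[k] is set. */
static void gather8(const uint8_t in[8], const uint8_t idx[8], uint8_t out[8])
{
    for (int k = 0; k < 8; k++)
        out[k] = (idx[k] & 0x80) ? 0 : in[idx[k] & 7];
}
```

Running `gather_indices_from_mask` once for each of the 256 possible masks would give a precomputed table of the kind the question looks up by bit mask. The returned count tells the caller how far to advance the output pointer.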

efficient way to convert scatter indices into gather indices?

巧了我就是萌 submitted on 2019-11-27 06:50:35
Question: I'm trying to write a stream compaction (take an array and get rid of empty elements) with SIMD intrinsics. Each iteration of the loop processes 8 elements at a time (SIMD width). With SSE intrinsics, I can do this fairly efficiently with _mm_shuffle_epi8(), which does a 16-entry table lookup (gather in parallel computing terminology). The shuffle indices are precomputed, and looked up with a bit mask.

    for (i = 0; i < n; i += 8)
    {
        v8n_Data = _mm_load_si128(&data[i]);
        mask = _mm_movemask_epi8(