altivec

Clang equivalent of GCC's __builtin_darn()

こ雲淡風輕ζ posted on 2019-12-24 11:46:58
Question: I'm trying to find Clang's equivalent of GCC's __builtin_darn() on Power9. Grepping the Clang 7.0 sources suggests that LLVM supports the instruction:

    llvm_source$ cat llvm/test/MC/PowerPC/ppc64-encoding.s | grep darn -B 1 -A 1
    # CHECK-BE: darn 2, 3 # encoding: [0x7c,0x43,0x05,0xe6]
    # CHECK-LE: darn 2, 3 # encoding: [0xe6,0x05,0x43,0x7c]
    darn 2, 3

However, I can't seem to find the builtin:

    llvm_source$ grep -IR darn | grep builtin
    llvm_source$

What is Clang's equivalent of GCC's __builtin_darn()?

Answer 1: You

Avoiding invalid memory load with SIMD instructions

ⅰ亾dé卋堺 posted on 2019-12-24 02:33:37
Question: I am loading elements from memory using SIMD load instructions, let's say using Altivec, assuming aligned addresses:

    float X[SIZE];
    vector float V0;
    unsigned FLOAT_VEC_SIZE = sizeof(vector float);
    for (int load_index =0; load_index < SIZE; load_index+=FLOAT_VEC_SIZE) {
        V0 = vec_ld(load_index, X);
        /* some computation involving V0*/
    }

Now if SIZE is not a multiple of FLOAT_VEC_SIZE, it is possible that V0 contains some invalid memory elements in the last loop iteration. One way to avoid that is
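
One common way to keep the last iteration from touching elements past the end of the array is to split the work into full vector-width blocks plus a scalar tail. The sketch below is a portable scalar model in plain C, not AltiVec intrinsics; the names sum_with_tail and LANES are hypothetical, and LANES = 4 assumes a 16-byte vector of floats. Note that it counts in elements, whereas vec_ld takes a byte offset.

```c
#include <stddef.h>

#define LANES 4  /* assumption: 4 floats per 16-byte vector */

/* Sums n floats. Full LANES-wide blocks are processed first (these are
   the iterations a vector load could safely replace); the remaining
   n % LANES elements are handled one at a time, so nothing past
   x[n-1] is ever read. */
float sum_with_tail(const float *x, size_t n)
{
    float acc[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t full = n - (n % LANES);
    size_t i, lane;
    float total;

    for (i = 0; i < full; i += LANES) {
        for (lane = 0; lane < LANES; ++lane) {
            acc[lane] += x[i + lane];
        }
    }

    total = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail: at most LANES - 1 elements. */
    for (; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

With n = 7, the block loop covers x[0..3] and the tail loop covers x[4..6].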

Is vec_sld endian sensitive?

為{幸葍}努か posted on 2019-12-21 16:47:13
Question: I'm working on a PowerPC machine with in-core crypto. I'm having trouble porting AES key expansion from big endian to little endian using built-ins. Big endian works, but little endian does not. The algorithm below is the snippet presented in an IBM blog article. I think I have the issue isolated to line 2 below:

    typedef __vector unsigned char uint8x16_p8;
    uint8x16_p8 r0 = {0};
    r3 = vec_perm(r1, r1, r5); /* line 1 */
    r6 = vec_sld(r0, r1, 12); /* line 2 */
    r3 = vcipherlast(r3, r4); /* line 3 *
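
To reason about what line 2 should produce, it can help to have a portable model of vec_sld to compare against. The sketch below models only the big-endian definition: concatenate a and b into 32 bytes and take 16 bytes starting at byte n. The name sld_model is hypothetical. Whether a little-endian compiler yields the same bytes in memory order is exactly what the question asks, so the model makes no claim about that.

```c
#include <string.h>

/* Portable model of vec_sld(a, b, n) in big-endian element order:
   out = bytes n .. n+15 of the 32-byte concatenation a || b.
   n is reduced to 0..15, matching the 4-bit shift count. */
void sld_model(const unsigned char a[16], const unsigned char b[16],
               unsigned n, unsigned char out[16])
{
    unsigned char cat[32];

    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    memcpy(out, cat + (n & 15u), 16);
}
```

For the call in line 2, with r0 all zero and a shift of 12, the model gives four zero bytes followed by the first twelve bytes of r1.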

What is the availability of 'vector long long'?

China☆狼群 posted on 2019-12-13 16:50:50
Question: I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:

    $ make
    ...
    g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
    ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
    make: *** [ppc-simd.o] Error 1

The failure is due to:

    typedef __vector unsigned long long uint64x2_p8;

I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the machine reports 64-bit availability: $ gcc -mcpu=power4
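
One possible approach is to guard the typedef with the compiler's predefined feature macros rather than with the 64-bit macros. The sketch below assumes that __VSX__ (set by -mvsx) and _ARCH_PWR8 (set by -mcpu=power8 or later) are the relevant macros for GCC and Clang; that assumption should be checked against the compiler in use. HAVE_VEC_U64 and have_vec_u64 are hypothetical names.

```c
/* Assumption: 64-bit vector element types need VSX or POWER8, and the
   compiler advertises those through __VSX__ and _ARCH_PWR8. */
#if defined(__VSX__) || defined(_ARCH_PWR8)
# define HAVE_VEC_U64 1
typedef __vector unsigned long long uint64x2_p8;
#else
# define HAVE_VEC_U64 0  /* e.g. -mcpu=power4 -maltivec: typedef is skipped */
#endif

/* Lets other code query the decision at run time, e.g. in a self-test. */
int have_vec_u64(void)
{
    return HAVE_VEC_U64;
}
```

Code that uses uint64x2_p8 would then be wrapped in the same HAVE_VEC_U64 check.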

Error: matching constraint not valid in output operand

我怕爱的太早我们不能终老 posted on 2019-12-13 08:07:45
Question: I'm having trouble getting GCC inline assembler to accept some inline assembly for Power9. The regular assembly I am trying to get GCC to accept is darn 3, 1, where 3 is r3 and 1 is a parameter called L in the docs. It disassembles to this on big-endian:

    0: e6 05 61 7c darn r3,1

And on little-endian:

    0: 7c 61 05 e6 darn r3,1

Due to various reasons and problems, including old compilers and compilers that pretend to be other compilers, I want to issue byte codes for the instruction. My test
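
When emitting the instruction as raw bytes, it helps to be able to rebuild the 32-bit word from its fields and check it against the disassembly. The sketch below assumes the X-form layout: primary opcode 31, a 5-bit RT field, L in the low 2 bits of the next field, and extended opcode 755. The function name encode_darn is hypothetical.

```c
#include <stdint.h>

/* Builds the 32-bit instruction word for "darn RT, L".
   Assumed layout, most significant bit first:
   opcode 31 (6 bits) | RT (5 bits) | 3 reserved bits, L (2 bits) |
   5 reserved bits | extended opcode 755 (10 bits) | 1 reserved bit. */
uint32_t encode_darn(unsigned rt, unsigned l)
{
    return (31u << 26)
         | ((rt & 31u) << 21)
         | ((l & 3u) << 16)
         | (755u << 1);
}
```

For darn 3, 1 this gives the word 0x7C6105E6, which is made of the same four bytes as the disassembly above. The word could then be passed to a .long directive inside the inline assembly, with the output variable pinned to r3.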

Is it possible to rotate a 128-bit value in Altivec?

烈酒焚心 posted on 2019-12-11 12:45:00
Question: I'm trying to port some ARM NEON code to AltiVec. Our NEON code has two LOADs, one ROT, one XOR and a STORE, so it seems like a simple test case. According to IBM's vec_rl documentation:

    Each element of the result is obtained by rotating the corresponding element of a left by the number of bits specified by the corresponding element of b.

The docs go on to say vector unsigned int is the largest data type unless -qarch=power8, in which case vector unsigned long long applies. I'd like to
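
Because vec_rl rotates each element on its own, a rotate of the whole 128-bit value has to carry bits across element boundaries. The sketch below is a scalar reference in plain C, not an AltiVec implementation, and is meant only for checking a vector version against. The u128 type and the rotl128 function are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* Rotates the full 128-bit value left by n bits (n taken modulo 128). */
u128 rotl128(u128 v, unsigned n)
{
    u128 r;

    n &= 127u;
    if (n >= 64u) {
        /* Rotating by 64 swaps the halves. */
        uint64_t t = v.hi;
        v.hi = v.lo;
        v.lo = t;
        n -= 64u;
    }
    if (n == 0u) {
        return v;  /* avoids an undefined shift by 64 below */
    }
    /* Bits shifted out of one half enter the bottom of the other. */
    r.hi = (v.hi << n) | (v.lo >> (64u - n));
    r.lo = (v.lo << n) | (v.hi >> (64u - n));
    return r;
}
```

A vector version would need the same cross-element carry, for example by combining a per-element shift with a permute or byte shift of the neighbouring element.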

can't find materials about SSE2, Altivec, VMX on apple developer

白昼怎懂夜的黑 posted on 2019-12-11 10:02:48
Question: As Paul R suggested, there are plenty of resources about SSE2 and AVX on Apple Developer, but I couldn't find them. Could anyone help me? BTW, I'm also looking for the archive of the AltiVec mailing list. Thanks!

Intel SSE and AVX Examples and Tutorials

Source: https://stackoverflow.com/questions/22978362/cant-find-materials-about-sse2-altivec-vmx-on-apple-developer

Porting MMX/SSE instructions to AltiVec

天涯浪子 posted on 2019-12-07 13:03:02
Question: Let me preface this by saying that I have extremely limited experience with ASM, and even less with SIMD. But it happens that I have the following MMX/SSE-optimised code that I would like to port to AltiVec instructions for use on PPC/Cell processors. This is probably a big ask. Even though it's only a few lines of code, I've had no end of trouble trying to work out what's going on here. The original function:

    static inline int convolve(const short *a, const short *b, int n) {
        int out = 0;

Porting MMX/SSE instructions to AltiVec

不问归期 posted on 2019-12-05 18:36:08
Let me preface this by saying that I have extremely limited experience with ASM, and even less with SIMD. But it happens that I have the following MMX/SSE-optimised code that I would like to port to AltiVec instructions for use on PPC/Cell processors. This is probably a big ask. Even though it's only a few lines of code, I've had no end of trouble trying to work out what's going on here. The original function:

    static inline int convolve(const short *a, const short *b, int n) {
        int out = 0;
        union { __m64 m64; int i32[2]; } tmp;
        tmp.i32[0] = 0;
        tmp.i32[1] = 0;
        while (n >= 4) {
            tmp.m64 = _mm_add

What makes Apple's PowerPC memcpy so fast?

空扰寡人 posted on 2019-12-04 16:05:32
Question: I've written several copy functions in search of a good memory strategy on PowerPC. Using the Altivec or fp registers with cache hints (dcb*) doubles the performance over a simple byte copy loop for large data. Initially pleased with that, I threw in a regular memcpy to see how it compared... 10x faster than my best! I have no intention of rewriting memcpy, but I do hope to learn from it and accelerate several simple image filters that spend most of their time moving pixels to and from memory