mmx

Is there a way to subtract packed unsigned doublewords, saturated, on x86, using MMX/SSE?

99封情书 posted on 2019-12-10 14:48:22
Question: I've been looking at MMX/SSE and I am wondering: there are instructions for packed, saturated subtraction of unsigned bytes and words, but not doublewords. Is there a way of doing what I want, or if not, why is there none? Answer 1: If you have SSE4.1 available, I don't think you can do better than the pmaxud + psubd approach suggested by @harold. With AVX2, you can of course also use the corresponding 256-bit variants. __m128i subs_epu32_sse4(__m128i a, __m128i b){ __m128i mx = _mm_max
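The pmaxud + psubd idea rests on the identity max(a, b) - b, which equals a - b when a > b and 0 otherwise. The sketch below is one way to write it, not the truncated answer's code: `subs_u32` and `subs_u32_array` are hypothetical names, and the SSE4.1 path is only compiled when the compiler defines `__SSE4_1__` (for example with `-msse4.1`); otherwise the scalar loop handles everything.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1: _mm_max_epu32 */
#endif

/* Scalar reference: unsigned saturating subtract, clamped at 0. */
static uint32_t subs_u32(uint32_t a, uint32_t b)
{
    uint32_t mx = a > b ? a : b; /* max(a, b) >= b, so the subtraction cannot wrap */
    return mx - b;               /* a - b when a > b, otherwise 0 */
}

/* Hypothetical helper: out[i] = saturating a[i] - b[i] for n elements. */
static void subs_u32_array(const uint32_t *a, const uint32_t *b,
                           uint32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE4_1__)
    /* Four doublewords per iteration; unaligned loads keep the sketch simple. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i mx = _mm_max_epu32(va, vb);   /* pmaxud */
        __m128i r  = _mm_sub_epi32(mx, vb);   /* psubd  */
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
#endif
    /* Scalar tail (or the whole array when SSE4.1 is not enabled). */
    for (; i < n; ++i)
        out[i] = subs_u32(a[i], b[i]);
}
```

Both paths should give identical results, so the scalar function doubles as a reference when checking the vector version.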

Porting MMX/SSE instructions to AltiVec

天涯浪子 posted on 2019-12-07 13:03:02
Question: Let me preface this with... I have extremely limited experience with ASM, and even less with SIMD. But it happens that I have the following MMX/SSE-optimised code that I would like to port to AltiVec instructions for use on PPC/Cell processors. This is probably a big ask... Even though it's only a few lines of code, I've had no end of trouble trying to work out what's going on here. The original function: static inline int convolve(const short *a, const short *b, int n) { int out = 0;

Porting MMX/SSE instructions to AltiVec

不问归期 posted on 2019-12-05 18:36:08
Let me preface this with... I have extremely limited experience with ASM, and even less with SIMD. But it happens that I have the following MMX/SSE-optimised code that I would like to port to AltiVec instructions for use on PPC/Cell processors. This is probably a big ask... Even though it's only a few lines of code, I've had no end of trouble trying to work out what's going on here. The original function: static inline int convolve(const short *a, const short *b, int n) { int out = 0; union { __m64 m64; int i32[2]; } tmp; tmp.i32[0] = 0; tmp.i32[1] = 0; while (n >= 4) { tmp.m64 = _mm_add
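When porting a routine like this, a plain scalar reference is handy for checking the vector version on any target. The sketch below is only an assumption about what the truncated function computes: it treats `convolve` as a dot product of two `short` arrays, which the signature and the two-lane 32-bit accumulator suggest but the visible excerpt does not confirm. `convolve_ref` is a hypothetical name.

```c
#include <stddef.h>

/* Assumption: the routine reduces to sum(a[i] * b[i]); the truncated
 * source does not confirm this. */
static int convolve_ref(const short *a, const short *b, int n)
{
    /* Two accumulators mirror the two 32-bit lanes of the __m64 union. */
    int acc[2] = {0, 0};
    int i = 0;

    /* Four 16-bit elements per step, like one 64-bit MMX load per array. */
    for (; i + 4 <= n; i += 4) {
        acc[0] += a[i]     * b[i]     + a[i + 1] * b[i + 1];
        acc[1] += a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    }

    int out = acc[0] + acc[1];

    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; ++i)
        out += a[i] * b[i];

    return out;
}
```

Comparing an AltiVec port against a reference of this shape on random inputs is one way to catch lane-ordering mistakes early.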

How to convert 'long long' (or __int64) to __m64

你离开我真会死。 posted on 2019-12-04 11:52:10
Question: What is the proper way to convert an __int64 value to an __m64 value for use with SSE? Answer 1: With gcc you can just use _mm_set_pi64x: #include <mmintrin.h> __int64 i = 0x123456LL; __m64 v = _mm_set_pi64x(i); Note that not all compilers have _mm_set_pi64x defined in mmintrin.h. For gcc it's defined like this: extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_pi64x (long long __i) { return (__m64) __i; } which suggests that you could probably just

difference between MMX and XMM register?

[亡魂溺海] posted on 2019-12-03 16:51:52
Question: I'm currently learning assembly programming on Intel x86 processors. Could someone please explain to me what the difference is between MMX and XMM registers? I'm very confused about what functions they serve and about the differences and similarities between them. Answer 1: MM registers are the registers used by the MMX instruction set, one of the first attempts to add (integer-only) SIMD to x86. They are 64 bits wide and they are actually aliases for the mantissa parts of the x87 registers (but they

How to convert 'long long' (or __int64) to __m64

爱⌒轻易说出口 posted on 2019-12-03 07:07:26
What is the proper way to convert an __int64 value to an __m64 value for use with SSE? With gcc you can just use _mm_set_pi64x: #include <mmintrin.h> __int64 i = 0x123456LL; __m64 v = _mm_set_pi64x(i); Note that not all compilers have _mm_set_pi64x defined in mmintrin.h. For gcc it's defined like this: extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_pi64x (long long __i) { return (__m64) __i; } which suggests that you could probably just use a cast if you prefer, e.g. __int64 i = 0x123456LL; __m64 v = (__m64)i; Failing that, if you're stuck

difference between MMX and XMM register?

大兔子大兔子 posted on 2019-12-03 06:50:38
I'm currently learning assembly programming on Intel x86 processors. Could someone please explain to me what the difference is between MMX and XMM registers? I'm very confused about what functions they serve and about the differences and similarities between them. MM registers are the registers used by the MMX instruction set, one of the first attempts to add (integer-only) SIMD to x86. They are 64 bits wide and they are actually aliases for the mantissa parts of the x87 registers (but they are not affected by the FPU top of the stack position); this was done to keep compatibility with existing
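To make the width difference concrete, the sketch below counts how many 16-bit lanes fit in each register type: `__m64` corresponds to a 64-bit MM register and `__m128i` to a 128-bit XMM register. The names `mm_reg_t`, `xmm_reg_t`, `mm_lanes16` and `xmm_lanes16` are hypothetical, and on builds without MMX/SSE2 the two types are replaced by same-sized stand-in structs so the file still compiles.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__MMX__) && defined(__SSE2__)
#include <mmintrin.h>  /* __m64: maps to a 64-bit MM register */
#include <emmintrin.h> /* __m128i: maps to a 128-bit XMM register */
typedef __m64   mm_reg_t;
typedef __m128i xmm_reg_t;
#else
/* Stand-ins with the same sizes, used only when the intrinsics are unavailable. */
typedef struct { unsigned char bytes[8];  } mm_reg_t;
typedef struct { unsigned char bytes[16]; } xmm_reg_t;
#endif

/* Number of 16-bit integer lanes one MM register holds. */
static size_t mm_lanes16(void)
{
    return sizeof(mm_reg_t) / sizeof(int16_t);
}

/* Number of 16-bit integer lanes one XMM register holds. */
static size_t xmm_lanes16(void)
{
    return sizeof(xmm_reg_t) / sizeof(int16_t);
}
```

A packed 16-bit add therefore processes four values per instruction with MMX and eight with SSE2.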

Common SIMD techniques

萝らか妹 posted on 2019-12-03 02:48:26
Question: Where can I find information about common SIMD tricks? I have an instruction set and know how to write straightforward SIMD code, but I know SIMD is now much more powerful: it can express complex conditional logic without branches. For example (ARMv6), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of the corresponding bytes of Ra and Rb: USUB8 Rd, Ra, Rb SEL Rd, Rb, Ra Links to tutorials / uncommon SIMD techniques are good too :) ARMv6 is the most interesting

Common SIMD techniques

好久不见. posted on 2019-12-02 16:22:27
Where can I find information about common SIMD tricks? I have an instruction set and know how to write straightforward SIMD code, but I know SIMD is now much more powerful: it can express complex conditional logic without branches. For example (ARMv6), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of the corresponding bytes of Ra and Rb: USUB8 Rd, Ra, Rb SEL Rd, Rb, Ra Links to tutorials / uncommon SIMD techniques are good too :) ARMv6 is the most interesting for me, but x86 (SSE,...)/ Neon (in ARMv7)/others are good too. One of the best SIMD resources ever was
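The USUB8/SEL pair from the question can be modelled in portable C to see why it yields a per-byte unsigned minimum. This is a sketch of the instruction semantics, not ARM code: USUB8 sets a GE flag for each byte where Ra >= Rb, and SEL then picks the byte from its first operand where GE is set and from its second operand otherwise. `umin8` is a hypothetical name.

```c
#include <stdint.h>

/* Portable model of:
 *   USUB8 Rd, Ra, Rb   ; per-byte Ra - Rb, GE[i] = 1 when Ra[i] >= Rb[i]
 *   SEL   Rd, Rb, Ra   ; Rd[i] = GE[i] ? Rb[i] : Ra[i]
 */
static uint32_t umin8(uint32_t ra, uint32_t rb)
{
    uint32_t rd = 0;

    for (int i = 0; i < 4; ++i) {
        uint32_t a = (ra >> (8 * i)) & 0xFFu;
        uint32_t b = (rb >> (8 * i)) & 0xFFu;

        /* All-ones mask when the GE flag for this byte would be set. */
        uint32_t ge = (uint32_t)0 - (uint32_t)(a >= b);

        /* Branch-free select: b where a >= b, otherwise a. */
        uint32_t sel = (b & ge) | (a & ~ge);

        rd |= sel << (8 * i);
    }
    return rd;
}
```

For instance, `umin8(0x01FF8000u, 0x02007F80u)` returns `0x01007F00u`: each output byte is the smaller of the two input bytes at that position.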

Are different mmx, sse and avx versions complementary or supersets of each other?

巧了我就是萌 posted on 2019-11-30 11:32:21
Question: I'm thinking I should familiarize myself with x86 SIMD extensions. But before I even began I ran into trouble. I can't find a good overview of which of them are still relevant. The x86 architecture has accumulated a lot of math/multimedia extensions over the decades: MMX, 3DNow!, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX512. Did I forget something? Are the newer ones supersets of the older ones and vice versa? Or are they complementary? Are some of them deprecated? Which of these are still relevant? I