x87

Add a constant value to an xmm register in x86

谁说我不能喝 posted on 2019-12-06 01:24:00
How would I add 1 or 2 to the register xmm0 (double)? I can do it like this, but surely there must be an easier way:
    movsd xmm0, [ecx]
    xor eax, eax
    inc eax
    cvtsi2sd xmm1, eax
    addsd xmm0, xmm1
    movsd [ecx], xmm0
Also, would it be possible to do this with the x87 floating-point instructions? This doesn't work for me:
    fld dword ptr [ecx]
    fld1
    faddp
    fstp dword ptr [ecx]
You can keep a constant in memory or in another register:
    _1 dq 1.0
and
    addsd xmm1, [_1]
or
    movsd xmm0, [_1]
    addsd xmm1, xmm0
If you are on x64, you can do this:
    mov rax, 1.0
    movq xmm0, rax
    addsd xmm1, xmm0
or use the stack if the type
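
The memory-constant approach from the answer can be sketched in C with GCC-style inline assembly. This is a minimal sketch, assuming an x86 target with SSE2 and a GCC- or Clang-compatible compiler; the helper names add_const_sse and add_one_x87 are hypothetical and do not come from the original post. The x87 variant uses 64-bit loads and stores (fldl/fstpl, the AT&T spelling of qword ptr). A likely reason the x87 attempt in the question misbehaves is that dword ptr treats the double as a 32-bit float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: *p += c with SSE2, reading the constant from memory. */
static void add_const_sse(double *p, double c)
{
    assert(p != NULL);
    double v = *p;
    /* AT&T operand order: addsd source, destination. */
    __asm__("addsd %1, %0" : "+x"(v) : "m"(c));
    *p = v;
}

/* Hypothetical helper: *p += 1.0 with x87, using 64-bit load and store. */
static void add_one_x87(double *p)
{
    assert(p != NULL);
    __asm__("fldl %0\n\t"              /* push the double stored at *p */
            "fld1\n\t"                 /* push the constant 1.0        */
            "faddp %%st, %%st(1)\n\t"  /* st(1) += st(0), then pop     */
            "fstpl %0"                 /* store the sum to *p and pop  */
            : "+m"(*p)
            :
            : "st(6)", "st(7)");       /* keep two stack slots free    */
}
```

Both helpers update the value in place, mirroring the load from [ecx] and the store back to [ecx] in the question.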

Using FPU with C inline assembly

a 夏天 posted on 2019-12-05 23:11:18
I wrote a vector structure like this:
    struct vector { float x1, x2, x3, x4; };
Then I created a function which does some operations with inline assembly using the vector:
    struct vector *adding(const struct vector v1[], const struct vector v2[], int size) {
        struct vector vec[size];
        int i;
        for(i = 0; i < size; i++) {
            asm(
                "FLDL %4 \n"   //v1.x1
                "FADDL %8 \n"  //v2.x1
                "FSTL %0 \n"
                "FLDL %5 \n"   //v1.x2
                "FADDL %9 \n"  //v2.x2
                "FSTL %1 \n"
                "FLDL %6 \n"   //v1.x3
                "FADDL %10 \n" //v2.x3
                "FSTL %2 \n"
                "FLDL %7 \n"   //v1.x4
                "FADDL %11 \n" //v2.x4
                "FSTL %3 \n"
                :"=m"(vec[i].x1), "=m"(vec[i].x2), "=m"(vec[i].x3),
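
Two things stand out in the excerpt above: the L suffix (FLDL, FADDL, FSTL) selects 64-bit doubles although the struct holds 32-bit floats, and FSTL stores without popping, so every iteration leaves four values on the eight-entry x87 stack. The following is a minimal sketch of a corrected version, assuming an x86 target and a GCC- or Clang-compatible compiler. The names add_f32_x87 and add_vectors are hypothetical, and writing into a caller-supplied array instead of the local vec is a design choice of this sketch, not something the original post states.

```c
#include <assert.h>
#include <stddef.h>

struct vector { float x1, x2, x3, x4; };

/* Add two 32-bit floats on the x87 stack; the "s" suffix selects single precision. */
static float add_f32_x87(float a, float b)
{
    float r;
    __asm__("flds %1\n\t"    /* push a                                  */
            "fadds %2\n\t"   /* st(0) += b                              */
            "fstps %0"       /* store to r and pop: stack stays balanced */
            : "=m"(r)
            : "m"(a), "m"(b)
            : "st(7)");      /* keep one stack slot free                 */
    return r;
}

/* Hypothetical replacement for adding(): the caller owns the output array. */
static void add_vectors(struct vector *out, const struct vector *v1,
                        const struct vector *v2, int size)
{
    assert(out != NULL && v1 != NULL && v2 != NULL && size >= 0);
    for (int i = 0; i < size; i++) {
        out[i].x1 = add_f32_x87(v1[i].x1, v2[i].x1);
        out[i].x2 = add_f32_x87(v1[i].x2, v2[i].x2);
        out[i].x3 = add_f32_x87(v1[i].x3, v2[i].x3);
        out[i].x4 = add_f32_x87(v1[i].x4, v2[i].x4);
    }
}
```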

Adding floating point/double numbers in assembly

佐手、 posted on 2019-12-05 07:18:05
I am trying to experiment with inline assembly, and I am trying to add decimal numbers (no, NOT integers) in inline assembly. The issue is that when I call the following function:
    inline double ADD(double num1, double num2) {
        double res;
        _asm {
            push eax; push the former state of eax onto stack
            mov eax, num1;
            add eax, num2;
            mov res, eax;
            pop eax; restore the former state of eax now that we are done
        }
        return res;
    }
The compiler complains of improper operand size at the inline assembly (ALL lines of assembly excluding the push and pop instruction lines). So I have to change to an integer type, such as
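
The operand-size complaint follows from eax being a 32-bit integer register, while a double is 64 bits wide and needs floating-point instructions. The question uses MSVC _asm blocks; the sketch below swaps in GCC-style extended assembly instead, assuming an x86 target and a GCC- or Clang-compatible compiler. The name add_x87 is hypothetical.

```c
/* Hypothetical GCC-style counterpart of ADD(): both operands travel on the x87 stack. */
static double add_x87(double num1, double num2)
{
    double res;
    /* "0" places num1 in st(0) and "u" places num2 in st(1).
       faddp adds st(0) into st(1) and pops, so the sum ends up in the new st(0). */
    __asm__("faddp %%st, %%st(1)"
            : "=t"(res)
            : "0"(num1), "u"(num2)
            : "st(1)");   /* the pop consumes the st(1) input */
    return res;
}
```

Letting the compiler place the operands through the "t" and "u" constraints avoids hand-written loads and stores, and no general-purpose register has to be saved.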

How to specify clobbered bottom of the x87 FPU stack with extended gcc assembly?

送分小仙女□ posted on 2019-12-04 03:31:27
Question: In a codebase of ours I found this snippet for fast, towards-negative-infinity [1] rounding on x87:
    inline int my_int(double x) {
        int r;
    #ifdef _GCC_
        asm ("fldl %1\n"
             "fistpl %0\n"
             :"=m"(r)
             :"m"(x));
    #else
        // ...
    #endif
        return r;
    }
I'm not extremely familiar with GCC extended assembly syntax, but from what I gather from the documentation: r must be a memory location, where I'm writing back stuff; x must be a memory location too, whence the data comes from; there's no clobber specification, so
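
One way to make the stack effect explicit is to let the compiler load the value and to declare the pop in the clobber list. GCC's rules for x87 operands say that an input register popped by the asm must be clobbered unless it is tied to an output. The following is a minimal sketch under that rule, assuming an x86 target and a GCC- or Clang-compatible compiler; the name my_int_x87 is hypothetical. Note that fistpl rounds according to the rounding-control field of the x87 control word, which defaults to round-to-nearest-even, so rounding toward negative infinity only happens if the control word has been set up that way elsewhere.

```c
/* Hypothetical variant of my_int(): the value arrives in st(0) and fistpl pops it. */
static int my_int_x87(double x)
{
    int r;
    __asm__("fistpl %0"   /* convert st(0) to a 32-bit integer, store it, pop */
            : "=m"(r)
            : "t"(x)      /* x is loaded into st(0) by the compiler           */
            : "st");      /* tell the compiler that st(0) was popped          */
    return r;
}
```

With the default rounding mode, my_int_x87(2.5) yields 2 and my_int_x87(3.5) yields 4.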

Calling fsincos instruction in LLVM slower than calling libc sin/cos functions?

╄→尐↘猪︶ㄣ posted on 2019-12-03 07:08:28
I am working on a language that is compiled with LLVM. Just for fun, I wanted to do some microbenchmarks. In one, I run some million sin / cos computations in a loop. In pseudocode, it looks like this:
    var x: Double = 0.0
    for (i <- 0 to 100 000 000)
        x = sin(x)^2 + cos(x)^2
    return x.toInteger
If I'm computing sin/cos using LLVM IR inline assembly in the form:
    %sc = call { double, double } asm "fsincos", "={st(1)},={st},1,~{dirflag},~{fpsr},~{flags}" (double %"res") nounwind
this is faster than using fsin and fcos separately instead of fsincos. However, it is slower than when calling the llvm
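
For comparison outside LLVM IR, the same constraint pattern can be written in C. This is a minimal sketch, assuming an x86 target and a GCC- or Clang-compatible compiler; the name sincos_x87 is hypothetical. The "t" and "u" constraints correspond to {st} and {st(1)} in the IR above, and "0" ties the input to the first output.

```c
/* Hypothetical helper: one fsincos instruction yields both results.
   After fsincos, st(0) holds the cosine and st(1) holds the sine. */
static void sincos_x87(double x, double *s, double *c)
{
    double sn, cs;
    __asm__("fsincos" : "=t"(cs), "=u"(sn) : "0"(x));
    *s = sn;
    *c = cs;
}
```

A benchmark loop can call this helper and a libm-based version side by side; which one is faster depends on the CPU and the math library, so the sketch makes no claim either way.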

Benefits of x87 over SSE

你。 posted on 2019-12-03 01:19:11
I know that x87 has higher internal precision, which is probably the biggest difference that people see between it and SSE operations. But I have to wonder, is there any other benefit to using x87? I have a habit of typing -mfpmath=sse automatically in any project, and I wonder if I'm missing anything else that the x87 FPU offers.
Nils Pipenbrinck
For hand-written asm, x87 has some instructions that don't exist in the SSE instruction set. Off the top of my head, it's all trigonometric stuff like fsin, fcos, fatan, fatan2 and some exponential/logarithm stuff. With gcc -O3 -ffast-math -mfpmath
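
As an illustration of the x87-only instructions the answer refers to: the arctangent instruction is actually spelled fpatan (and the tangent one fptan), so fatan and fatan2 above read as informal names. The following is a minimal sketch of a two-argument arctangent built on fpatan, assuming an x86 target and a GCC- or Clang-compatible compiler; the name atan2_x87 is hypothetical.

```c
/* Hypothetical helper: two-argument arctangent via the x87 fpatan instruction. */
static double atan2_x87(double y, double x)
{
    double r;
    /* fpatan computes atan(st(1) / st(0)), pops once,
       and leaves the result in the new st(0). */
    __asm__("fpatan"
            : "=t"(r)
            : "0"(x), "u"(y)
            : "st(1)");   /* the pop consumes the st(1) input */
    return r;
}
```

SSE has no counterpart to this instruction, so code compiled with -mfpmath=sse calls a library routine for the same result.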

FCOM floating point comparison fails

你。 posted on 2019-12-02 12:07:53
Question: I am just getting started with 32-bit assembly and I'm quite confused. I have the following code:
    .586
    .MODEL FLAT
    .STACK 4096
    .DATA
    .CODE
    main PROC
        finit
        fldpi
        fld1
        fcom
        fstsw ax
        sahf
        JL jumper
        nop
    jumper:
        nop
        nop
    main ENDP
    END
Now from what I understand, I am pushing pi onto the stack, then pushing 1 onto the stack; it should compare pi and 1, see that 1 is less, and execute a jump. However, the comparison doesn't appear to work. Can someone help?
Answer 1: Change JL to JB, since you can only
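
The reason JB works where JL does not can be seen from the status word itself: fcom reports its result in the condition-code bits C0, C2 and C3, and sahf copies them into CF, PF and ZF, which are the flags the unsigned-style jumps (jb, ja, je) test. The following is a minimal sketch that reads those bits from C, assuming an x86 target and a GCC- or Clang-compatible compiler; the names fcom_status and x87_below are hypothetical.

```c
/* Condition-code bits of the x87 status word. */
enum { X87_C0 = 0x0100, X87_C2 = 0x0400, X87_C3 = 0x4000 };

/* Hypothetical helper: compare st0 with st1 via fcom and return the status word. */
static unsigned short fcom_status(double st0, double st1)
{
    unsigned short sw;
    __asm__("fcom %%st(1)\n\t"   /* compare st(0) with st(1)      */
            "fnstsw %0"          /* copy the status word into ax  */
            : "=a"(sw)
            : "t"(st0), "u"(st1));
    return sw;
}

/* Strictly "below": C0 set (this becomes CF after sahf), C2 and C3 clear. */
static int x87_below(double st0, double st1)
{
    unsigned short sw = fcom_status(st0, st1);
    return (sw & (X87_C0 | X87_C2 | X87_C3)) == X87_C0;
}
```

With st(0) = 1 and st(1) = pi, as in the question, only C0 is set, so CF is set after sahf and jb is taken. jl tests SF and OF, which sahf does not load with a comparison result.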

Using FPU return values in c++ code

夙愿已清 posted on 2019-12-02 11:27:51
I have an x86 NASM program which seems to work perfectly. I have problems using the values returned from it. This is 32-bit Windows using MSVC++. I expect the return value in ST0. A minimal example demonstrating the problem with the returned values can be seen in this C++ and NASM assembly code:
    #include <iostream>
    extern "C" float arsinh(float);
    int main() {
        float test = arsinh(5.0);
        printf("%f\n", test);
        printf("%f\n", arsinh(5.0));
        std::cout << test << std::endl;
        std::cout << arsinh(5.0) << std::endl;
    }
Assembly code:
    section .data
    value: dq 1.0
    section .text
    global _arsinh
    _arsinh:
        fld

Which bits in the x87 tag word does FFREE ST(i) modify?

点点圈 posted on 2019-12-02 05:18:25
Question: This example was written in NASM:
    section .bss
    var28: resb 28
    section .text
    _main:
        ; Initialize
        finit
        fldpi
        ; Read Tag Word
        fstenv [var28]
        mov ax, [var28 + 8] ; move the Tag Word to ax
At this point ax = 0011 1111 1111 1111, which means ST7 = 00 (valid), and the rest is 11 (empty). The rest of the code:
        ; FFREE ST(i)
        ffree ST7 ; Sets tag for ST(i) to empty.
        ; Read Tag Word
        fstenv [var28]
        mov ax, [var28 + 8] ; move the Tag Word to ax
At this point ax = 0011 1111 1111 1111 too. My question is,
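
The observation above is consistent with the tag word being indexed by physical register (R0 to R7) rather than by stack position: after finit and one push, TOP is 7, so the top two tag bits describe R7, which is ST(0), while ST(7) maps to R6 and is already empty. The following is a minimal sketch that repeats the experiment from C, assuming an x86 target and a GCC- or Clang-compatible compiler; the names x87_env, tag_word and ffree_experiment are hypothetical. Be aware that fninit resets the x87 control and status words, so the sketch is meant for an isolated test program.

```c
#include <stdint.h>
#include <string.h>

/* 28-byte protected-mode x87 environment as stored by fnstenv;
   the tag word sits at byte offset 8. */
struct x87_env { unsigned char bytes[28]; };

static uint16_t tag_word(const struct x87_env *env)
{
    uint16_t tw;
    memcpy(&tw, env->bytes + 8, sizeof tw);
    return tw;
}

/* Hypothetical experiment: tag words after fldpi, after ffree st(7),
   and after ffree st(0). */
static void ffree_experiment(uint16_t out[3])
{
    struct x87_env e0, e1, e2;
    __asm__ volatile(
        "fninit\n\t"          /* TOP = 0, all registers empty          */
        "fldpi\n\t"           /* TOP becomes 7: st(0) is physical R7   */
        "fnstenv %0\n\t"
        "ffree %%st(7)\n\t"   /* physical R6, which is already empty   */
        "fnstenv %1\n\t"
        "ffree %%st(0)\n\t"   /* physical R7, the register holding pi  */
        "fnstenv %2\n\t"
        "fninit"              /* leave the stack empty for the compiler */
        : "=m"(e0), "=m"(e1), "=m"(e2)
        :
        : "st", "st(1)", "st(2)", "st(3)",
          "st(4)", "st(5)", "st(6)", "st(7)");
    out[0] = tag_word(&e0);
    out[1] = tag_word(&e1);
    out[2] = tag_word(&e2);
}
```

The first two values match the 0011 1111 1111 1111 pattern reported in the question, and only ffree st(0) changes the top two bits to 11.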