floating-point

Returning From Catching A Floating Point Exception

不问归期 submitted on 2021-02-07 14:36:13
Question: So, I am trying to return from a floating-point exception, but my code keeps looping instead. I can actually exit the process, but what I want to do is return and redo the calculation that causes the floating-point error. The reason the FPE occurs is that I have a random number generator that generates coefficients for a polynomial. Using some LAPACK functions, I solve for the roots and do some other things. Somewhere in this math-intensive chain, a floating-point exception occurs. When…

How to print/convert decimal floating point values in GCC?

倖福魔咒の submitted on 2021-02-07 14:22:20
Question: The GCC docs describe limited decimal floating-point support in recent GCCs. But how do I actually use it? For example, on Fedora 18 with GCC 4.7.2, a simple C program like int main() { _Decimal64 x = 0.10dd; return 0; } compiles (when using -std=gnu99) - but how do I actually do other useful things, like printing _Decimal64 values or converting strings to _Decimal64 values? The docs talk about 'a separate C library implementation' for (I assume) things like printf - which additional library do I…

sum of series using float

戏子无情 submitted on 2021-02-07 14:18:29
Question: I calculated the first 20 elements of the series in two ways: 1st forward, 2nd backward. For this I did: #include <iostream> #include <math.h> using namespace std; float sumSeriesForward(int elementCount) { float sum = 0; for (int i = 0; i < elementCount; ++i) { sum += (float) 1 / (pow(3, i)); } return sum; } float sumSeriesBack(int elementCount) { float sum = 0; for (int i = (elementCount - 1); i >= 0; --i) { sum += (float) 1 / (pow(3, i)); } return sum; } int main() { cout.precision…

Does calloc() of a double field always evaluate to 0.0?

寵の児 submitted on 2021-02-07 14:12:39
Question: Does calloc() of a double field always evaluate to 0.0? Furthermore: does calloc() of a float field always evaluate to 0.0f? Does calloc() of an int or unsigned int field always evaluate to 0? That is, will the assert() below always succeed on all platforms? double* d = calloc(1, sizeof(double)); assert(*d == 0.0); free(d); Answer 1: calloc sets all bytes of the allocated memory to zero. As it happens, that is also the valid IEEE 754 (which is the most common format for floating point…

If two languages follow IEEE 754, will calculations in both languages result in the same answers?

心已入冬 submitted on 2021-02-07 12:01:49
Question: I'm in the process of converting a program from Scilab code to C++. One loop in particular is producing a slightly different result than the original Scilab code (it's a long piece of code, so I'm not going to include it in the question, but I'll try my best to summarise the issue below). The problem is that each step of the loop uses calculations from the previous step. Additionally, the difference between calculations only becomes apparent around the 100,000th iteration (out of approximately 300…

How to convert floating value to integer with exact precision like 123.3443 to 1233443?

╄→гoц情女王★ submitted on 2021-02-07 11:56:18
Question: Sample code: int main() { float f = 123.542; int i = (int)f; printf("%d\n",i); } Answer 1: 123.3443 can't be exactly represented by a floating-point number - in a 32-bit float, it's effectively represented as 16166984 / 131072, which is actually 123.34429931640625, not 123.3443. (It's off by around 6.8 x 10^-7.) If this is really the result you want (which it probably isn't), take a look at how IEEE 754 floats work, and pull out your favorite arbitrary-precision math suite. Once you understand…

ValueError: could not convert string to float: '.'

a 夏天 submitted on 2021-02-07 10:40:23
Question: I have a list of strings (CD_cent) like this: 2.374 2.559 1.204 and I want to multiply these numbers by a float. For this I try to convert the list of strings to a list of floats, for example with: CD_cent2=[float(x) for x in CD_cent] But I always get the error: ValueError: could not convert string to float: '.'. I guess this means that it can't convert the dot to a float (?!) But how can I fix this? Why doesn't it recognize the dot? Answer 1: You need to split each string as the…

Why compile-time floating point calculations might not have the same results as run-time calculations?

安稳与你 submitted on 2021-02-07 06:45:06
Question: In constexpr: Introduction, the speaker mentioned that "compile-time floating point calculations might not have the same results as runtime calculations", and that the reason is related to cross-compiling. Honestly, I can't get the idea clearly. IMHO, different platforms may also have different implementations of integers. Why does it only affect floating point? Or am I missing something? Answer 1: You're absolutely right that, at some level, the problem of calculating floating-point values at compile time is…

What precision are floating-point arithmetic operations done in?

倾然丶 夕夏残阳落幕 submitted on 2021-02-07 06:29:06
Question: Consider the two very simple multiplications below: double result1; long double result2; float var1=3.1; float var2=6.789; double var3=87.45; double var4=234.987; result1=var1*var2; result2=var3*var4; Are multiplications by default done in a higher precision than the operands? I mean, in the case of the first multiplication, is it done in double precision, and in the case of the second one, on x86 architecture, is it done in 80-bit extended precision? Or should we cast operands in expressions to the higher precision…