ieee-754

Floating Point Arithmetic error

。_饼干妹妹 submitted on 2020-01-11 08:46:30
Question: I'm using the following function to approximate the derivative of a function at a point:

    class PrecisionError(Exception):
        """Raised when h is too small to change f(x + h)."""

    def prime_x(f, x, h):
        # Forward-difference approximation of f'(x)
        if h != 0.0 and f(x + h) != f(x):
            return (f(x + h) - f(x)) / h
        raise PrecisionError

As a test I'm passing fx as f and 3.0 as x, where fx is:

    import math

    def fx(x):
        return math.exp(x) * math.sin(x)

Its derivative is exp(x)*(sin(x)+cos(x)). According to Google and to my calculator, exp(3)*(sin(3)+cos(3)) = -17.050059. So far so good. But when I decided to test the
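A forward difference like prime_x trades truncation error (roughly proportional to h) against rounding error (roughly machine epsilon times |f(x)| divided by h), so shrinking h past a certain point makes the estimate worse, not better. The sketch below is a minimal illustration, assuming the same fx and x = 3.0 as the question; the helper names fx_prime_exact, forward_diff and central_diff are hypothetical, and the central difference is a swapped-in alternative, not the asker's method:

```python
import math

def fx(x):
    return math.exp(x) * math.sin(x)

def fx_prime_exact(x):
    # Analytic derivative: exp(x) * (sin(x) + cos(x))
    return math.exp(x) * (math.sin(x) + math.cos(x))

def forward_diff(f, x, h):
    # Truncation error O(h); rounding error roughly eps * |f(x)| / h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Truncation error O(h**2), so a larger h gives the same accuracy
    return (f(x + h) - f(x - h)) / (2 * h)

if __name__ == "__main__":
    exact = fx_prime_exact(3.0)
    for k in range(1, 16):
        h = 10.0 ** -k
        err = abs(forward_diff(fx, 3.0, h) - exact)
        print(f"h=1e-{k:02d}  forward error={err:.3e}")
    print("central, h=1e-5:", abs(central_diff(fx, 3.0, 1e-5) - exact))
```

For this function the forward-difference error should bottom out somewhere around h ≈ 1e-8 and grow again for smaller h; the exact crossover depends on f and x.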

Questions regarding operations on NaN

爷,独闯天下 submitted on 2020-01-11 04:54:26
Question: My SSE FPU generates the following NaNs: when I do any basic binary (two-operand) operation like ADDSD, SUBSD, MULSD or DIVSD and one of the two operands is a NaN, the result has the sign of the NaN operand, and the lower 51 bits of the result's mantissa are loaded with the lower 51 bits of the NaN operand's mantissa. When both operands are NaN, the result is loaded with the sign of the destination register and the lower 51 bits of the result mantissa are loaded with the lower 51 bits of the
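To compare results like the ones described above, it helps to split a double's bit pattern into its fields. The sketch below only decodes bit patterns; it does not reproduce SSE propagation rules. The names double_to_bits, nan_fields and is_nan_bits are hypothetical, and "quiet bit" refers to the most significant of the 52 fraction bits, which IEEE 754-2008 recommends as the quiet/signaling flag, leaving the 51 bits below it as the payload the question talks about:

```python
import struct

def double_to_bits(x: float) -> int:
    # Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit int
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def nan_fields(bits: int):
    """Split a 64-bit pattern into (sign, exponent, quiet_bit, payload51)."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    quiet_bit = (bits >> 51) & 1          # top bit of the 52-bit fraction
    payload51 = bits & ((1 << 51) - 1)    # the "lower 51 bits" of the mantissa
    return sign, exponent, quiet_bit, payload51

def is_nan_bits(bits: int) -> bool:
    # NaN: exponent all ones and a non-zero 52-bit fraction
    return ((bits >> 52) & 0x7FF) == 0x7FF and (bits & ((1 << 52) - 1)) != 0
```

Feeding double_to_bits of each operand and of the result into nan_fields makes it easy to see which operand's sign and payload ended up in the result.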

How to check if float can be exactly represented as an integer

若如初见. submitted on 2020-01-10 04:50:07
Question: I'm looking for a reasonably efficient way of determining whether a floating-point value (double) can be exactly represented by an integer data type (long, 64 bit). My initial thought was to check the exponent to see if it was 0 (or more precisely 127). But that won't work because 2.0 would be e=1 m=1... So basically, I am stuck. I have a feeling that I can do this with bit masks, but I'm just not getting my head around how to do that at this point. So how can I check to see if a double is
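Bit masks are not strictly needed: a double equals a 64-bit integer exactly when it has no fractional part and lies in [-2^63, 2^63 - 1]. (For the exponent-based idea, note that the bias of a double is 1023; 127 is the bias of a 32-bit float.) A minimal sketch in Python, with the hypothetical name fits_int64_exactly; in C the same idea is usually written with trunc() and comparisons against ±2^63, which is not tested here:

```python
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def fits_int64_exactly(x: float) -> bool:
    """True if the double x is exactly equal to some 64-bit signed integer."""
    # is_integer() is False for NaN and +/-inf, so those are rejected here
    if not x.is_integer():
        return False
    # Python compares float with int exactly, so 2.0**63 is correctly rejected
    return INT64_MIN <= x <= INT64_MAX
```

The range check matters because every double with magnitude of 2^52 or more is an integer, yet 2^63 itself does not fit in a signed 64-bit long.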

Arithmetic in Ruby

自古美人都是妖i submitted on 2020-01-09 22:33:51
Question: Why does 7.30 - 7.20 in Ruby return 0.0999999999999996 and not 0.10? But if I write 7.30 - 7.16, for example, everything is fine: I get 0.14. What is the problem, and how can I solve it?
Answer 1: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Answer 2: The problem is that some numbers we can easily write in decimal don't have an exact representation in the particular floating-point format implemented by current hardware. A casual way of stating this is that all
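Neither 7.30 nor 7.20 is exactly representable in binary, so their difference carries the representation error. The same effect can be shown in Python, used here instead of Ruby to keep these examples in one language; Ruby's BigDecimal and Rational play the role that Decimal plays below. This is a sketch of the two usual fixes (round for display, or use decimal arithmetic), not the answerers' code:

```python
from decimal import Decimal

# Binary doubles: 7.30 and 7.20 are stored as the nearest base-2 values
diff_float = 7.30 - 7.20
print(repr(diff_float))          # close to, but not exactly, 0.1
print(round(diff_float, 2))      # rounding for display hides the error

# Decimals built from strings are exact in base 10
diff_dec = Decimal("7.30") - Decimal("7.20")
print(diff_dec)                  # 0.10
```

Constructing Decimal from a string rather than from a float is deliberate: Decimal(7.30) would faithfully copy the binary error.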

32-bit hex to 32-bit floating-point (IEEE 754) conversion in MATLAB

你。 submitted on 2020-01-09 10:33:55
Question: How can I convert a 32-bit hex value to a floating-point value according to IEEE 754?
EDIT:

    ...
    data = fread(fid,1,'float32');
    disp(data);
    ...

I get this answer: 4.2950e+009 1.6274e+009 ... But how do I get 32-bit floating-point (IEEE 754) numbers?
Answer 1: Based on one of your comments, it appears that your hexadecimal values are stored as strings of characters in a file. You first want to read these characters from the file in groups of 8. Depending on the specific format of your file (e.g.
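The core step is to parse the 8 hex digits as a 32-bit unsigned integer and then reinterpret those bits as a single-precision float. A minimal sketch in Python, assuming (hypothetically) that the file holds whitespace-separated 8-digit hex words; hex_to_float32 and read_hex_floats are illustrative names. In MATLAB the commonly cited equivalent is typecast(uint32(hex2dec(s)), 'single'), which is not tested here:

```python
import struct

def hex_to_float32(hex_str: str) -> float:
    """Interpret 8 hex digits as the bits of an IEEE 754 single (big-endian)."""
    bits = int(hex_str, 16)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def read_hex_floats(text: str):
    # Hypothetical layout: whitespace-separated 8-digit hex words
    return [hex_to_float32(word) for word in text.split()]
```

Note that the result is a Python float (a double) holding the exact single-precision value, so 40490FDB prints as 3.1415927410125732 rather than 3.1415927.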

IEEE-754 floating-point precision: How much error is allowed?

*爱你&永不变心* submitted on 2020-01-09 08:01:30
Question: I'm working on porting the sqrt function (for 64-bit doubles) from fdlibm to a model-checker tool I'm currently using (cbmc). Along the way, I read a lot about the IEEE 754 standard, but I don't think I understood the precision guarantees for the basic operations (including sqrt). Testing my port of fdlibm's sqrt, I got the following calculation with sqrt on a 64-bit double: sqrt
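IEEE 754 requires sqrt, like +, -, * and /, to be correctly rounded: the result must equal the exact square root rounded under the current rounding mode, so in round-to-nearest the error is at most half an ulp. A port can therefore be checked against that exact criterion. A minimal sketch in Python (3.9+ for math.nextafter), with the hypothetical name is_correctly_rounded_sqrt, assuming x is positive and finite:

```python
import math
from fractions import Fraction

def is_correctly_rounded_sqrt(x: float, r: float) -> bool:
    """Check that r is the round-to-nearest double for sqrt(x), x > 0 finite.

    sqrt is monotonic, so r is nearest iff sqrt(x) lies between the midpoints
    separating r from its neighbours; squaring those midpoints with Fractions
    keeps the comparison exact.
    """
    below = math.nextafter(r, 0.0)       # next double toward zero
    above = math.nextafter(r, math.inf)  # next double away from zero
    lo_mid = (Fraction(below) + Fraction(r)) / 2
    hi_mid = (Fraction(r) + Fraction(above)) / 2
    return lo_mid * lo_mid <= Fraction(x) <= hi_mid * hi_mid
```

Any result that fails this check is off by more than half an ulp and so violates the standard's requirement for sqrt in round-to-nearest mode.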

Are the bit patterns of NaNs really hardware-dependent?

醉酒当歌 submitted on 2020-01-09 03:20:11
Question: I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format:

    seee eeee emmm mmmm mmmm mmmm mmmm mmmm

s is the sign bit, e are the exponent bits, and m are the mantissa bits. A NaN value is encoded as an exponent of all 1s with mantissa bits that are not all 0 (all 0s would be +/- infinity). This means that there are many different possible NaN values (with different s and m bit values). On this, JLS §4.2.3 says: IEEE 754
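Following the layout above, a float is NaN whenever the 8 exponent bits are all ones and the 23 mantissa bits are not all zero, with the sign bit free, giving 2 × (2^23 − 1) = 16,777,214 distinct NaN patterns. (In Java, Float.floatToRawIntBits exposes the raw pattern, while Float.floatToIntBits collapses every NaN to 0x7fc00000.) A minimal sketch in Python with hypothetical helper names:

```python
import math
import struct

def float32_fields(bits: int):
    # Layout: seee eeee emmm mmmm mmmm mmmm mmmm mmmm
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

def is_nan32(bits: int) -> bool:
    _, exponent, mantissa = float32_fields(bits)
    return exponent == 0xFF and mantissa != 0

# Sign bit free, mantissa any of the 2**23 - 1 non-zero values
NAN32_PATTERNS = 2 * (2 ** 23 - 1)

if __name__ == "__main__":
    print(NAN32_PATTERNS)
    # Two different quiet-NaN bit patterns both decode to NaN
    for bits in (0x7FC00000, 0xFFC00001):
        value = struct.unpack(">f", struct.pack(">I", bits))[0]
        print(hex(bits), math.isnan(value))
```

Which of these patterns a given operation actually produces is the hardware-dependent part the question asks about; the classification itself is fixed by the format.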

Why are numbers (not) representable in IEEE 754 double precision?

柔情痞子 submitted on 2020-01-06 21:12:33
Question: I am confused about IEEE 754 double precision. I have two questions:
1. Why is each number in the sequence -2^54, -2^54+2, -2^54+4, ..., 2^54 representable?
2. Why is 2^54+2 not representable?
Can you help me? I understand how IEEE 754 works; however, I have a problem seeing it.
Answer 1: There are 53 bits in the significand (or mantissa) of an IEEE 754 double. −2^54 can be exactly represented, as
mantissa: 1.00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00 (bin)
exponent:
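With 53 significant bits, doubles in [2^k, 2^(k+1)) are spaced 2^(k−52) apart. So every integer below 2^53 is representable, every even integer in [2^53, 2^54) is representable (spacing 2), and from 2^54 the spacing jumps to 4. That is why 2^54+2 sits exactly halfway between 2^54 and 2^54+4 and is rounded (ties-to-even) to 2^54. A quick check in Python, using the hypothetical helper is_exact_double:

```python
import math

def is_exact_double(n: int) -> bool:
    # n is representable iff converting to double and back loses nothing
    return int(float(n)) == n

if __name__ == "__main__":
    # Spacing (ulp) doubles at each power of two
    print(math.ulp(2.0 ** 52), math.ulp(2.0 ** 53), math.ulp(2.0 ** 54))  # 1.0 2.0 4.0
    for n in (2**54 - 2, 2**54, 2**54 + 2, 2**54 + 4, -(2**54) + 2):
        print(n, is_exact_double(n))
```

The round trip through float() relies on Python converting large integers to the nearest double, which is why it can detect exactly the values that lose information.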