ieee-754

IEEE-754: cardinality of the set of rational numbers

Submitted by 柔情痞子 on 2019-12-02 04:13:43
What is the cardinality of the set of rational numbers that have an exact representation in the single-precision IEEE-754 floating-point format?

There are 2139095039 finite positive floats, and as many finite negative floats. Do you want to count +0.0 and -0.0 as two items or as one? Depending on the answer, the total is 2 * 2139095039 + 2 or 2 * 2139095039 + 1, that is, respectively, 4278190080 or 4278190079. Source for the 2139095039 number:

    #include <float.h>
    #include <math.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = FLT_MAX;
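The excerpt cuts off mid-program. As a minimal sketch of one way to obtain the 2139095039 figure (an assumption about what the truncated code did, not the original itself): the positive finite floats are exactly the bit patterns 0x00000001 through the pattern of FLT_MAX, so reinterpreting the bits of FLT_MAX as an unsigned integer yields their count directly.

    #include <float.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = FLT_MAX;
        uint32_t u;
        memcpy(&u, &f, sizeof u);  /* bit pattern of FLT_MAX: 0x7F7FFFFF */
        printf("%u\n", u);         /* 2139095039 finite positive floats */
        return 0;
    }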

Why is Infinity × 0 = NaN?

Submitted by 本小妞迷上赌 on 2019-12-02 04:00:13
IEEE 754 specifies the result of 1 / 0 as ∞ (Infinity). However, IEEE 754 then specifies the result of 0 × ∞ as NaN. This feels counter-intuitive: why is 0 × ∞ not 0?

1. We can think of 1 / 0 = ∞ as the limit of 1 / z as z tends to zero.
2. We can think of 0 × ∞ = 0 as the limit of 0 × z as z tends to ∞.

Why does the IEEE standard follow intuition 1 but not intuition 2?

It is easier to understand the behavior of IEEE 754 floating-point zeros and infinities if you do not think of them as being literally zero or infinite. The floating-point zeros not only represent the real number zero. They also represent all the nonzero numbers whose magnitude is too small to be represented, just as the infinities also stand in for finite numbers too large to be represented; a product of a "possibly tiny but nonzero" value and a "possibly huge but finite" value could come out to anything, so no single value is a defensible answer and NaN is returned.
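A minimal C demonstration of the two rules in question (nothing here is specific to the original post; it just exercises the specified behavior):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double inf = 1.0 / 0.0;     /* IEEE 754: nonzero finite / zero gives infinity */
        double nan = 0.0 * inf;     /* IEEE 754: zero times infinity gives NaN */
        printf("1/0   = %f\n", inf);        /* inf */
        printf("0*inf = %f\n", nan);        /* nan */
        printf("isnan: %d\n", isnan(nan));  /* 1 */
        return 0;
    }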

Convert MBF Single and Double to IEEE

Submitted by 自作多情 on 2019-12-02 03:47:03
Follow-up available: there's a follow-up with further details, see Convert MBF to IEEE. I've got some legacy data which is still in use; reading the binary files is not the problem, the number format is. All floating-point numbers are saved in MBF format (Single and Double). I've found a topic about that on the MSDN boards, but it only deals with Single values. I'd also like to stay away from API calls as far as I can. Does anyone have a solution for Doubles?

Edit: just in case somebody needs it, here is the VB.NET code (it's Option Strict compliant) I ended up with (feel free to
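The poster's VB.NET code is cut off above. As an illustration of the conversion itself, here is a hedged C sketch (not the poster's solution) assuming the common MBF double layout: byte 7 is the exponent with bias 128, the top bit of byte 6 is the sign, and the remaining 55 bits are the mantissa with an implicit leading 1, i.e. value = 0.1mmm…₂ × 2^(exp−128). A little-endian host is assumed for the memcpy trick.

    #include <stdint.h>
    #include <string.h>

    double mbf_to_ieee_double(const uint8_t mbf[8]) {
        uint8_t exp = mbf[7];
        if (exp == 0) return 0.0;              /* MBF encodes zero as exponent 0 */
        uint64_t bits = 0;
        memcpy(&bits, mbf, 8);                 /* little-endian host assumed */
        uint64_t sign = (bits >> 55) & 1;      /* top bit of byte 6 */
        uint64_t mant = bits & ((1ULL << 55) - 1);  /* 55 mantissa bits */
        /* 0.1m * 2^(exp-128) == 1.m * 2^(exp-129); IEEE double bias is 1023,
           so the new exponent field is exp - 129 + 1023 = exp + 894. */
        uint64_t e = (uint64_t)exp + 894;
        /* mant >> 3 truncates the 3 lowest bits; a full converter would round. */
        uint64_t out = (sign << 63) | (e << 52) | (mant >> 3);
        double d;
        memcpy(&d, &out, sizeof d);
        return d;
    }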

Why does exponential notation with decimal values fail? [closed]

Submitted by ∥☆過路亽.° on 2019-12-01 21:46:33
Question: Conventionally 1e3 means 10**3:

    >>> 1e3
    1000.0
    >>> 10**3
    1000

A similar case is exp(3) compared to e**3:

    >>> exp(3)
    20.085536923187668
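The excerpt stops here; judging from the title, the question is presumably about why a literal with a fractional exponent, such as 1e2.5, is rejected. The reason (in Python as in C) is that the literal grammar only allows an integer after the e: 1eN means "shift the decimal point N places", which is only meaningful for whole N, so a fractional power of ten needs explicit exponentiation. A hedged C illustration:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        printf("%g\n", 1e3);           /* 1000: the exponent in a literal must be an integer */
        /* 1e2.5 would be a syntax error: 'e' shifts the decimal point a
           whole number of places */
        printf("%g\n", pow(10, 2.5));  /* 316.228: fractional exponents need pow() */
        return 0;
    }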

floating point operations in go

Submitted by 谁都会走 on 2019-12-01 21:40:53
Question: Here's the sample code in Go:

    package main

    import "fmt"

    func mult32(a, b float32) float32 { return a * b }
    func mult64(a, b float64) float64 { return a * b }

    func main() {
        fmt.Println(3 * 4.3)               // A1, 12.9
        fmt.Println(mult32(3, 4.3))        // B1, 12.900001
        fmt.Println(mult64(3, 4.3))        // C1, 12.899999999999999
        fmt.Println(12.9 - 3*4.3)          // A2, 1.8033161362862765e-130
        fmt.Println(12.9 - mult32(3, 4.3)) // B2, -9.536743e-07
        fmt.Println(12.9 - mult64(3, 4.3)) // C2, 1.7763568394002505e-15
        fmt.Println(12.9 - 3
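The excerpt ends mid-line. Two different effects are at work: the A lines are evaluated entirely at compile time on Go's untyped constants, which the language spec requires to carry at least 256 bits of mantissa (the compiler's actual precision is even higher, hence A2's tiny ~1e-130 residue), while the B and C lines round 4.3 to float32 or float64 first. The B/C behavior is ordinary binary rounding and can be reproduced in C (a hedged sketch, not Go-specific):

    #include <stdio.h>

    int main(void) {
        float  p32 = 3.0f * 4.3f;  /* 4.3 rounded to float32: product ~12.900001 */
        double p64 = 3.0 * 4.3;    /* 4.3 rounded to float64: ~12.899999999999999 */
        printf("%.8g\n", (double)p32);
        printf("%.17g\n", p64);
        printf("%.7g\n", (double)(12.9f - p32));  /* ~ -9.536743e-07, like B2 */
        printf("%.17g\n", 12.9 - p64);            /* ~ 1.7763568394002505e-15, like C2 */
        return 0;
    }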

How to avoid less precise sum for numpy-arrays with multiple columns

Submitted by ぐ巨炮叔叔 on 2019-12-01 21:28:52
I've always assumed that numpy uses a kind of pairwise summation, which ensures high precision also for float32 operations:

    import numpy as np
    N = 17 * 10**6  # float32 precision no longer enough to hold the whole sum
    print(np.ones((N, 1), dtype=np.float32).sum(axis=0))
    # [17000000.], kind of expected

However, it looks as if a different algorithm is used if the matrix has more than one column:

    print(np.ones((N, 2), dtype=np.float32).sum(axis=0))
    # [16777216. 16777216.]  the error is just too big

    print(np.ones((2*N, 2), dtype=np.float32).sum(axis=0))
    # [16777216. 16777216.]  error is bigger

Probably sum
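The excerpt is cut short, but the effect it shows is the classic float32 saturation at 2^24 = 16777216: once the running sum reaches 2^24, adding 1.0f no longer changes it. Pairwise summation avoids this by keeping partial sums small. A hedged C sketch of the two strategies (illustrative only, not numpy's actual implementation):

    #include <stdio.h>
    #include <stdlib.h>

    float naive_sum(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];   /* 16777216.0f + 1.0f rounds back to 16777216.0f */
        return s;
    }

    float pairwise_sum(const float *a, size_t n) {
        if (n <= 8) return naive_sum(a, n);  /* small base case */
        return pairwise_sum(a, n / 2) + pairwise_sum(a + n / 2, n - n / 2);
    }

    int main(void) {
        size_t n = 17000000;
        float *a = malloc(n * sizeof *a);
        for (size_t i = 0; i < n; i++) a[i] = 1.0f;
        printf("naive:    %.0f\n", naive_sum(a, n));     /* 16777216 */
        printf("pairwise: %.0f\n", pairwise_sum(a, n));  /* 17000000 */
        free(a);
        return 0;
    }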

How are floating point numbers stored inside the CPU?

Submitted by 纵然是瞬间 on 2019-12-01 21:28:38
Question: I am a beginner going through assembly basics. While reading, I came to this paragraph explaining how floating-point numbers are stored in memory:

The exponent for a float is an 8-bit field. To allow large numbers or small numbers to be stored, the exponent is interpreted as positive or negative. The actual exponent is the value of the 8-bit field minus 127. 127 is the "exponent bias" for 32-bit floating point numbers. The fraction field of a float holds a
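A short C illustration of the bias rule from the quoted paragraph: pulling the sign, exponent, and fraction fields out of a float's bit pattern shows that the stored exponent field equals the actual exponent plus 127.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 6.5f;                       /* 6.5 = 1.101b * 2^2 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t sign = bits >> 31;
        uint32_t expo = (bits >> 23) & 0xFF;  /* 8-bit biased exponent field */
        uint32_t frac = bits & 0x7FFFFF;      /* 23-bit fraction field */
        printf("sign=%u exponent field=%u actual exponent=%d fraction=0x%06X\n",
               sign, expo, (int)expo - 127, frac);
        /* prints: sign=0 exponent field=129 actual exponent=2 fraction=0x500000 */
        return 0;
    }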

Why do we need IEEE 754 remainder?

Submitted by 非 Y 不嫁゛ on 2019-12-01 18:05:00
I just read this topic (especially the last comments), and then wondered why we actually need this way of defining the remainder. It seems that not many people "on google" were interested in that before...

Simon Byrne: If you're looking for reasons why you would want it, one is what is known as "range reduction". Let's say you want a sind function that computes the sine of an argument in degrees. A naive way to do this would be

    sind(x) = sin(x*pi/180)

However, pi here is not the true irrational number pi, but instead the floating-point number closest to pi. This leads to things like
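The answer is cut off, but the point can be illustrated: reduce the argument with the IEEE 754 remainder first, which is computed exactly (no rounding error), and only then convert to radians. A hedged C sketch (M_PI is POSIX, not strict ISO C):

    #include <math.h>
    #include <stdio.h>

    double sind_naive(double x) { return sin(x * M_PI / 180.0); }

    double sind_reduced(double x) {
        /* IEEE 754 remainder: exact, reduces x to [-180, 180] */
        double r = remainder(x, 360.0);
        return sin(r * M_PI / 180.0);
    }

    int main(void) {
        printf("naive:   %.17g\n", sind_naive(360.0));    /* ~ -2.45e-16, not 0 */
        printf("reduced: %.17g\n", sind_reduced(360.0));  /* exactly 0 */
        return 0;
    }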

Can the floating-point status flag FE_UNDERFLOW be set when the result is not subnormal?

Submitted by 我只是一个虾纸丫 on 2019-12-01 18:00:56
While investigating floating-point exception status flags, I came across the curious case of the status flag FE_UNDERFLOW being set when not expected. This is similar to "When does underflow occur?", yet goes into a corner case that may be a C specification issue or an FP hardware defect.

    // pseudo code             //  s  bias_expo   implied "mantissa"
    w = smallest_normal;       //  0  000...001  (1) 000...000
    x = w * 2;                 //  0  000...010  (1) 000...000
    y = next_smaller(x);       //  0  000...001  (1) 111...111
    round_mode(FE_TONEAREST);
    clear_status_flags();
    z = y/2;                   //  0  000...001  (1) 000...000   FE_UNDERFLOW is set!?

I did not expect FE
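A concrete C version of the pseudo code (a sketch; whether the flag is raised can depend on how the platform detects tininess): the exact quotient y/2 lies below the normal range and is inexact, so IEEE 754 signals underflow even though the rounded result is FLT_MIN, a normal number.

    #include <fenv.h>
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON   /* not all compilers honor this pragma */

    int main(void) {
        float w = FLT_MIN;               /* smallest normal */
        float x = w * 2.0f;
        float y = nextafterf(x, 0.0f);   /* one ULP below x: 1.111...1 * 2^-126 */
        fesetround(FE_TONEAREST);
        feclearexcept(FE_ALL_EXCEPT);
        volatile float z = y / 2.0f;     /* exact quotient is tiny; rounds up to FLT_MIN */
        printf("z == FLT_MIN: %d\n", z == FLT_MIN);
        printf("FE_UNDERFLOW set: %d\n", fetestexcept(FE_UNDERFLOW) != 0);
        return 0;
    }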

Floating Point Arithmetic error

Submitted by 人走茶凉 on 2019-12-01 17:49:16
I'm using the following function to approximate the derivative of a function at a point:

    def prime_x(f, x, h):
        if not f(x+h) == f(x) and not h == 0.0:
            return (f(x+h) - f(x)) / h
        else:
            raise PrecisionError

As a test I'm passing f as fx and x as 3.0, where fx is:

    def fx(x):
        import math
        return math.exp(x)*math.sin(x)

whose derivative is exp(x)*(sin(x)+cos(x)). Now, according to Google and to my calculator, exp(3)*(sin(3)+cos(3)) = -17.050059. So far so good. But when I decided to test the function with small values for h, I got the following:

    print prime_x(fx, 3.0, 10**-5)
    -17.0502585578
    print
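The output above is cut off, but the underlying trade-off can be shown directly: as h shrinks, the truncation error of the forward difference falls, until round-off in f(x+h) - f(x) (catastrophic cancellation) takes over and the error grows again. A hedged sketch, in C rather than the question's Python:

    #include <math.h>
    #include <stdio.h>

    static double f(double x) { return exp(x) * sin(x); }

    int main(void) {
        double x = 3.0;
        double exact = exp(x) * (sin(x) + cos(x));  /* true derivative, ~ -17.050059 */
        for (int k = 1; k <= 15; k += 2) {
            double h = pow(10.0, -k);
            double approx = (f(x + h) - f(x)) / h;  /* forward difference */
            printf("h=1e-%02d  approx=%+.10f  error=%.2e\n",
                   k, approx, fabs(approx - exact));
        }
        return 0;
    }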