ieee-754

IEEE-754: cardinality of the set of rational numbers

Submitted by 柔情痞子 on 2019-12-02 04:13:43
What is the cardinality of the set of rational numbers that have an exact representation in the single-precision IEEE-754 floating-point format?

There are 2139095039 finite positive floats, and as many finite negative floats. Do you want to count +0.0 and -0.0 as two items or as one? Depending on the answer, the total is 2 * 2139095039 + 2 or 2 * 2139095039 + 1, that is, respectively, 4278190080 or 4278190079. Source for the 2139095039 number:

    #include <float.h>
    #include <math.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = FLT_MAX;
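The excerpt cuts off mid-program. As a minimal sketch of one way to obtain the 2139095039 figure (an assumption about what the truncated code did, not the original itself): the positive finite floats are exactly the bit patterns 0x00000001 through the pattern of FLT_MAX, so reinterpreting the bits of FLT_MAX as an unsigned integer yields their count directly.

    #include <float.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = FLT_MAX;
        uint32_t u;
        memcpy(&u, &f, sizeof u);  /* bit pattern of FLT_MAX: 0x7F7FFFFF */
        printf("%u\n", u);         /* 2139095039 finite positive floats */
        return 0;
    }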

Why is Infinity × 0 = NaN?

Submitted by 本小妞迷上赌 on 2019-12-02 04:00:13
IEEE 754 specifies the result of 1 / 0 as ∞ (Infinity). However, IEEE 754 then specifies the result of 0 × ∞ as NaN. This feels counter-intuitive: why is 0 × ∞ not 0?

1. We can think of 1 / 0 = ∞ as the limit of 1 / z as z tends to zero.
2. We can think of 0 × ∞ = 0 as the limit of 0 × z as z tends to ∞.

Why does the IEEE standard follow intuition 1 but not intuition 2?

It is easier to understand the behavior of IEEE 754 floating-point zeros and infinities if you do not think of them as being literally zero or infinite. The floating-point zeros not only represent the real number zero. They also represent all the nonzero numbers whose magnitude is too small to be represented, just as the infinities also stand in for finite numbers too large to be represented; a product of a "possibly tiny but nonzero" value and a "possibly huge but finite" value could come out to anything, so no single value is a defensible answer and NaN is returned.
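A minimal C demonstration of the two rules in question (nothing here is specific to the original post; it just exercises the specified behavior):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double inf = 1.0 / 0.0;     /* IEEE 754: nonzero finite / zero gives infinity */
        double nan = 0.0 * inf;     /* IEEE 754: zero times infinity gives NaN */
        printf("1/0   = %f\n", inf);        /* inf */
        printf("0*inf = %f\n", nan);        /* nan */
        printf("isnan: %d\n", isnan(nan));  /* 1 */
        return 0;
    }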

Convert MBF Single and Double to IEEE

Submitted by 自作多情 on 2019-12-02 03:47:03
Follow-up available: there's a follow-up with further details, see Convert MBF to IEEE. I've got some legacy data which is still in use; reading the binary files is not the problem, the number format is. All floating-point numbers are saved in MBF format (Single and Double). I've found a topic about that on the MSDN boards, but it only deals with Single values. I'd also like to stay away from API calls as far as I can. Does anyone have a solution for Doubles?

Edit: just in case somebody needs it, here is the VB.NET code (it's Option Strict compliant) I ended up with (feel free to
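The poster's VB.NET code is cut off above. As an illustration of the conversion itself, here is a hedged C sketch (not the poster's solution) assuming the common MBF double layout: byte 7 is the exponent with bias 128, the top bit of byte 6 is the sign, and the remaining 55 bits are the mantissa with an implicit leading 1, i.e. value = 0.1mmm…₂ × 2^(exp−128). A little-endian host is assumed for the memcpy trick.

    #include <stdint.h>
    #include <string.h>

    double mbf_to_ieee_double(const uint8_t mbf[8]) {
        uint8_t exp = mbf[7];
        if (exp == 0) return 0.0;              /* MBF encodes zero as exponent 0 */
        uint64_t bits = 0;
        memcpy(&bits, mbf, 8);                 /* little-endian host assumed */
        uint64_t sign = (bits >> 55) & 1;      /* top bit of byte 6 */
        uint64_t mant = bits & ((1ULL << 55) - 1);  /* 55 mantissa bits */
        /* 0.1m * 2^(exp-128) == 1.m * 2^(exp-129); IEEE double bias is 1023,
           so the new exponent field is exp - 129 + 1023 = exp + 894. */
        uint64_t e = (uint64_t)exp + 894;
        /* mant >> 3 truncates the 3 lowest bits; a full converter would round. */
        uint64_t out = (sign << 63) | (e << 52) | (mant >> 3);
        double d;
        memcpy(&d, &out, sizeof d);
        return d;
    }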

Why does exponential notation with decimal values fail? [closed]

Submitted by ∥☆過路亽.° on 2019-12-01 21:46:33
Question: Conventionally 1e3 means 10**3:

    >>> 1e3
    1000.0
    >>> 10**3
    1000

A similar case is exp(3) compared to e**3:

    >>> exp(3)
    20.085536923187668
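The excerpt stops here; judging from the title, the question is presumably about why a literal with a fractional exponent, such as 1e2.5, is rejected. The reason (in Python as in C) is that the literal grammar only allows an integer after the e: 1eN means "shift the decimal point N places", which is only meaningful for whole N, so a fractional power of ten needs explicit exponentiation. A hedged C illustration:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        printf("%g\n", 1e3);           /* 1000: the exponent in a literal must be an integer */
        /* 1e2.5 would be a syntax error: 'e' shifts the decimal point a
           whole number of places */
        printf("%g\n", pow(10, 2.5));  /* 316.228: fractional exponents need pow() */
        return 0;
    }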

floating point operations in go

Submitted by 谁都会走 on 2019-12-01 21:40:53
Question: Here's the sample code in Go:

    package main

    import "fmt"

    func mult32(a, b float32) float32 { return a * b }
    func mult64(a, b float64) float64 { return a * b }

    func main() {
        fmt.Println(3 * 4.3)               // A1, 12.9
        fmt.Println(mult32(3, 4.3))        // B1, 12.900001
        fmt.Println(mult64(3, 4.3))        // C1, 12.899999999999999
        fmt.Println(12.9 - 3*4.3)          // A2, 1.8033161362862765e-130
        fmt.Println(12.9 - mult32(3, 4.3)) // B2, -9.536743e-07
        fmt.Println(12.9 - mult64(3, 4.3)) // C2, 1.7763568394002505e-15
        fmt.Println(12.9 - 3
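The excerpt ends mid-line. Two different effects are at work: the A lines are evaluated entirely at compile time on Go's untyped constants, which the language spec requires to carry at least 256 bits of mantissa (the compiler's actual precision is even higher, hence A2's tiny ~1e-130 residue), while the B and C lines round 4.3 to float32 or float64 first. The B/C behavior is ordinary binary rounding and can be reproduced in C (a hedged sketch, not Go-specific):

    #include <stdio.h>

    int main(void) {
        float  p32 = 3.0f * 4.3f;  /* 4.3 rounded to float32: product ~12.900001 */
        double p64 = 3.0 * 4.3;    /* 4.3 rounded to float64: ~12.899999999999999 */
        printf("%.8g\n", (double)p32);
        printf("%.17g\n", p64);
        printf("%.7g\n", (double)(12.9f - p32));  /* ~ -9.536743e-07, like B2 */
        printf("%.17g\n", 12.9 - p64);            /* ~ 1.7763568394002505e-15, like C2 */
        return 0;
    }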

How to avoid less precise sum for numpy-arrays with multiple columns

Submitted by ぐ巨炮叔叔 on 2019-12-01 21:28:52
I've always assumed that numpy uses a kind of pairwise summation, which ensures high precision also for float32 operations:

    import numpy as np
    N = 17 * 10**6  # float32 precision no longer enough to hold the whole sum
    print(np.ones((N, 1), dtype=np.float32).sum(axis=0))
    # [17000000.], kind of expected

However, it looks as if a different algorithm is used if the matrix has more than one column:

    print(np.ones((N, 2), dtype=np.float32).sum(axis=0))
    # [16777216. 16777216.]  the error is just too big

    print(np.ones((2*N, 2), dtype=np.float32).sum(axis=0))
    # [16777216. 16777216.]  error is bigger

Probably sum
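The excerpt is cut short, but the effect it shows is the classic float32 saturation at 2^24 = 16777216: once the running sum reaches 2^24, adding 1.0f no longer changes it. Pairwise summation avoids this by keeping partial sums small. A hedged C sketch of the two strategies (illustrative only, not numpy's actual implementation):

    #include <stdio.h>
    #include <stdlib.h>

    float naive_sum(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];   /* 16777216.0f + 1.0f rounds back to 16777216.0f */
        return s;
    }

    float pairwise_sum(const float *a, size_t n) {
        if (n <= 8) return naive_sum(a, n);  /* small base case */
        return pairwise_sum(a, n / 2) + pairwise_sum(a + n / 2, n - n / 2);
    }

    int main(void) {
        size_t n = 17000000;
        float *a = malloc(n * sizeof *a);
        for (size_t i = 0; i < n; i++) a[i] = 1.0f;
        printf("naive:    %.0f\n", naive_sum(a, n));     /* 16777216 */
        printf("pairwise: %.0f\n", pairwise_sum(a, n));  /* 17000000 */
        free(a);
        return 0;
    }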

How are floating point numbers stored inside the CPU?

Submitted by 纵然是瞬间 on 2019-12-01 21:28:38
Question: I am a beginner going through assembly basics. While reading, I came to this paragraph explaining how floating-point numbers are stored in memory:

The exponent for a float is an 8-bit field. To allow large numbers or small numbers to be stored, the exponent is interpreted as positive or negative. The actual exponent is the value of the 8-bit field minus 127. 127 is the "exponent bias" for 32-bit floating point numbers. The fraction field of a float holds a
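A short C illustration of the bias rule from the quoted paragraph: pulling the sign, exponent, and fraction fields out of a float's bit pattern shows that the stored exponent field equals the actual exponent plus 127.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 6.5f;                       /* 6.5 = 1.101b * 2^2 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t sign = bits >> 31;
        uint32_t expo = (bits >> 23) & 0xFF;  /* 8-bit biased exponent field */
        uint32_t frac = bits & 0x7FFFFF;      /* 23-bit fraction field */
        printf("sign=%u exponent field=%u actual exponent=%d fraction=0x%06X\n",
               sign, expo, (int)expo - 127, frac);
        /* prints: sign=0 exponent field=129 actual exponent=2 fraction=0x500000 */
        return 0;
    }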

Why do we need IEEE 754 remainder?

Submitted by 非 Y 不嫁゛ on 2019-12-01 18:05:00
I just read this topic (especially the last comments), and then wondered why we actually need this way of defining the remainder. It seems that not many people "on google" were interested in that before...

Simon Byrne: If you're looking for reasons why you would want it, one is what is known as "range reduction". Let's say you want a sind function that computes the sine of an argument in degrees. A naive way to do this would be

    sind(x) = sin(x*pi/180)

However, pi here is not the true irrational number pi, but instead the floating-point number closest to pi. This leads to things like
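The answer is cut off, but the point can be illustrated: reduce the argument with the IEEE 754 remainder first, which is computed exactly (no rounding error), and only then convert to radians. A hedged C sketch (M_PI is POSIX, not strict ISO C):

    #include <math.h>
    #include <stdio.h>

    double sind_naive(double x) { return sin(x * M_PI / 180.0); }

    double sind_reduced(double x) {
        /* IEEE 754 remainder: exact, reduces x to [-180, 180] */
        double r = remainder(x, 360.0);
        return sin(r * M_PI / 180.0);
    }

    int main(void) {
        printf("naive:   %.17g\n", sind_naive(360.0));    /* ~ -2.45e-16, not 0 */
        printf("reduced: %.17g\n", sind_reduced(360.0));  /* exactly 0 */
        return 0;
    }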

Can the floating-point status flag FE_UNDERFLOW be set when the result is not subnormal?

Submitted by 我只是一个虾纸丫 on 2019-12-01 18:00:56
While investigating floating-point exception status flags, I came across the curious case of the status flag FE_UNDERFLOW being set when not expected. This is similar to "When does underflow occur?", yet goes into a corner case that may be a C specification issue or an FP hardware defect.

    // pseudo code             //  s  bias_expo   implied "mantissa"
    w = smallest_normal;       //  0  000...001  (1) 000...000
    x = w * 2;                 //  0  000...010  (1) 000...000
    y = next_smaller(x);       //  0  000...001  (1) 111...111
    round_mode(FE_TONEAREST);
    clear_status_flags();
    z = y/2;                   //  0  000...001  (1) 000...000   FE_UNDERFLOW is set!?

I did not expect FE
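A concrete C version of the pseudo code (a sketch; whether the flag is raised can depend on how the platform detects tininess): the exact quotient y/2 lies below the normal range and is inexact, so IEEE 754 signals underflow even though the rounded result is FLT_MIN, a normal number.

    #include <fenv.h>
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON   /* not all compilers honor this pragma */

    int main(void) {
        float w = FLT_MIN;               /* smallest normal */
        float x = w * 2.0f;
        float y = nextafterf(x, 0.0f);   /* one ULP below x: 1.111...1 * 2^-126 */
        fesetround(FE_TONEAREST);
        feclearexcept(FE_ALL_EXCEPT);
        volatile float z = y / 2.0f;     /* exact quotient is tiny; rounds up to FLT_MIN */
        printf("z == FLT_MIN: %d\n", z == FLT_MIN);
        printf("FE_UNDERFLOW set: %d\n", fetestexcept(FE_UNDERFLOW) != 0);
        return 0;
    }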

Floating Point Arithmetic error

Submitted by 人走茶凉 on 2019-12-01 17:49:16
I'm using the following function to approximate the derivative of a function at a point:

    def prime_x(f, x, h):
        if not f(x+h) == f(x) and not h == 0.0:
            return (f(x+h) - f(x)) / h
        else:
            raise PrecisionError

As a test I'm passing f as fx and x as 3.0, where fx is:

    def fx(x):
        import math
        return math.exp(x)*math.sin(x)

whose derivative is exp(x)*(sin(x)+cos(x)). Now, according to Google and to my calculator, exp(3)*(sin(3)+cos(3)) = -17.050059. So far so good. But when I decided to test the function with small values for h, I got the following:

    print prime_x(fx, 3.0, 10**-5)
    -17.0502585578
    print
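The output above is cut off, but the underlying trade-off can be shown directly: as h shrinks, the truncation error of the forward difference falls, until round-off in f(x+h) - f(x) (catastrophic cancellation) takes over and the error grows again. A hedged sketch, in C rather than the question's Python:

    #include <math.h>
    #include <stdio.h>

    static double f(double x) { return exp(x) * sin(x); }

    int main(void) {
        double x = 3.0;
        double exact = exp(x) * (sin(x) + cos(x));  /* true derivative, ~ -17.050059 */
        for (int k = 1; k <= 15; k += 2) {
            double h = pow(10.0, -k);
            double approx = (f(x + h) - f(x)) / h;  /* forward difference */
            printf("h=1e-%02d  approx=%+.10f  error=%.2e\n",
                   k, approx, fabs(approx - exact));
        }
        return 0;
    }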