ieee-754

Why do we bias the exponent of a floating-point number?

北城余情 submitted on 2019-11-28 06:52:39
I'm trying to wrap my head around the floating-point representation of binary numbers, but no matter where I looked I couldn't find a good answer to the question: why is the exponent biased? What's wrong with the good old reliable two's complement method? I looked at the Wikipedia article on the topic, but all it says is that "the usual representation for signed values would make comparison harder." The IEEE 754 encodings have the convenient property that an order comparison between two positive, non-NaN numbers can be performed by simply comparing the corresponding bit strings.
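To make that ordering property concrete, here is a minimal sketch (my illustration, not from the original post) showing that for positive floats the raw bit patterns sort in the same order as the values themselves, which is exactly what the biased exponent buys and what a two's complement exponent would break:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a float's bits as an unsigned integer. */
static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

int main(void)
{
    float a = 1.5f, b = 2.5f;
    printf("%d\n", a < b);                   /* 1 */
    printf("%d\n", bits_of(a) < bits_of(b)); /* 1: same ordering on the bits */
    return 0;
}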

How to convert an IEEE 754 single-precision binary floating-point to decimal?

烈酒焚心 submitted on 2019-11-28 06:28:04
Question: I am working on a program that needs to convert a 32-bit number into a decimal number. The input I get is a 32-bit number represented as a floating-point value: the first bit is the sign, the next 8 bits are the exponent, and the remaining 23 bits are the mantissa. I am writing the program in C. I receive the number as a char[] array, and from it I build a new int[] array where I store the sign, the exponent, and the mantissa. But I have a problem with the mantissa when I am…
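For the normalized case the poster describes, the decoding is value = (-1)^sign × 1.mantissa × 2^(exponent - 127). A minimal sketch of the field extraction (my own illustration, not the poster's code; it starts from a float's bits rather than a char[]):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 12.375f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign     = (bits >> 31) & 0x1;      /* 1 bit */
    uint32_t exponent = (bits >> 23) & 0xFF;     /* 8 bits, biased by 127 */
    uint32_t mantissa =  bits        & 0x7FFFFF; /* 23 bits */

    /* Normalized value: (-1)^sign * 1.mantissa * 2^(exponent - 127) */
    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + mantissa / (double)(1 << 23))
                 * pow(2.0, (int)exponent - 127);
    printf("%g\n", value); /* 12.375 */
    return 0;
}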

Number of consecutive zeros in the decimal representation of a double

亡梦爱人 submitted on 2019-11-28 05:28:21
Question: What is the maximum number of consecutive non-leading, non-trailing zeros (resp. nines) in the exact decimal representation of an IEEE 754 double-precision number? Context: consider the problem of converting a double to decimal, rounding up (resp. down), when the only primitive available is an existing function that converts to the nearest value (correctly rounded to any desired number of digits). You could get a few additional digits and remove them yourself. For instance, to round 1.875…
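One piece of background worth making concrete: every finite double is a dyadic rational, so its exact decimal representation is finite, and a correctly rounded printf reproduces it once you ask for enough digits (a sketch assuming a correctly rounded C library such as glibc):

#include <stdio.h>

int main(void)
{
    /* 0.1 is not exactly representable; with enough digits printf shows
       the value actually stored, long digit runs included. */
    printf("%.55f\n", 0.1);
    /* 0.1000000000000000055511151231257827021181583404541015625 */
    printf("%.5f\n", 1.875); /* 1.87500 -- 1.875 is exactly representable */
    return 0;
}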

Why does casting Double.NaN to int not throw an exception in Java?

百般思念 submitted on 2019-11-28 02:34:57
Question: So I know IEEE 754 specifies some special floating-point values for values that are not real numbers. In Java, casting those values to a primitive int does not throw an exception like I would have expected. Instead we have the following:

int n;
n = (int)Double.NaN;               // n == 0
n = (int)Double.POSITIVE_INFINITY; // n == Integer.MAX_VALUE
n = (int)Double.NEGATIVE_INFINITY; // n == Integer.MIN_VALUE

What is the rationale for not throwing exceptions in these cases? Is this an IEEE standard, or…

How can I convert 4 bytes storing an IEEE 754 floating point number to a float value in C?

泪湿孤枕 submitted on 2019-11-28 02:24:12
Question: My program reads an IEEE 754 floating-point number from a file into 4 bytes. I need to portably convert those bytes to my C compiler's float type. In other words, I need a function with the prototype float IEEE_754_to_float(uint8_t raw_value[4]) for my C program.

Answer 1: If your implementation can guarantee correct endianness:

#include <stdint.h>
#include <string.h>

float raw2ieee(uint8_t *raw)
{
    // either: type-pun through a union
    union {
        uint8_t bytes[4];
        float fp;
    } un;
    memcpy(un.bytes, raw, 4);
    return un.fp;
    // or, as seen in the fast inverse square root:…
}
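A hypothetical usage sketch for the function above (the byte values assume the file stores the number little-endian, matching a little-endian host):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* IEEE 754 single-precision 1.0f is 0x3F800000, little-endian bytes: */
    uint8_t raw[4] = { 0x00, 0x00, 0x80, 0x3F };
    printf("%f\n", raw2ieee(raw)); /* 1.000000 */
    return 0;
}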

CLR JIT optimizations violate causality?

馋奶兔 submitted on 2019-11-28 01:49:23
I was writing an instructive example for a colleague to show him why testing floats for equality is often a bad idea. The example I went with was adding 0.1 ten times and comparing against 1.0 (the one I was shown in my introductory numerical class). I was surprised to find that the two results were equal (code + output):

float @float = 0.0f;
for (int @int = 0; @int < 10; @int += 1)
{
    @float += 0.1f;
}
Console.WriteLine(@float == 1.0f);

Some investigation showed that this result could not be relied upon (much like float equality). The one I found most surprising was that adding code after the…
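For contrast, here is the same loop in plain C (my sketch, not the poster's code). Compiled with SSE2 arithmetic, the default on x86-64, the sum stays in true single precision and the equality fails, which is what the poster originally expected; the surprising True in the CLR run is usually attributed to the JIT holding intermediate results at higher precision in registers:

#include <stdio.h>

int main(void)
{
    float f = 0.0f;
    for (int i = 0; i < 10; i++)
        f += 0.1f;
    /* With strict single-precision arithmetic the sum is 1.0000001f */
    printf("%d\n", f == 1.0f); /* 0 */
    printf("%.9f\n", f);       /* 1.000000119 */
    return 0;
}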

Get next smallest Double number

ぐ巨炮叔叔 submitted on 2019-11-28 01:47:37
As part of a unit test, I need to test some boundary conditions. One method accepts a System.Double argument. Is there a way to get the next-smallest double value (i.e., decrement the mantissa by one unit in the last place)? I considered using Double.Epsilon, but this is unreliable, as it is only the smallest delta from zero and so doesn't work for larger values (e.g., 9999999999 - Double.Epsilon == 9999999999). So what is the algorithm or code needed such that NextSmallest(Double d) < d is always true? If your numbers are finite, you can use a couple of convenient methods in the BitConverter class: long…
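The answer excerpt is cut off above, but the same two techniques exist in standard C (a sketch, not the poster's C# answer): nextafter() from math.h, or decrementing the raw bit pattern, which works for positive finite doubles thanks to the ordering property discussed in the first question:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Next double below d, valid for positive finite d: step the bits down. */
static double next_down_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    u -= 1;
    memcpy(&d, &u, sizeof d);
    return d;
}

int main(void)
{
    double d = 9999999999.0;
    printf("%d\n", nextafter(d, -INFINITY) < d); /* 1 */
    printf("%d\n", next_down_bits(d) < d);       /* 1 */
    return 0;
}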

IEEE 754 floating point arithmetic rounding error in c# and javascript

吃可爱长大的小学妹 submitted on 2019-11-28 01:31:22
I just read a book about JavaScript. The author mentioned a floating-point arithmetic rounding error in the IEEE 754 standard. For example, adding 0.1 and 0.2 yields 0.30000000000000004 instead of 0.3, so (0.1 + 0.2) == 0.3 returns false. I also reproduced this error in C#. So my questions are: How often does this error occur? What is the best-practice workaround in C# and JavaScript? Which other languages have the same error? It's not an error in the language. It's not an error in IEEE 754. It's an error in the expectation and usage of binary floating-point numbers. Once you understand…
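A minimal C reproduction, plus the usual workaround of comparing within a tolerance rather than for exact equality (my sketch; the 1e-9 tolerance is an arbitrary choice that must fit your data):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.1 + 0.2;
    printf("%.17g\n", sum);                 /* 0.30000000000000004 */
    printf("%d\n", sum == 0.3);             /* 0 */
    /* Workaround: compare within a tolerance appropriate to the data */
    printf("%d\n", fabs(sum - 0.3) < 1e-9); /* 1 */
    return 0;
}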

How is fma() implemented

﹥>﹥吖頭↗ submitted on 2019-11-28 01:28:15
Question: According to the documentation, there is an fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how this is implemented in practice. I'm mostly interested in the x86 and x86_64 architectures. Is there a floating-point (non-vector) instruction for FMA, perhaps as defined by IEEE 754-2008? Is the FMA3 or FMA4 instruction used? Is there an intrinsic to make sure that a real FMA is used when the precision is relied upon? Answer 1: The…
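A quick demonstration of why a fused operation matters (my sketch, not the answer's code): fma() rounds once, after an exact multiply-add, while a*b+c rounds the product first. On x86-64, GCC and Clang compile fma() to an FMA3 vfmadd instruction when built with -mfma (or a target that implies it), and otherwise fall back to a slower correctly rounded library routine:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 0.1, b = 10.0, c = -1.0;
    /* a*b rounds to exactly 1.0 before the add, losing the residual... */
    printf("%.17g\n", a * b + c);    /* 0 */
    /* ...while fma rounds only once, after the exact multiply-add. */
    printf("%.17g\n", fma(a, b, c)); /* 5.5511151231257827e-17 */
    return 0;
}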

Why does Excel not round according to 8-byte IEEE 754

六眼飞鱼酱① submitted on 2019-11-28 01:12:14
The following expression evaluates to false in C#: (1 + 1 + 0.85) / 3 <= 0.95. And I suppose it does so in most other programming languages which implement IEEE 754, since (1 + 1 + 0.85) / 3 evaluates to 0.95000000000000007, which is greater than 0.95. However, even though Excel should implement most of IEEE 754 too, the following evaluates to TRUE in Excel 2013: =((1 + 1 + 0.85) / 3 <= 0.95). Is there any specific reason for that? The article linked above does not mention any custom implementations of Excel that can lead to this behavior. Can you tell Excel to strictly round according to…
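For comparison, the raw IEEE 754 arithmetic that Excel is papering over (my sketch; a commonly cited explanation is that Excel stores and displays only 15 significant decimal digits, at which precision 0.95000000000000007 and 0.95 are indistinguishable):

#include <stdio.h>

int main(void)
{
    double lhs = (1 + 1 + 0.85) / 3;
    printf("%.17g\n", lhs);      /* 0.95000000000000007 */
    printf("%d\n", lhs <= 0.95); /* 0: false, unlike Excel's TRUE */
    /* Rounded to 15 significant digits, the difference disappears: */
    printf("%.15g\n", lhs);      /* 0.95 */
    return 0;
}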