ieee-754 | 易学教程

Representing integers in doubles

Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always hold, with full precision, the range of an unsigned integer of half that number of bytes? For example, can an eight-byte double exactly hold every value of a four-byte unsigned int? This boils down to whether a two-byte float can hold the range of a one-byte unsigned int, which is of course 0 to 255. An IEEE 754 64-bit double can represent any 32-bit integer, simply because it has 53 bits available for precision (52 stored plus one implicit) and the 32-bit integer only needs, well, 32 :-) It
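The precision argument can be checked directly. Below is a minimal sketch in Python, assuming its float is an IEEE 754 double (true on common platforms). The helper names survives_double and survives_half are hypothetical, and struct's "e" format stands in for the two-byte float mentioned in the question.

```python
import struct

U32_MAX = 2**32 - 1  # largest four-byte unsigned int

def survives_double(n: int) -> bool:
    """True if n converts to a double and back without change."""
    return int(float(n)) == n

def survives_half(n: int) -> bool:
    """True if n round-trips through a two-byte IEEE 754 half float."""
    return struct.unpack("<e", struct.pack("<e", float(n)))[0] == n

# 32 bits fit comfortably inside the 53-bit significand.
print(survives_double(U32_MAX))
# 2**53 + 1 is the smallest positive integer a double cannot hold exactly.
print(survives_double(2**53 + 1))
# The half-precision format has an 11-bit significand, enough for 0..255.
print(all(survives_half(n) for n in range(256)))
```

The same reasoning applies at every size: what matters is that the significand width is at least the integer's bit width.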

Why do we bias the exponent of a floating-point number?

Question: I'm trying to wrap my head around the floating-point representation of binary numbers, but no matter where I looked, I couldn't find a good answer to this question: why is the exponent biased? What's wrong with the good old reliable two's complement method? I looked at the Wikipedia article on the topic, but all it says is: "the usual representation for signed values, would make comparison harder." Answer 1: The IEEE 754 encodings have a convenient property that an order
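One well-known consequence of the biased exponent, which may be what the truncated answer goes on to describe, is that non-negative doubles sort the same way as their raw bit patterns read as unsigned integers. A minimal sketch in Python, assuming IEEE 754 doubles; bits_of and biased_exponent are hypothetical helper names.

```python
import struct

def bits_of(x: float) -> int:
    """Raw 64-bit pattern of a double, read as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def biased_exponent(x: float) -> int:
    """The 11-bit exponent field, stored as true exponent + 1023."""
    return (bits_of(x) >> 52) & 0x7FF

# Non-negative values in increasing order, from zero up to infinity.
values = [0.0, 5e-324, 1e-300, 0.5, 1.0, 1.5, 2.0, 1e300, float("inf")]
patterns = [bits_of(v) for v in values]

# Because the exponent field is an unsigned (biased) number sitting above
# the fraction bits, integer order of the patterns matches numeric order.
print(patterns == sorted(patterns))
print(biased_exponent(0.5), biased_exponent(1.0), biased_exponent(2.0))
```

With a two's complement exponent, negative exponents would have their top bit set and would compare as larger than positive ones, so this simple integer comparison would not work.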

Portability of binary serialization of double/float type in C++

The C++ standard does not discuss the underlying layout of the float and double types, only the range of values they should represent. (This is also true for signed types: is it two's complement or something else?) My question is: what are the techniques used to serialize/deserialize POD types such as double and float in a portable manner? At the moment it seems the only way to do this is to have the value represented literally (as in "123.456"). The IEEE 754 layout for double is not standard on all architectures. Brian "Beej Jorgensen" Hall gives in his Guide to Network Programming some code to pack
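One layout-independent approach is to split the value into an integer mantissa and an integer exponent and transmit those in a fixed byte order. The sketch below is an illustration of that idea in Python, not the code from the guide mentioned above; encode_double and decode_double are hypothetical names, and the 10-byte wire format is an assumption made for this example.

```python
import math
import struct

def encode_double(x: float) -> bytes:
    """Pack a finite double as a big-endian (int64 mantissa, int16 exponent)."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("this sketch handles finite values only")
    m, e = math.frexp(x)        # x == m * 2**e, with 0.5 <= |m| < 1 (or m == 0)
    mant = int(m * (1 << 53))   # exact: m has at most 53 significant bits
    return struct.pack(">qh", mant, e)

def decode_double(data: bytes) -> float:
    """Inverse of encode_double; rebuilds the value with ldexp."""
    mant, e = struct.unpack(">qh", data)
    return math.ldexp(float(mant), e - 53)

for x in (0.0, 1.0, -123.456, 5e-324, 1.7976931348623157e308):
    print(x, decode_double(encode_double(x)) == x)
```

The decoder only needs integer unpacking and a scale-by-power-of-two operation, so it does not depend on how the receiving machine lays out its floating-point types. Note that this sketch does not preserve the sign of negative zero.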

Why does Excel not round according to 8-byte IEEE 754

Question: The following expression evaluates to false in C#: (1 + 1 + 0.85) / 3 <= 0.95 And I suppose it does so in most other programming languages which implement IEEE 754, since (1 + 1 + 0.85) / 3 evaluates to 0.95000000000000007, which is greater than 0.95. However, even though Excel should implement most of IEEE 754 too, the following evaluates to TRUE in Excel 2013: = ((1 + 1 + 0.85) / 3 <= 0.95) Is there any specific reason for that? The article linked above does not mention any custom
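The IEEE 754 side of this observation can be reproduced outside C#. A minimal sketch in Python, assuming its float is an IEEE 754 double; bits_of is a hypothetical helper, and the snippet says nothing about what Excel does internally.

```python
import struct

def bits_of(x: float) -> int:
    """Raw 64-bit pattern of a double, read as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

lhs = (1 + 1 + 0.85) / 3
rhs = 0.95

print(lhs <= rhs)                    # the comparison from the question
print(repr(lhs), repr(rhs))          # shortest round-trip representations
print(bits_of(lhs) - bits_of(rhs))   # distance in representable doubles
```

The two sides are adjacent doubles: the quotient lands one representable step above the double closest to 0.95, which is why a strict IEEE 754 comparison reports false.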

Type-juggling and (strict) greater/lesser-than comparisons in PHP

PHP is famous for its type-juggling. I must admit it puzzles me, and I'm having a hard time working out basic logical/fundamental things in comparisons. For example: if $a > $b is true and $b > $c is true, must it mean that $a > $c is always true too? Following basic logic, I would say yes; however, I'm so puzzled that I do not really trust PHP in this. Maybe someone can provide an example where this is not the case? Also I'm wondering about the strict lesser-than and strict greater-than operators (as their meaning is described as strictly which I only knew in the past from the equality comparisons)

Difference between Java's `Double.MIN_NORMAL` and `Double.MIN_VALUE`?

Question: What's the difference between Double.MIN_NORMAL (introduced in Java 1.6) and Double.MIN_VALUE? Answer 1: The answer can be found in the IEEE specification of floating point representation: For the single format, the difference between a normal number and a subnormal number is that the leading bit of the significand (the bit to the left of the binary point) of a normal number is 1, whereas the leading bit of the significand of a subnormal number is 0. Single-format subnormal numbers were called single
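The same two boundaries exist for any IEEE 754 double, so they can be inspected outside Java as well. A minimal sketch in Python, assuming its float is an IEEE 754 double; min_normal and min_value are local names chosen to mirror the Java constants, and exponent_field is a hypothetical helper.

```python
import math
import struct
import sys

def exponent_field(x: float) -> int:
    """The 11-bit biased exponent field of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

min_normal = sys.float_info.min       # smallest positive normal double, 2**-1022
min_value = math.ldexp(1.0, -1074)    # smallest positive subnormal double

# Normal numbers have a nonzero exponent field and an implicit leading 1.
print(min_normal, exponent_field(min_normal))
# Subnormal numbers have an all-zero exponent field and a leading 0.
print(min_value, exponent_field(min_value))
# Nothing positive lies below the smallest subnormal: halving it gives zero.
print(min_value / 2 == 0.0)
```

Values between the two constants are still representable, but with fewer significant bits the closer they get to zero.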

Ensuring C++ doubles are 64 bits

Question: In my C++ program, I need to pull a 64-bit float from an external byte sequence. Is there some way to ensure, at compile time, that doubles are 64 bits? Is there some other type I should use to store the data instead? Edit: If you're reading this and actually looking for a way to ensure storage in the IEEE 754 format, have a look at Adam Rosenfield's answer below. Answer 1: An improvement on the other answers (which assume a char is 8 bits; the standard does not guarantee this) would be like

CLR JIT optimizations violates causality?

Question: I was writing an instructive example for a colleague to show him why testing floats for equality is often a bad idea. The example I went with was adding 0.1 ten times and comparing against 1.0 (the one I was shown in my introductory numerical class). I was surprised to find that the two results were equal (code + output). float @float = 0.0f; for(int @int = 0; @int < 10; @int += 1) { @float += 0.1f; } Console.WriteLine(@float == 1.0f); Some investigation showed that this result could not be
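For comparison, here is what the loop yields when every intermediate sum is rounded to single precision. This is a sketch in Python that emulates 32-bit float arithmetic by round-tripping through struct's "f" format; to_float32 is a hypothetical helper, and the snippet does not model what the CLR JIT actually does.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth = to_float32(0.1)   # 0.1f is slightly larger than one tenth
total = to_float32(0.0)
for _ in range(10):
    # The sum of two singles is exact in double, so rounding it once
    # reproduces a correctly rounded single-precision addition.
    total = to_float32(total + tenth)

print(total == 1.0)
print(repr(total))
```

Under strict single-precision rounding the sum overshoots 1.0 by one unit in the last place, so an equality result of true suggests the additions were carried out at a higher precision somewhere along the way.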

Get next smallest Double number

Question: As part of a unit test, I need to test some boundary conditions. One method accepts a System.Double argument. Is there a way to get the next-smallest double value (i.e. decrement the mantissa by one unit)? I considered using Double.Epsilon, but this is unreliable as it's only the smallest delta from zero, and so doesn't work for larger values (i.e. 9999999999 - Double.Epsilon == 9999999999). So what is the algorithm or code needed such that: NextSmallest(Double d) < d ...is always true.
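One common technique is to step the raw bit pattern, since adjacent doubles of the same sign have adjacent integer encodings. The sketch below shows the idea in Python rather than C#, assuming IEEE 754 doubles; next_smallest is a hypothetical name mirroring the function asked for in the question.

```python
import math
import struct

def next_smallest(d: float) -> float:
    """Largest double strictly less than d (NaN and -inf are returned as-is)."""
    if math.isnan(d) or d == -math.inf:
        return d
    if d == 0.0:
        # Covers both +0.0 and -0.0: step to the smallest negative subnormal.
        return -math.ldexp(1.0, -1074)
    bits = struct.unpack("<q", struct.pack("<d", d))[0]
    # Positive values shrink when the pattern is decremented; negative
    # values move further from zero when the pattern is incremented.
    bits += -1 if d > 0 else 1
    return struct.unpack("<d", struct.pack("<q", bits))[0]

for d in (1.0, 9999999999.0, 0.0, -1.0, math.inf):
    print(d, next_smallest(d), next_smallest(d) < d)
```

Unlike subtracting a fixed epsilon, the step size here scales with the magnitude of d, so the result is strictly smaller for every finite input and for positive infinity.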

Does any floating point-intensive code produce bit-exact results in any x86-based architecture?

Question: I would like to know if any code in C or C++ using floating-point arithmetic would produce bit-exact results on any x86-based architecture, regardless of the complexity of the code. To my knowledge, every x86 architecture since the Intel 8087 uses an FPU designed to handle IEEE 754 floating-point numbers, and I cannot see any reason why the result would be different on different architectures. However, if they were different (namely due to different compiler or different optimization level)