ieee-754 | 易学教程

Portability of binary serialization of double/float type in C++

Question: The C++ standard does not discuss the underlying layout of the float and double types, only the range of values they should represent. (This is also true for signed types: is it two's complement or something else?) My question is: what techniques are used to serialize/deserialize POD types such as double and float in a portable manner? At the moment it seems the only way to do this is to have the value represented literally (as in "123.456"). The IEEE 754 layout for double is not standard on …
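
One portable technique avoids copying the in-memory bytes altogether: decompose the value with std::frexp into an integer significand and a binary exponent, write both as big-endian integers, and rebuild the value with std::ldexp on the other side. This is a minimal sketch; the 11-byte wire format and the names Wire, encode and decode are hypothetical, chosen for illustration. It assumes the receiving double has at least 53 significand bits and handles finite values only (NaN and infinity would need extra tag values).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical wire format (an assumption of this sketch, not a standard):
//   byte 0      : sign (1 = negative, preserves -0.0)
//   bytes 1..8  : integer significand (at most 53 bits), big-endian
//   bytes 9..10 : binary exponent + 16384, big-endian
using Wire = std::array<unsigned char, 11>;

Wire encode(double x) {
    assert(std::isfinite(x));  // NaN/infinity not handled here
    Wire w{};
    int exp = 0;
    // |x| = frac * 2^exp with frac in [0.5, 1) (or frac == 0 for zero).
    double frac = std::frexp(std::fabs(x), &exp);
    // A double carries at most 53 significant bits, so frac * 2^53 is an exact integer.
    std::uint64_t mant = static_cast<std::uint64_t>(std::ldexp(frac, 53));
    w[0] = std::signbit(x) ? 1 : 0;
    for (int i = 0; i < 8; ++i)
        w[1 + i] = static_cast<unsigned char>(mant >> (56 - 8 * i));
    std::uint16_t biased = static_cast<std::uint16_t>(exp + 16384);
    w[9] = static_cast<unsigned char>(biased >> 8);
    w[10] = static_cast<unsigned char>(biased & 0xFF);
    return w;
}

double decode(const Wire& w) {
    std::uint64_t mant = 0;
    for (int i = 0; i < 8; ++i) mant = (mant << 8) | w[1 + i];
    int exp = ((w[9] << 8) | w[10]) - 16384;
    // Undo the 2^53 scaling; ldexp is exact when the result is representable.
    double mag = std::ldexp(static_cast<double>(mant), exp - 53);
    return w[0] ? -mag : mag;
}
```

Because only integer shifts and frexp/ldexp are involved, the host's byte order and floating-point bit layout never reach the wire; subnormals round-trip too, since frexp normalizes them.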

Algorithm to convert an IEEE 754 double to a string?

Question: Many programming languages that use IEEE 754 doubles provide a library function to convert those doubles to strings. For example, C has sprintf, C++ has stringstream, Java has Double.toString, etc. Internally, how are these functions implemented? That is, what algorithm(s) do they use to convert a double into a string representation, given that they are often subject to programmer-chosen precision limits? Thanks!

Answer 1: The code used by various software environments to convert …
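
Well-known dedicated algorithms for this include Steele and White's Dragon4, Grisu and Ryu, which generate the shortest digit string that reads back as the same double. The brute-force sketch below is not how those algorithms work; it only illustrates the goal they solve, by trying 1 to 17 significant digits with snprintf and keeping the first string that strtod parses back to the same value. The name shortest_repr is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Naive shortest round-trip formatting (illustration only, O(17) formatting calls).
// 17 significant digits always suffice for an IEEE 754 double, so the loop terminates.
std::string shortest_repr(double x) {
    assert(std::isfinite(x));
    char buf[32];
    for (int prec = 1; prec <= 17; ++prec) {
        std::snprintf(buf, sizeof buf, "%.*g", prec, x);
        if (std::strtod(buf, nullptr) == x) break;  // parses back to the same double
    }
    return buf;
}
```

For example, 0.1 comes out as "0.1" while 1.0/3.0 needs 16 digits; the %g style also means a value like 100.0 is rendered as "1e+02", which a real formatter would avoid.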

Why does division by zero in the IEEE 754 standard result in an infinite value?

Question: I'm just curious: why, in IEEE 754, does any nonzero float divided by zero result in an infinite value? It's nonsense from a mathematical perspective, so I think the correct result for this operation is NaN. The function f(x) = 1/x is not defined at x = 0 if x is a real number. For example, the function sqrt is not defined for any negative number, and sqrt(-1.0f) in IEEE 754 produces a NaN value. But 1.0f/0 is Inf. For some reason this is not the case in IEEE 754. There must be a reason …
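
One commonly cited rationale: IEEE 754 zeros carry a sign, so x / ±0 can be read as the limit of x / y as y approaches 0 from that side, which is a signed infinity; only 0/0, where no limit exists, gives NaN. The sketch below shows those results and that 1/(1/x) recovers a signed zero. It assumes IEEE 754 arithmetic (std::numeric_limits<double>::is_iec559); strictly, the C++ standard leaves floating-point division by zero undefined, and divide is a hypothetical helper that keeps the divisor out of compile-time constant expressions.

```cpp
#include <cassert>
#include <cmath>

// Plain IEEE division; on IEEE-conforming implementations a zero divisor
// yields a signed infinity (or NaN for 0/0) instead of trapping.
double divide(double a, double b) { return a / b; }
```

Usage: divide(1.0, 0.0) is +inf, divide(1.0, -0.0) is -inf, divide(0.0, 0.0) is NaN, and divide(1.0, divide(1.0, -0.0)) is -0.0, so the sign of the zero survives the round trip through infinity.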

What is difference between quiet NaN and signaling NaN?

Question: I have read about floating point, and I understand that NaN can result from operations, but I can't understand what these concepts are exactly. What is the difference? Which one can be produced during C++ programming? As a programmer, could I write a program that causes an sNaN?

Answer 1: When an operation results in a quiet NaN, there is no indication that anything is unusual until the program checks the result and sees a NaN. That is, computation continues without any signal from the floating-point unit …
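
In C++ both kinds can be produced directly with std::numeric_limits<double>::quiet_NaN() and signaling_NaN() (check has_signaling_NaN first). A sketch of telling them apart by their bits, assuming IEEE 754 binary64 with the encoding recommended by IEEE 754-2008 and used by x86-64 and AArch64, where a NaN is quiet when the most significant fraction bit (bit 51) is set; some older architectures use the opposite convention. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: bit 51 set means quiet (IEEE 754-2008 recommended encoding).
bool is_quiet_nan(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // inspect the raw binary64 pattern
    return std::isnan(x) && ((bits >> 51) & 1u);
}

bool is_signaling_nan(double x) {
    return std::isnan(x) && !is_quiet_nan(x);
}
```

Arithmetic on a signaling NaN normally raises the invalid-operation exception and delivers a quiet NaN, so something like s + 1.0 with s = signaling_NaN() should be quiet afterwards on these platforms.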

What is a subnormal floating point number?

Question: The isnormal() reference page says: "Determines if the given floating point number arg is normal, i.e. is neither zero, subnormal, infinite, nor NaN." It is clear what it means for a number to be zero, infinite or NaN, but it also says subnormal. When is a number subnormal?

Answer 1: In the IEEE 754 standard, floating-point numbers are represented in binary scientific notation, x = M × 2^e. Here M is the mantissa and e is the exponent. Mathematically, you can always choose the exponent so that 1 ≤ M < 2.* …
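
Concretely, when the exponent is already at its minimum (e = -1022 for double) the representation switches to 0 < M < 1, which gives the subnormal numbers between zero and std::numeric_limits<double>::min(), down to denorm_min(). A small sketch using std::fpclassify, which distinguishes the same five categories isnormal() mentions; classify is a hypothetical helper. Builds that flush subnormals to zero (for example with fast-math options) may behave differently at run time.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Map std::fpclassify's result to a readable name.
std::string classify(double x) {
    switch (std::fpclassify(x)) {
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";  // exponent at minimum, leading bit 0
        case FP_NORMAL:    return "normal";     // 1 <= M < 2
        case FP_INFINITE:  return "infinite";
        case FP_NAN:       return "nan";
    }
    return "unknown";
}
```

Usage: classify(std::numeric_limits<double>::min()) is "normal" (the smallest normal double), while half of it, or denorm_min(), is "subnormal".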

32-bit to 16-bit Floating Point Conversion

Question: I need a cross-platform library/algorithm that will convert between 32-bit and 16-bit floating-point numbers. I don't need to perform math with the 16-bit numbers; I just need to decrease the size of the 32-bit floats so they can be sent over the network. I am working in C++. I understand how much precision I would be losing, but that's OK for my application. The IEEE 16-bit format would be great.

Answer 1: std::frexp extracts the significand and exponent from normal floats or doubles; then …
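
Following the frexp idea from the answer, here is a sketch of float to IEEE 754 binary16 (and back) that handles subnormals, overflow to infinity and NaN, rounding to nearest-even. It assumes the default FE_TONEAREST rounding mode, because std::nearbyint rounds according to the current mode; float_to_half and half_to_float are hypothetical names. The result is a plain 16-bit pattern, so it can be written to the network in whatever byte order the protocol specifies.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// binary16: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
std::uint16_t float_to_half(float f) {
    const std::uint16_t sign = std::signbit(f) ? 0x8000 : 0x0000;
    if (std::isnan(f)) return static_cast<std::uint16_t>(sign | 0x7E00);  // quiet NaN
    const double a = std::fabs(static_cast<double>(f));                 // exact widening
    if (std::isinf(a)) return static_cast<std::uint16_t>(sign | 0x7C00);
    if (a < std::ldexp(1.0, -14)) {
        // Subnormal range: value = m * 2^-24. If m rounds up to 1024,
        // the bit pattern 0x0400 is exactly the smallest normal half.
        const auto m = static_cast<std::uint16_t>(std::nearbyint(std::ldexp(a, 24)));
        return static_cast<std::uint16_t>(sign | m);
    }
    int e = 0;
    const double fr = std::frexp(a, &e);   // a = fr * 2^e, fr in [0.5, 1)
    int biased = e + 14;                   // a = (2 * fr) * 2^(e - 1), plus bias 15
    // 11-bit significand including the implicit 1, rounded: in [1024, 2048].
    int q = static_cast<int>(std::nearbyint(std::ldexp(fr, 11)));
    if (q == 2048) { q = 1024; ++biased; } // rounding carried into the exponent
    if (biased >= 31) return static_cast<std::uint16_t>(sign | 0x7C00);  // overflow
    return static_cast<std::uint16_t>(sign | (biased << 10) | (q - 1024));
}

float half_to_float(std::uint16_t h) {
    const int exp = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    double v;
    if (exp == 0)
        v = std::ldexp(static_cast<double>(mant), -24);              // zero or subnormal
    else if (exp == 31)
        v = mant ? std::numeric_limits<double>::quiet_NaN()
                 : std::numeric_limits<double>::infinity();
    else
        v = std::ldexp(static_cast<double>(mant + 1024), exp - 25);  // normal
    return static_cast<float>((h & 0x8000) ? -v : v);               // exact: fits in float
}
```

Usage: float_to_half(1.0f) gives 0x3C00 and float_to_half(65504.0f), the largest finite half, gives 0x7BFF, while 65520.0f already rounds to infinity (0x7C00).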

How to check if C++ compiler uses IEEE 754 floating point standard

Question: I would like to ask a question that follows this one, which is answered pretty well by the define that checks whether the compiler uses the standard. However, this works for C only. Is there a way to do the same in C++? I do not wish to convert floating-point types to text or use some rather complex conversion functions. I just need the compiler check. If you know of a list of such compatible compilers, please post the link; I could not find one.

Answer 1: Actually, there is an easier way to achieve this in C++. From …
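
One compile-time check available in standard C++ is std::numeric_limits<T>::is_iec559, which is true when T fulfils IEC 559 (the international designation of IEEE 754). A minimal sketch; the extra significand-width checks and the constant name kIeee754Doubles are illustrative additions, not required by anything in the question.

```cpp
#include <cassert>
#include <limits>

// Refuse to compile unless float and double are IEEE 754 (IEC 559) types.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");

// Optional: also pin the expected formats (binary32: 24-bit, binary64: 53-bit significands).
static_assert(std::numeric_limits<float>::digits == 24, "unexpected float significand width");
static_assert(std::numeric_limits<double>::digits == 53, "unexpected double significand width");

constexpr bool kIeee754Doubles = std::numeric_limits<double>::is_iec559;
```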

Double precision - decimal places

Question: From what I have read, a value of type double has an approximate precision of 15 decimal places. However, when I use a number whose decimal representation repeats, such as 1.0/7.0, I find that the variable holds the value 0.14285714285714285, which is 17 places (via the debugger). I would like to know why it is represented as 17 places internally, and why the precision is always quoted as ~15.

Answer 1: An IEEE double has 53 significant bits (that's the value of DBL_MANT_DIG in …
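
Both numbers follow from the 53 significand bits: 53 × log10(2) ≈ 15.95 decimal digits. DBL_DIG = floor((53 − 1) × log10 2) = 15 is how many decimal digits always survive a text → double → text round trip, while std::numeric_limits<double>::max_digits10 = ceil(1 + 53 × log10 2) = 17 is how many digits are needed for double → text → double to be exact; a debugger that shows 17 digits is presumably printing max_digits10 of them. A small sketch with a hypothetical format_sig helper:

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Format x with the given number of significant decimal digits.
std::string format_sig(double x, int digits) {
    assert(digits >= 1 && digits <= 17);
    char buf[40];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return buf;
}
```

With 1.0/7.0, 15 digits print "0.142857142857143" and 17 digits print "0.14285714285714285"; only the 17-digit string is guaranteed to parse back to the identical double.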

What range of numbers can be represented in 16-, 32-, and 64-bit IEEE 754 systems?

Question: I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid. The general question is: for a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be represented in 16-, 32-, and 64-bit IEEE 754 systems? Specifically, I'm only interested in the range of 16-bit and 32-bit numbers accurate to ±0.5 (the ones place) or ±0.0005 (the thousandths place).

Answer 1: For a given IEEE 754 floating-point number …
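
A way to compute these ranges: with p significand bits (11 for binary16, 24 for binary32, 53 for binary64), consecutive values in [2^k, 2^(k+1)) are 2^(k − p + 1) apart, and round-to-nearest keeps the error within half that gap. So ±0.5 needs a gap of at most 1 and ±0.0005 a gap of at most 0.001. The hypothetical helper below returns the magnitude below which the gap never exceeds a given value; it ignores the exponent range, so for large gaps the result can exceed the largest finite value (65504 for binary16).

```cpp
#include <cassert>
#include <cmath>

// The gap 2^(k - p + 1) is <= max_spacing for every k < ilogb(max_spacing) + p,
// i.e. for all magnitudes below 2^(ilogb(max_spacing) + p).
double exact_to_spacing_limit(int p, double max_spacing) {
    assert(p > 0 && max_spacing > 0);
    return std::ldexp(1.0, std::ilogb(max_spacing) + p);
}
```

This gives 2048 (binary16) and 16777216 (binary32) for ±0.5, and 2 (binary16) and 16384 (binary32) for ±0.0005; negative values mirror these ranges.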

Python float - str - float weirdness

Question:

>>> float(str(0.65000000000000002))
0.65000000000000002
>>> float(str(0.47000000000000003))
0.46999999999999997

??? What is going on here? How do I convert 0.47000000000000003 to a string and the resulting value back to a float? I am using Python 2.5.4 on Windows.

Answer 1: str(0.47000000000000003) gives '0.47', and float('0.47') can be 0.46999999999999997. This is due to the way floating-point numbers are represented (see this Wikipedia article). Note: float(repr(0.47000000000000003)) or eval(repr(0 …
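
In Python 2.x, str() formatted floats with 12 significant digits while repr() used 17, which explains the behaviour above. The same effect can be reproduced in C++ with a hypothetical round_trip helper that formats with a given number of significant digits via snprintf and parses the text back with strtod:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Format x with `digits` significant digits, then parse the text back.
double round_trip(double x, int digits) {
    assert(digits >= 1 && digits <= 17);
    char buf[40];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return std::strtod(buf, nullptr);
}
```

With 12 digits, 0.47000000000000003 is written as "0.47", which parses to the neighbouring double 0.46999999999999997 (the closest double to 0.47); with 17 digits the original value is always recovered, which is why the repr() route works.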