ieee-754 | 易学教程

How to test if numeric conversion will change value?

I'm performing some data type conversions where I need to represent uint, long, ulong and decimal as IEEE 754 double floating point values. I want to be able to detect whether the IEEE 754 data type cannot hold the value before I perform the conversion. A brute-force solution would be to wrap a try-catch around a cast to double, looking for OverflowException. Reading through parts of the CLR documentation implies that some conversions silently change the value without raising any exception. Is there a foolproof way to do this check? I'm looking for completeness over ease of implementation.
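The round-trip idea behind such a check can be sketched in Python (used here only for illustration; the question itself is about the CLR, and the sketch covers integers, not decimal). The helper name `converts_exactly` is hypothetical. It relies on two Python behaviours: int-to-float conversion is correctly rounded, and comparing an int with a float compares exact mathematical values.

```python
def converts_exactly(n: int) -> bool:
    """Return True if the integer n is representable as a double without rounding."""
    try:
        as_double = float(n)  # rounds to nearest; raises OverflowError if n is too large
    except OverflowError:
        return False
    # int == float compares exact values, so this is True only if no rounding occurred.
    return as_double == n


print(converts_exactly(2**53))      # True: 2^53 is exactly representable
print(converts_exactly(2**53 + 1))  # False: rounds to 2^53
print(converts_exactly(2**64 - 1))  # False: ulong maximum rounds to 2^64
```

In a language with fixed-width integers the same comparison needs more care, because casting the double back to the integer type can itself overflow.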

IEEE 754: How exactly does it work?

Why does the following code behave as it does in C? float x = 2147483647; // 2^31 - 1 printf("%f\n", x); // Outputs 2147483648 Here is my thought process: 2147483647 = 0 1001 1101 1111 1111 1111 1111 1111 111 (0.11111111111111111111111)base2 = (1-(0.5)^23)base10 => (1.11111111111111111111111)base2 = (1 + 1-(0.5)^23)base10 = (1.99999988)base10 Therefore, to convert the IEEE 754 notation back to decimal: 1.99999988 * 2^30 = 2147483520 So technically, the C program must have printed out 2147483520, right? The value to be represented would be 2147483647. The next two values which can be represented this
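The two single-precision neighbours of 2147483647 can be inspected with a short Python sketch (Python stands in for the C program here; `to_float32` is a hypothetical helper). It assumes `struct.pack` with the `f` format rounds to the nearest float, which is how CPython implements it.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


# The two floats that bracket 2147483647, built from their bit patterns.
below = struct.unpack(">f", bytes.fromhex("4effffff"))[0]
above = struct.unpack(">f", bytes.fromhex("4f000000"))[0]
print(below, above)  # 2147483520.0 2147483648.0

# 2147483647 is 127 away from the lower neighbour but only 1 away from the upper,
# so round-to-nearest picks 2147483648.
print(to_float32(2147483647.0))  # 2147483648.0
```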

The Double Byte Size in 32 bit and 64 bit OS

Question: Is there a difference in double size when I run my app in a 32-bit and a 64-bit environment? If I am not mistaken the double in a 32-bit environment will take up 16 digits after 0, whereas the double in a 64-bit environment will take up 32 bit, am I right? Answer 1: No, an IEEE 754 double-precision floating point number is always 64 bits. Similarly, a single-precision float is always 32 bits. If your question is about C# and/or .NET specifically (as your tag would indicate), all of the data type sizes are fixed,
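The fixed sizes are easy to confirm from any language that exposes the IEEE 754 binary formats. A minimal Python sketch (not C#; it uses the standard-size `<d` and `<f` formats of the `struct` module):

```python
import struct

# Standard-size formats are independent of the platform's word size.
double_size = struct.calcsize("<d")  # IEEE 754 binary64
float_size = struct.calcsize("<f")   # IEEE 754 binary32

print(double_size * 8, float_size * 8)  # 64 32
```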

Is it possible to get 0 by subtracting two unequal floating point numbers?

Question: Is it possible to get division by 0 (or infinity) in the following example? public double calculation(double a, double b) { if (a == b) { return 0; } else { return 2 / (a - b); } } In normal cases it will not, of course. But what if a and b are very close, can (a-b) result in being 0 due to precision of the calculation? Note that this question is for Java, but I think it will apply to most programming languages. Answer 1: In Java, a - b is never equal to 0 if a != b . This is because Java mandates
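The claim can be probed at the point where it is most fragile: two adjacent doubles at the bottom of the normal range. The sketch below is in Python, not Java, and assumes a platform where subnormal numbers are not flushed to zero (the default for CPython on common hardware). The variable names are illustrative.

```python
import math
import sys

a = sys.float_info.min  # smallest positive normal double, 2^-1022
tiny = 5e-324           # smallest positive subnormal double, 2^-1074
b = a + tiny            # the next representable double after a

diff = b - a
print(a != b)     # True: the two values are distinct
print(diff != 0)  # True: the difference is a subnormal, not zero

# The difference is nonzero, but dividing by it can still overflow to infinity.
print(math.isinf(2 / diff))  # True
```

So the guard `a == b` rules out division by zero, but it does not rule out an infinite result.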

how to convert floating-point number to IEEE 754 using assembly

Question: Can you please help me convert a floating-point number to IEEE 754 using assembly? I have the number -1.75, and I know it equals -1.11000000000000000000000 E+0 in IEEE 754, but I don't know how to do the conversion in assembly. Answer 1: Did you mean something like this: ; Conversion of an ASCII-coded decimal rational number (DecStr) ; to an ASCII-coded decimal binary number (BinStr) as 32-bit single (IEEE 754) include \masm32\include\masm32rt.inc ; MASM32 headers, mainly for printf .data DecStr db "-1
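As a reference for checking an assembly routine, the expected fields for -1.75 can be computed in a few lines of Python. This is a cross-check, not the answer's method; `float32_fields` is a hypothetical helper.

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split a value, rounded to single precision, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1 not stored
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(-1.75)
print(sign, exponent - 127, format(fraction, "023b"))
# 1 0 11000000000000000000000
print(struct.pack(">f", -1.75).hex())  # bfe00000
```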

Get raw bytes of a float in Swift

How can I read the raw bytes of a Float or Double in Swift? Example: let x = Float(1.5) let bytes1: UInt32 = getRawBytes(x) let bytes2: UInt32 = 0b00111111110000000000000000000000 I want bytes1 and bytes2 to contain the same value, since this binary number is the Float representation of 1.5 . I need it to do bit-wise operations like & and >> (these are not defined on a float). Update for Swift 3: As of Swift 3, all floating point types have a bitPattern property which returns an unsigned integer with the same memory representation, and a corresponding init(bitPattern:) constructor for the
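The same round trip between a float and its bit pattern can be sketched outside Swift. The Python analogue below uses the `struct` module. The two helper names are hypothetical and only mirror the roles of `bitPattern` and `init(bitPattern:)`.

```python
import struct


def float32_bit_pattern(x: float) -> int:
    """Return the 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_from_bit_pattern(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


bits = float32_bit_pattern(1.5)
print(bits == 0b00111111110000000000000000000000)  # True

# Bitwise operations work on the integer form: clearing the sign bit gives abs().
cleared = float32_bit_pattern(-1.5) & 0x7FFFFFFF
print(float32_from_bit_pattern(cleared))  # 1.5
```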

Sorting floating-point values using their byte-representation

If I have an 8-byte section of data and write a double-precision floating-point value to it, under what conditions will numerical comparison and lexicographic sorting of the bytes agree? Current theory: positive, big-endian. I believe that if the number is positive, and the representation is big-endian, then numerical ordering of the floating-point values will match the lexicographic ordering of the bytes. The idea is that it would first sort on the exponent, then on the mantissa. Even the "denormalized" IEEE representation shouldn't cause any problems. Is this true? (I'm using Node
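The theory is easy to test empirically. The Python sketch below (not Node; `be_bytes` is a hypothetical helper) sorts a handful of non-negative doubles by their big-endian bytes, including a subnormal and infinity, and then shows how negative values break the agreement.

```python
import struct


def be_bytes(x: float) -> bytes:
    """Big-endian IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x)


# Non-negative values: zero, a subnormal, the smallest normal, ordinary values, infinity.
values = [2.0, float("inf"), 0.5, 5e-324, 1e308, 0.0, 1.5, 2.2250738585072014e-308, 1.0]
by_bytes = sorted(values, key=be_bytes)
print(by_bytes == sorted(values))  # True: both orderings agree

# With the sign bit set, a larger magnitude gives larger bytes, so the order flips.
print(sorted([-2.0, -1.0], key=be_bytes))  # [-1.0, -2.0]
```

This checks a sample of values only; it is a sanity test of the theory, not a proof.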

How to alter double by its smallest increment

Is something broken or I fail to understand what is happening? static String getRealBinary(double val) { long tmp = Double.doubleToLongBits(val); StringBuilder sb = new StringBuilder(); for (long n = 64; --n > 0; tmp >>= 1) if ((tmp & 1) == 0) sb.insert(0, ('0')); else sb.insert(0, ('1')); sb.insert(0, '[').insert(2, "] [").insert(16, "] [").append(']'); return sb.toString(); } public static void main(String[] argv) { for (int j = 3; --j >= 0;) { double d = j; for (int i = 3; --i >= 0;) { d += Double.MIN_VALUE; System.out.println(d +getRealBinary(d)); } } } With output: 2.0[1] [00000000000]
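Adding the smallest positive double to a value such as 2.0 leaves it unchanged, because the gap between adjacent doubles near 2.0 is far larger than that. One common way to step to the adjacent double is to increment the bit pattern. The sketch below is in Python, not Java, and the helper `next_up` is hypothetical and only valid for finite positive inputs.

```python
import struct


def next_up(x: float) -> float:
    """Return the next double above x. Assumes x is finite and positive."""
    bits = struct.unpack(">q", struct.pack(">d", x))[0]
    return struct.unpack(">d", struct.pack(">q", bits + 1))[0]


smallest = 5e-324  # smallest positive double, the counterpart of Java's Double.MIN_VALUE
print(2.0 + smallest == 2.0)  # True: the addition rounds straight back to 2.0

stepped = next_up(2.0)
print(stepped)                      # 2.0000000000000004
print(stepped - 2.0 == 2.0 ** -51)  # True: the spacing of doubles in [2, 4)
```

In Java, `Math.nextUp` covers the same need without manual bit manipulation.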

IEEE 754 total order in standard C++11

According to the IEEE floating point wikipage (on IEEE 754), there is a total order on double-precision floating points (i.e. on C++11 implementations having IEEE-754 floats, like gcc 4.8 on Linux / x86-64). Of course, operator < on double often provides a total order, but NaNs are known to be exceptions (it is well-known folklore that x != x is a way of testing if x , declared as double x; is a NaN). The reason I am asking is that I want to have e.g. std::set<double> (actually, a set of JSON-like -or Python like- values) and I would like the set to have some canonical representation (my
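One widely used way to obtain such a total order is to compare bit patterns after a small transformation: invert all bits of negative values, and set the sign bit of non-negative ones. The sketch below illustrates the idea in Python, not C++. The helper `total_order_key` is hypothetical and is not taken from the question.

```python
import struct


def total_order_key(x: float) -> int:
    """Map a double to an unsigned integer whose natural order is a total order."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits >> 63:
        # Negative: invert everything so larger magnitudes sort first.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative: set the top bit so these sort above all negatives.
    return bits | (1 << 63)


values = [1.0, float("inf"), -0.0, 0.0, -1.0, float("-inf")]
ordered = sorted(values, key=total_order_key)
print(ordered)  # [-inf, -1.0, -0.0, 0.0, 1.0, inf]
```

Under this key -0.0 sorts before 0.0, and a NaN sorts beyond the infinity that matches its sign bit, so every value gets a definite position.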

Is there any way to see a number in its 64 bit float IEEE754 representation

Question: JavaScript stores all numbers as double-precision 64-bit format IEEE 754 values according to the spec: The Number type has exactly 18437736874454810627 (that is, 2^64 − 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic. Is there any way to see the number in this form in JavaScript? Answer 1: You can use typed arrays to examine the raw bytes of a number. Create a Float64Array with one element, and
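Both the quoted count and the 64-bit view can be checked with a short sketch. It is written in Python, not JavaScript, and `double_bits` is a hypothetical helper. The count follows from treating every NaN bit pattern as a single value.

```python
import struct

# 2^64 bit patterns in total. Patterns with an all-ones exponent number 2^53;
# two of them are the infinities, so 2^53 - 2 are NaNs, counted as one value.
nan_patterns = 2**53 - 2
distinct_values = 2**64 - nan_patterns + 1
print(distinct_values == 2**64 - 2**53 + 3)      # True
print(distinct_values == 18437736874454810627)   # True


def double_bits(x: float) -> str:
    """Return the 64-bit IEEE 754 representation of x as a string of 0s and 1s."""
    return format(struct.unpack(">Q", struct.pack(">d", x))[0], "064b")


one = double_bits(1.0)
print(one[0], one[1:12], one[12:])  # sign, 11-bit exponent, 52-bit fraction
```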