ieee-754

How to get the IEEE 754 binary representation of a float in C#

Submitted by 三世轮回 on 2019-12-17 18:34:36
Question: I have some single and double precision floats that I want to write to and read from a byte[]. Is there anything in .NET I can use to convert them to and from their 32- and 64-bit IEEE 754 representations?

Answer 1: .NET Single and Double are already in IEEE 754 format. You can use BitConverter.ToSingle() and ToDouble() to convert a byte[] to floating point, and GetBytes() to go the other way around.

Answer 2: Update for current .NET/C# using spans: static void Main() { Span<byte> data = stackalloc byte[20]; …
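
A minimal sketch of the round trip the answers describe, assuming a .NET console program (names and values are illustrative):

using System;

class RoundTrip
{
    static void Main()
    {
        double value = 1.25;

        // double -> its 8-byte IEEE 754 representation
        byte[] bytes = BitConverter.GetBytes(value);

        // bytes -> double again
        double back = BitConverter.ToDouble(bytes, 0);

        Console.WriteLine(BitConverter.ToString(bytes)); // 00-00-00-00-00-00-F4-3F on a little-endian machine
        Console.WriteLine(back);                         // 1.25
    }
}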

double arithmetic and equality in Java

Submitted by ⅰ亾dé卋堺 on 2019-12-17 16:56:08
Question: Here's an oddity (to me, at least). This routine prints true:

double x = 11.0; double y = 10.0;
if (x - y == 1.0) { /* print true */ } else { /* print false */ }

But this routine prints false:

double x = 1.1; double y = 1.0;
if (x - y == 0.1) { /* print true */ } else { /* print false */ }

Anyone care to explain what's going on here? I'm guessing it has something to do with integer arithmetic for ints posing as floats. Also, are there other bases (other than 10) that have this property?

Answer 1: 1.0 has an …
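
The question is about Java, but the behaviour comes from IEEE 754 arithmetic itself, so it can be reproduced in C# (the language used elsewhere on this page); a quick check, with the rounding error made visible via the round-trip format:

using System;

class Repro
{
    static void Main()
    {
        // 11.0, 10.0 and 1.0 are all exactly representable, and so is their difference.
        Console.WriteLine(11.0 - 10.0 == 1.0);            // True

        // 1.1 and 0.1 are NOT exactly representable in binary, and their rounding
        // errors do not cancel, so the comparison fails.
        Console.WriteLine(1.1 - 1.0 == 0.1);              // False
        Console.WriteLine((1.1 - 1.0).ToString("R"));     // 0.10000000000000009
    }
}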

Convert from IBM floating point to IEEE floating point standard and Vice Versa- In C#?

Submitted by ≡放荡痞女 on 2019-12-17 16:41:46
Question: I was looking for a way to convert IEEE floating point numbers to IBM floating point format for an old system we are using. Is there a general formula we can use in C# to this end?

Answer 1:

// http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture
// float2ibm(-118.625F) == 0xC276A000
// 1 100 0010 0111 0110 1010 0000 0000 0000
// IBM/370 single precision, 4 bytes
// xxxx.xxxx xxxx.xxxx xxxx.xxxx xxxx.xxxx
// s|-exp--| |--------fraction-----------|
//    (7)              (24)
// value = (-1)**s * 16**(e - 64) * .f
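
A hedged sketch of the decode direction only (IBM/370 single precision to a .NET double), written directly from the layout quoted in the answer; ToDouble is a hypothetical helper name and no validation or error handling is included:

using System;

static class IbmFloat
{
    // value = (-1)^s * 16^(e - 64) * 0.f, with a 7-bit excess-64 exponent
    // and a 24-bit fraction, per the layout above.
    public static double ToDouble(uint ibm)
    {
        int sign = (ibm & 0x80000000u) != 0 ? -1 : 1;
        int exponent = (int)((ibm >> 24) & 0x7F) - 64;
        uint fraction = ibm & 0x00FFFFFFu;              // 24 fraction bits, interpreted as 0.f
        return sign * (fraction / 16777216.0) * Math.Pow(16.0, exponent);   // 16777216 = 2^24
    }

    static void Main()
    {
        // The answer's example: 0xC276A000 should decode to -118.625.
        Console.WriteLine(ToDouble(0xC276A000u));
    }
}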

Turn float into string

Submitted by 社会主义新天地 on 2019-12-17 14:57:02
Question: I have reached the point where I need to turn IEEE 754 single and double precision numbers into strings with base 10. There is an FXTRACT instruction available, but it provides only the exponent and mantissa for base 2, as the number calculation formula is:

value = (-1)^sign * 1.(mantissa) * 2^(exponent - bias)

If I had some logarithmic instructions for specific bases, I would be able to change the base of the 2^(exponent - bias) part of the expression, but currently I don't know what to do. I was also thinking of …
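
Extracting the sign, exponent and mantissa fields is the first step of any binary-to-decimal conversion; here is a small sketch in C# rather than assembly (assuming BitConverter.DoubleToInt64Bits, which exposes the raw IEEE 754 bits of a double):

using System;

class Fields
{
    static void Main()
    {
        double x = -118.625;
        long bits = BitConverter.DoubleToInt64Bits(x);

        int  sign      = (int)((bits >> 63) & 1);
        int  biasedExp = (int)((bits >> 52) & 0x7FF);    // 11-bit exponent field, bias 1023
        long mantissa  = bits & 0xFFFFFFFFFFFFFL;        // 52 explicit fraction bits

        // For normal numbers: value = (-1)^sign * 1.mantissa * 2^(biasedExp - 1023)
        Console.WriteLine($"sign={sign} exponent={biasedExp - 1023} mantissa=0x{mantissa:X13}");
    }
}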

Why does IEEE 754 reserve so many NaN values?

Submitted by 泪湿孤枕 on 2019-12-17 11:17:54
Question: It seems that the IEEE 754 standard defines 16,777,214 32-bit floating point values as NaNs, or 0.4% of all possible values. I wonder what the rationale is for reserving so many useful values, when essentially only two are needed: one for a signaling NaN and one for a quiet NaN. Sorry if this question is trivial, I couldn't find any explanation on the internet.

Answer 1: The IEEE 754 standard defines a NaN as a number with all ones in the exponent and a non-zero significand. The highest-order bit in the …
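
The arithmetic behind the figures in the question, using the definition from the answer (exponent all ones, significand non-zero); a quick check in C#:

using System;

class NanCount
{
    static void Main()
    {
        // 32-bit float: exponent all ones (8 bits) plus a non-zero 23-bit significand is a NaN.
        // Non-zero significands: 2^23 - 1; times 2 for the sign bit.
        long nanPatterns = 2L * ((1L << 23) - 1);
        Console.WriteLine(nanPatterns);                       // 16777214, the figure in the question
        Console.WriteLine(nanPatterns / (double)(1L << 32));  // ~0.0039, i.e. about 0.4% of all bit patterns
    }
}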

Extracting the exponent and mantissa of a Javascript Number

Submitted by 百般思念 on 2019-12-17 10:49:10
Question: Is there a reasonably fast way to extract the exponent and mantissa from a Number in JavaScript? AFAIK there's no way to get at the bits behind a Number in JavaScript, which makes it seem to me that I'm looking at a factorization problem: finding m and n such that 2^n * m = k for a given k. Since integer factorization is in NP, I can only assume that this would be a fairly hard problem. I'm implementing a GHC plugin for generating JavaScript and need to implement the decodeFloat_Int# and …
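
It is not a factorization problem: the exponent can be found by repeated doubling or halving, which never loses precision because scaling a double by 2 is exact. A sketch of that arithmetic-only approach, written in C# for consistency with the other examples on this page (Decode is a hypothetical helper; NaN and infinity are not handled, and in JavaScript one would more likely read the bits through a typed array or DataView):

using System;

class DecodeFloat
{
    // Find integer m and exponent n with m * 2^n == k and 2^52 <= |m| < 2^53,
    // using only multiplications/divisions by 2 (exact for doubles).
    static (long m, int n) Decode(double k)
    {
        if (k == 0) return (0, 0);
        double a = Math.Abs(k);
        int n = 0;
        while (a < 4503599627370496.0)  { a *= 2; n--; }   // 2^52
        while (a >= 9007199254740992.0) { a /= 2; n++; }   // 2^53
        long m = (long)a;
        return (k < 0 ? -m : m, n);
    }

    static void Main()
    {
        var (m, n) = Decode(0.1);
        Console.WriteLine($"{m} * 2^{n}");   // 7205759403792794 * 2^-56
    }
}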

Extreme numerical values in floating-point precision in R

Submitted by 筅森魡賤 on 2019-12-17 07:42:51
Question: Can somebody please explain the following output to me? I know that it has something to do with floating point precision, but the order of magnitude (a difference of 1e308) surprises me.

0: high precision
> 1e-324==0
[1] TRUE
> 1e-323==0
[1] FALSE

1: very imprecise
> 1 - 1e-16 == 1
[1] FALSE
> 1 - 1e-17 == 1
[1] TRUE

Answer 1: R uses IEEE 754 double-precision floating-point numbers. Floating-point numbers are more dense near zero. This is a result of their being designed to compute accurately (the …
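
Both effects can be reproduced outside R, since they come from the double format itself; a small C# check (note that C#'s double.Epsilon is the smallest positive denormal, about 4.9e-324, not the machine epsilon):

using System;

class Extremes
{
    static void Main()
    {
        // Near zero: the smallest positive (denormal) double is about 4.9e-324,
        // so the literal 1e-324 underflows to zero while 1e-323 does not.
        Console.WriteLine(1e-324 == 0.0);    // True
        Console.WriteLine(1e-323 == 0.0);    // False

        // Near one: doubles just below 1.0 are spaced about 1.1e-16 apart,
        // so subtracting 1e-17 is too small to move 1.0 off itself, while 1e-16 is not.
        Console.WriteLine(1 - 1e-16 == 1);   // False
        Console.WriteLine(1 - 1e-17 == 1);   // True

        Console.WriteLine(double.Epsilon);   // 5E-324 on current .NET: the smallest positive denormal
    }
}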

Representing integers in doubles

Submitted by 好久不见. on 2019-12-17 06:46:21
Question: Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always precisely hold the full range of an unsigned integer of half that number of bytes? E.g. can an eight-byte double precisely hold the full range of a four-byte unsigned int? What this boils down to is whether a two-byte float can hold the range of a one-byte unsigned int. A one-byte unsigned int will of course be 0 -> 255.

Answer 1: An IEEE 754 64-bit double can represent any 32-bit integer, …
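
The answer's claim can be verified directly; a short check in C# (the boundary is the 53-bit significand, so the first integer that fails to round-trip is 2^53 + 1):

using System;

class Exactness
{
    static void Main()
    {
        // A double has a 53-bit significand, so every 32-bit unsigned integer
        // (at most 32 significant bits) converts to double and back without loss.
        uint max = uint.MaxValue;                      // 4294967295
        double d = max;
        Console.WriteLine((uint)d == max);             // True

        // A 64-bit integer does not fit: above 2^53 some values are unrepresentable.
        long big = (1L << 53) + 1;
        Console.WriteLine((long)(double)big == big);   // False: 2^53 + 1 rounds to 2^53
    }
}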

Do any real-world CPUs not use IEEE 754?

Submitted by ﹥>﹥吖頭↗ on 2019-12-17 02:16:21
Question: I'm optimizing a sorting function for a numerics/statistics library based on the assumption that, after filtering out any NaNs and doing a little bit-twiddling, floats can be compared as 32-bit ints without changing the result, and doubles can be compared as 64-bit ints. This seems to speed up sorting these arrays by somewhere on the order of 40%, and my assumption holds as long as the bit-level representation of floating point numbers is IEEE 754. Are there any real-world CPUs that people …
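
For reference, the kind of bit-twiddling the question alludes to usually looks like the sketch below: map each float to an unsigned key whose integer ordering matches the floating-point ordering (this assumes a .NET version with BitConverter.SingleToInt32Bits, and that NaNs have already been filtered out, as the question states):

using System;

class SortKey
{
    // Order-preserving transform: flip all bits of negative floats,
    // and only the sign bit of non-negative ones.
    static uint Key(float f)
    {
        uint bits = (uint)BitConverter.SingleToInt32Bits(f);
        return (bits & 0x80000000u) != 0 ? ~bits : bits | 0x80000000u;
    }

    static void Main()
    {
        float[] xs = { float.MinValue, -1.5f, -0.0f, 0.0f, 2.5f, float.MaxValue };
        foreach (var f in xs)
            Console.WriteLine($"{f,14:G6}  ->  0x{Key(f):X8}");
        // The keys, compared as unsigned 32-bit ints, sort in the same order as the floats.
    }
}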