ieee-754

IEEE float hex 424ce027 to float?

Submitted by 谁说我不能喝 on 2019-12-12 09:23:16
Question: If I have the IEEE-754 float hex 424ce027, how do I convert it to its decimal value?

```cpp
unsigned char ptr[] = {0x42, 0x4c, 0xe0, 0x27};
```

How do I get `float tmp = 51.218899;` from these bytes?

Answer 1: Perhaps:

```cpp
float f = *reinterpret_cast<float*>(ptr);
```

Although on my x86 machine here I also had to reverse the byte order of the array to get the value you wanted:

```cpp
std::reverse(ptr, ptr + 4);
float f = *reinterpret_cast<float*>(ptr);
```

You might want to use `sizeof(float)` instead of the literal 4, or some other way to get the size. You might want to…
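As a cross-check of the value the answer arrives at, the same four bytes can be decoded with Python's `struct` module (my sketch, not part of the original answer); `">f"` reads them as a big-endian IEEE-754 single, so no manual byte reversal is needed:

```python
import struct

# The four bytes from the question, most significant first.
data = bytes([0x42, 0x4C, 0xE0, 0x27])

# '>f' = big-endian 32-bit IEEE-754 float.
(value,) = struct.unpack(">f", data)
print(value)  # ~51.218899
```

The `reinterpret_cast` answer needed `std::reverse` only because x86 is little-endian while the bytes were given in big-endian order; picking the matching `struct` byte-order prefix expresses the same fix declaratively.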

Why is the IEEE-754 exponent bias used in this C code 126.94269504 instead of 127?

Submitted by 大憨熊 on 2019-12-12 07:44:49
Question: The following C function is from the fastapprox project.

```c
static inline float fasterlog2 (float x) {
  union { float f; uint32_t i; } vx = { x };
  float y = vx.i;
  y *= 1.1920928955078125e-7f;
  return y - 126.94269504f;
}
```

Could an expert here explain why the exponent bias used in this code is 126.94269504 instead of 127? Is it a more accurate bias value?

Answer 1: In the project you linked, they included a Mathematica notebook with an explanation of their algorithms, which includes the "mysterious"…
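The trick itself is easy to reproduce outside C: reinterpreting a float's bits as an integer and scaling by 2^-23 yields (biased exponent) + (mantissa fraction), a piecewise-linear approximation of log2. The sketch below is my own Python mimic of `fasterlog2`, not code from fastapprox; it uses `struct` in place of the union pun:

```python
import math
import struct

def fasterlog2(x: float) -> float:
    # Reinterpret the IEEE-754 single-precision bit pattern as an integer.
    # (Only meaningful for normal, positive x.)
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    y = i * 1.1920928955078125e-7  # i * 2**-23
    # Using 126.94269504 rather than 127 shifts the piecewise-linear
    # approximation to reduce its worst-case error across each octave,
    # per the project's Mathematica notebook.
    return y - 126.94269504

for x in (0.7, 3.0, 8.0, 100.0):
    print(x, fasterlog2(x), math.log2(x))
```

At exact powers of two the mantissa term vanishes, so the result is off by exactly 127 − 126.94269504 ≈ 0.057; in between, the mantissa's linear contribution errs in the other direction, which is why a bias slightly below 127 lowers the maximum error.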

JavaScript: How to convert a signed char array to a float (maybe using IEEE 754)?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-12 04:17:46
Question: I am struggling with the result of my Node.js JDBC/MSSQL binary result value. From my database I get this:

```javascript
[-78, 119, 99, 63] // an array of signed chars
```

In hex these bytes are 0xB2, 0x77, 0x63, 0x3F, i.e. 0x3F6377B2 read as a little-endian word. After conversion the result has to be 0.8885451555252075. But how do I do this conversion in JavaScript or Node.js? Kind regards, Markus

Answer 1: You can use "typed arrays":

```javascript
var chars = new Uint8Array([-78, 119, 99, 63])
var floats = new Float32Array(chars.buffer)
// > [0…
```
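For reference, the same reinterpretation can be verified in Python (my sketch, not from the answer): masking each signed char to an unsigned byte gives 0xB2 0x77 0x63 0x3F, which `"<f"` then reads as a little-endian single, exactly what `Float32Array` does on a little-endian machine:

```python
import struct

signed = [-78, 119, 99, 63]             # the values from the database
data = bytes(b & 0xFF for b in signed)  # -> b'\xb2\x77\x63\x3f'

# '<f' = little-endian 32-bit IEEE-754 float, word value 0x3F6377B2.
(value,) = struct.unpack("<f", data)
print(value)  # ~0.8885451555252075
```

Note that the typed-array answer inherits the platform's byte order; `struct`'s explicit `<` prefix (or a JavaScript `DataView` with `littleEndian=true`) makes the endianness assumption visible.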

IEEE 754 to floating point in C#

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-11 19:34:21
Question: The code below simply converts a 32-bit integer, taken from the object passed to the function, into the floating-point number that integer represents. I have checked with an online calculator that I am extracting the sign, exponent and mantissa correctly, but strangely I am getting the wrong answer. Can anyone check whether I am doing something wrong mathematically (or perhaps programmatically)? Regards

```csharp
public double FromFloatSafe(object f) {
    uint fb = Convert.ToUInt32(f);
    uint sign, …
```
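The field-by-field decode the question attempts can be sanity-checked against a library conversion. Here is a hypothetical Python sketch of the same arithmetic (mine, not the asker's C# code), compared with `struct`; inf/NaN patterns (exponent field 0xFF) are deliberately omitted:

```python
import struct

def float_from_bits(bits: int) -> float:
    sign = (bits >> 31) & 0x1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0:                 # subnormal: no implicit leading 1
        mant = frac / 2**23
        e = -126
    else:                        # normal: implicit leading 1, bias 127
        mant = 1 + frac / 2**23
        e = exp - 127
    return (-1.0) ** sign * mant * 2.0 ** e

bits = 0x424CE027
manual = float_from_bits(bits)
library = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(manual, library)  # both ~51.218899
```

Comparing `manual` and `library` for a handful of patterns quickly isolates whether a bug is in the field extraction (shifts/masks) or in the reassembly arithmetic.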

Why does `0.4/2` equal `0.2` while `0.6/3` equals `0.19999999999999998` in Python? [duplicate]

Submitted by 烂漫一生 on 2019-12-11 16:52:42
Question: This question already has answers here: Is floating point math broken? (31 answers). Closed 4 years ago.

I know these are floating-point divisions. But why do these two expressions behave differently? I did some more investigation, and the results confused me even more:

```python
>>> 0.9/3
0.3
>>> 1.2/3
0.39999999999999997
>>> 1.5/3
0.5
```

What is the logic here that decides whether the result is printed with one decimal place or more? PS: I used Python 3.4 for the experiment above.

Answer 1: Because the exact values of…
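The cause can be made visible in Python itself (a sketch of my own): `repr` prints the shortest decimal string that round-trips to the same double, so everything hinges on which double each division rounds to, and `Decimal` exposes the exact binary values behind the printed strings:

```python
from decimal import Decimal

# 0.4/2 happens to round to exactly the same double as the literal 0.2 ...
print(0.4 / 2 == 0.2)   # True
# ... while 0.6/3 rounds to the neighboring double just below it.
print(0.6 / 3 == 0.2)   # False
print(repr(0.6 / 3))    # '0.19999999999999998'

# Decimal shows the exact values of the two nearby doubles.
print(Decimal(0.2))
print(Decimal(0.6 / 3))
```

So the printing rule is uniform: "0.3" appears when the quotient landed on the double whose shortest round-tripping representation is "0.3", and a long string appears when it landed one ulp away.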

Understanding the usefulness of denormalized floating point numbers

Submitted by 北城余情 on 2019-12-11 14:54:08
Question: Reading Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic", I found something I don't really understand. He states that having denormalized numbers is good because x == y if and only if x - y == 0. Then he gives the example:

```
if (x != y) then z = 1/(x - y)
```

Now suppose that x - y is a denormalized number. Then there is a high chance that 1/(x - y) becomes inf, which is the same result as if we didn't have denormalized numbers in the first place. Even if I want to…
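Goldberg's guarantee is easy to check with doubles (my sketch, not from the thread): thanks to gradual underflow, two distinct tiny values always have a nonzero difference, and in this particular case the reciprocal is even still finite:

```python
import math
import sys

# Two distinct doubles near the smallest normal number 2**-1022.
x = 1.5 * 2.0 ** -1022
y = 1.0 * 2.0 ** -1022

d = x - y                      # 2**-1023: subnormal, but NOT zero
print(d != 0.0)                # True -- this is what denormals guarantee
print(d < sys.float_info.min)  # True: below the smallest normal double

z = 1.0 / d                    # 2**1023, still a finite double
print(math.isfinite(z))        # True
```

The guarantee is only about avoiding a spurious division by zero after the `x != y` test; whether `1/(x - y)` overflows to inf is a separate question, and for the smallest subnormals it indeed does.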

Fixed Point to Floating Point and Backwards

Submitted by 柔情痞子 on 2019-12-11 14:46:02
Question: Is converting fixed-point (with a fixed n bits for the fraction) to IEEE double safe? That is, can the IEEE double format represent every number a fixed-point format can represent? The test: a number goes to floating-point format and then back to its original fixed-point format.

Answer 1: Assuming your fixed-point numbers are stored as 32-bit integers, yes, IEEE double precision can represent any value representable in fixed point. This is because double has a 53-bit mantissa and your fixed-point values have only 32 bits of…
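The round-trip test described in the question can be sketched concretely. The Q16.16 layout below is my own assumption (the question only says "n fraction bits"); the argument is the same for any n with at most 53 total bits, since dividing by a power of two is exact whenever the raw integer fits in the double's 53-bit significand:

```python
import random

FRAC_BITS = 16  # assumed Q16.16 fixed-point layout

def fixed_to_double(raw: int) -> float:
    # Exact: any 32-bit integer fits in a double's 53-bit significand,
    # and dividing by 2**16 only adjusts the exponent.
    return raw / (1 << FRAC_BITS)

def double_to_fixed(x: float) -> int:
    return round(x * (1 << FRAC_BITS))

random.seed(0)
for _ in range(10000):
    raw = random.getrandbits(32)
    assert double_to_fixed(fixed_to_double(raw)) == raw
print("sampled 32-bit fixed-point values round-tripped exactly")
```

The same reasoning shows where safety ends: a 64-bit fixed-point format would not round-trip, because values above 2^53 no longer convert to double exactly.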

Floating point arithmetic varies between g++ and clang++?

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 13:40:37
Question: I have come across a bug that seems to be platform dependent. I am getting different results from clang++ and g++, but only on my 32-bit Debian machine. I was always under the impression that IEEE 754 was standardized and that all compilers abiding by the standard would show the same behavior. Please let me know if I am wrong; I am just very confused about this. Also, I realize that depending on floating-point comparison is generally not a good idea.

```cpp
#define DEBUG(line) std::cout << "\t\t" << …
```
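One reason identical-looking source can legitimately differ between compilers is that IEEE 754 pins down each individual operation, not how a compiler orders or widens a whole expression (32-bit x86 builds, for instance, traditionally evaluate in 80-bit x87 registers). Since rounding happens per operation, floating-point addition is not associative, which any language can demonstrate; here is a small Python sketch of my own:

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (0.0) + 1.0 = 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the 1.0 is lost
print(left, right)   # 1.0 0.0
```

If two compilers reassociate or widen this expression differently, both can still be "IEEE-conforming per operation" yet print different answers, which matches the symptom described in the question.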

Half precision conversion

Submitted by 笑着哭i on 2019-12-11 12:04:17
Question: How come 0 11110 1111111111 equals the half-precision value 1.1111111111 × 2^15? Both should be 65504. The sign bit here is 0. The exponent would be 11101 and the fractional part 1111111111. But that doesn't look like 1.1111111111 × 2^15 at all. Can someone explain that to me?

Answer 1: Here is the layout of your half-precision number: the exponent field's value is 11110 in binary, which is 30 in decimal. Half-precision numbers have an exponent bias of 15, so we need to subtract 15 from 30 to…
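The decode can be verified mechanically (my sketch): the pattern 0 11110 1111111111 is 0x7BFF, and with bias 15 and the implicit leading 1 it evaluates to (1 + 1023/1024) × 2^(30−15) = 65504. Python's `struct` has supported the binary16 `"e"` format since 3.6, which gives an independent cross-check:

```python
import struct

bits = 0b0_11110_1111111111     # sign = 0, exponent = 11110, fraction = 1111111111
assert bits == 0x7BFF

# Decode the fields by hand.
exponent = (bits >> 10) & 0x1F  # 0b11110 = 30
fraction = bits & 0x3FF         # 0b1111111111 = 1023
value = (1 + fraction / 1024) * 2 ** (exponent - 15)  # bias is 15
print(value)  # 65504.0

# Cross-check with the IEEE-754 binary16 codec in struct.
(decoded,) = struct.unpack("<e", struct.pack("<H", bits))
print(decoded)  # 65504.0
```

The asker's confusion comes from forgetting the implicit leading 1: the stored fraction 1111111111 becomes the significand 1.1111111111, which is why the two notations describe the same number.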

Sign bit of a NaN in the IEEE 754 standard

Submitted by 与世无争的帅哥 on 2019-12-11 11:28:49
Question: I want to perform a single-precision floating-point addition where A = +infinity (0x7F800000) and B = −infinity (0xFF800000). Will the result A + B be +NaN or −NaN?

Another related question: we get a qNaN when a NaN propagates through an arithmetic operation, whereas an sNaN represents an invalid-operation exception. So, the operation above will result in an sNaN. Is my understanding correct?

Answer 1: The IEEE 754 standard does not specify which representation of NaN you get when you apply an…
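The behavior is easy to observe directly (my Python sketch, not from the answer): inf + (−inf) is an invalid operation, which by default delivers a quiet NaN rather than trapping; the sign bit of that NaN is whatever the platform chose, since the standard leaves it unspecified:

```python
import math
import struct

a = float("inf")   # single-precision bit pattern 0x7F800000
b = float("-inf")  # single-precision bit pattern 0xFF800000

result = a + b             # invalid operation -> quiet NaN by default
print(math.isnan(result))  # True

# Inspect the single-precision bit pattern this platform produced:
# exponent all ones, nonzero fraction; the sign bit is unspecified.
bits = struct.unpack("<I", struct.pack("<f", result))[0]
print(hex(bits))  # e.g. 0x7fc00000 or 0xffc00000
```

This also answers the second question: a default (non-trapping) invalid operation produces a qNaN; sNaNs are not generated by arithmetic, they only arise if something deliberately constructs one.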