ieee-754

IEEE float hex 424ce027 to float?

Submitted by 谁说我不能喝 on 2019-12-12 09:23:16
Question: If I have the IEEE-754 float hex 424ce027, how do I convert it to its decimal value?

```cpp
unsigned char ptr[] = {0x42, 0x4c, 0xe0, 0x27};
```

How do I get `float tmp = 51.218899;` from these bytes?

Answer 1: Perhaps:

```cpp
float f = *reinterpret_cast<float*>(ptr);
```

Although on my x86 machine here I also had to reverse the byte order of the array to get the value you wanted:

```cpp
std::reverse(ptr, ptr + 4);
float f = *reinterpret_cast<float*>(ptr);
```

You might want to use `sizeof(float)` instead of the literal 4, or some other way to get the size. You might want to…
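As a cross-check of the value the answer arrives at, the same four bytes can be decoded with Python's `struct` module (my sketch, not part of the original answer); `">f"` reads them as a big-endian IEEE-754 single, so no manual byte reversal is needed:

```python
import struct

# The four bytes from the question, most significant first.
data = bytes([0x42, 0x4C, 0xE0, 0x27])

# '>f' = big-endian 32-bit IEEE-754 float.
(value,) = struct.unpack(">f", data)
print(value)  # ~51.218899
```

The `reinterpret_cast` answer needed `std::reverse` only because x86 is little-endian while the bytes were given in big-endian order; picking the matching `struct` byte-order prefix expresses the same fix declaratively.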

Why is the IEEE-754 exponent bias used in this C code 126.94269504 instead of 127?

Submitted by 大憨熊 on 2019-12-12 07:44:49
Question: The following C function is from the fastapprox project.

```c
static inline float fasterlog2 (float x) {
  union { float f; uint32_t i; } vx = { x };
  float y = vx.i;
  y *= 1.1920928955078125e-7f;
  return y - 126.94269504f;
}
```

Could an expert here explain why the exponent bias used in this code is 126.94269504 instead of 127? Is it a more accurate bias value?

Answer 1: In the project you linked, they included a Mathematica notebook with an explanation of their algorithms, which includes the "mysterious"…
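The trick itself is easy to reproduce outside C: reinterpreting a float's bits as an integer and scaling by 2^-23 yields (biased exponent) + (mantissa fraction), a piecewise-linear approximation of log2. The sketch below is my own Python mimic of `fasterlog2`, not code from fastapprox; it uses `struct` in place of the union pun:

```python
import math
import struct

def fasterlog2(x: float) -> float:
    # Reinterpret the IEEE-754 single-precision bit pattern as an integer.
    # (Only meaningful for normal, positive x.)
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    y = i * 1.1920928955078125e-7  # i * 2**-23
    # Using 126.94269504 rather than 127 shifts the piecewise-linear
    # approximation to reduce its worst-case error across each octave,
    # per the project's Mathematica notebook.
    return y - 126.94269504

for x in (0.7, 3.0, 8.0, 100.0):
    print(x, fasterlog2(x), math.log2(x))
```

At exact powers of two the mantissa term vanishes, so the result is off by exactly 127 − 126.94269504 ≈ 0.057; in between, the mantissa's linear contribution errs in the other direction, which is why a bias slightly below 127 lowers the maximum error.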

JavaScript: How to convert a signed char array to a float (maybe using IEEE 754)?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-12 04:17:46
Question: I am struggling with the result of my Node.js JDBC/MSSQL binary result value. From my database I get this:

```javascript
[-78, 119, 99, 63] // an array of signed chars
```

In hex these bytes are 0xB2, 0x77, 0x63, 0x3F, i.e. 0x3F6377B2 read as a little-endian word. After conversion the result has to be 0.8885451555252075. But how do I do this conversion in JavaScript or Node.js? Kind regards, Markus

Answer 1: You can use "typed arrays":

```javascript
var chars = new Uint8Array([-78, 119, 99, 63])
var floats = new Float32Array(chars.buffer)
// > [0…
```
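For reference, the same reinterpretation can be verified in Python (my sketch, not from the answer): masking each signed char to an unsigned byte gives 0xB2 0x77 0x63 0x3F, which `"<f"` then reads as a little-endian single, exactly what `Float32Array` does on a little-endian machine:

```python
import struct

signed = [-78, 119, 99, 63]             # the values from the database
data = bytes(b & 0xFF for b in signed)  # -> b'\xb2\x77\x63\x3f'

# '<f' = little-endian 32-bit IEEE-754 float, word value 0x3F6377B2.
(value,) = struct.unpack("<f", data)
print(value)  # ~0.8885451555252075
```

Note that the typed-array answer inherits the platform's byte order; `struct`'s explicit `<` prefix (or a JavaScript `DataView` with `littleEndian=true`) makes the endianness assumption visible.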

IEEE 754 to floating point in C#

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-11 19:34:21
Question: The code below simply converts a 32-bit integer, taken from the object passed to the function, into the floating-point number that integer represents. I have checked with an online calculator that I am extracting the sign, exponent and mantissa correctly, but strangely I am getting the wrong answer. Can anyone check whether I am doing something wrong mathematically (or perhaps programmatically)? Regards

```csharp
public double FromFloatSafe(object f) {
    uint fb = Convert.ToUInt32(f);
    uint sign, …
```
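The field-by-field decode the question attempts can be sanity-checked against a library conversion. Here is a hypothetical Python sketch of the same arithmetic (mine, not the asker's C# code), compared with `struct`; inf/NaN patterns (exponent field 0xFF) are deliberately omitted:

```python
import struct

def float_from_bits(bits: int) -> float:
    sign = (bits >> 31) & 0x1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0:                 # subnormal: no implicit leading 1
        mant = frac / 2**23
        e = -126
    else:                        # normal: implicit leading 1, bias 127
        mant = 1 + frac / 2**23
        e = exp - 127
    return (-1.0) ** sign * mant * 2.0 ** e

bits = 0x424CE027
manual = float_from_bits(bits)
library = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(manual, library)  # both ~51.218899
```

Comparing `manual` and `library` for a handful of patterns quickly isolates whether a bug is in the field extraction (shifts/masks) or in the reassembly arithmetic.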

Why does `0.4/2` equal `0.2` while `0.6/3` equals `0.19999999999999998` in Python? [duplicate]

Submitted by 烂漫一生 on 2019-12-11 16:52:42
Question: This question already has answers here: Is floating point math broken? (31 answers). Closed 4 years ago.

I know these are floating-point divisions. But why do these two expressions behave differently? I did some more investigation, and the results confused me even more:

```python
>>> 0.9/3
0.3
>>> 1.2/3
0.39999999999999997
>>> 1.5/3
0.5
```

What is the logic here that decides whether the result is printed with one decimal place or more? PS: I used Python 3.4 for the experiment above.

Answer 1: Because the exact values of…
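The cause can be made visible in Python itself (a sketch of my own): `repr` prints the shortest decimal string that round-trips to the same double, so everything hinges on which double each division rounds to, and `Decimal` exposes the exact binary values behind the printed strings:

```python
from decimal import Decimal

# 0.4/2 happens to round to exactly the same double as the literal 0.2 ...
print(0.4 / 2 == 0.2)   # True
# ... while 0.6/3 rounds to the neighboring double just below it.
print(0.6 / 3 == 0.2)   # False
print(repr(0.6 / 3))    # '0.19999999999999998'

# Decimal shows the exact values of the two nearby doubles.
print(Decimal(0.2))
print(Decimal(0.6 / 3))
```

So the printing rule is uniform: "0.3" appears when the quotient landed on the double whose shortest round-tripping representation is "0.3", and a long string appears when it landed one ulp away.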

Understanding the usefulness of denormalized floating point numbers

Submitted by 北城余情 on 2019-12-11 14:54:08
Question: Reading Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic", I found something I don't really understand. He states that having denormalized numbers is good because x == y if and only if x - y == 0. Then he gives the example:

```
if (x != y) then z = 1/(x - y)
```

Now suppose that x - y is a denormalized number. Then there is a high chance that 1/(x - y) becomes inf, which is the same result as if we didn't have denormalized numbers in the first place. Even if I want to…
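Goldberg's guarantee is easy to check with doubles (my sketch, not from the thread): thanks to gradual underflow, two distinct tiny values always have a nonzero difference, and in this particular case the reciprocal is even still finite:

```python
import math
import sys

# Two distinct doubles near the smallest normal number 2**-1022.
x = 1.5 * 2.0 ** -1022
y = 1.0 * 2.0 ** -1022

d = x - y                      # 2**-1023: subnormal, but NOT zero
print(d != 0.0)                # True -- this is what denormals guarantee
print(d < sys.float_info.min)  # True: below the smallest normal double

z = 1.0 / d                    # 2**1023, still a finite double
print(math.isfinite(z))        # True
```

The guarantee is only about avoiding a spurious division by zero after the `x != y` test; whether `1/(x - y)` overflows to inf is a separate question, and for the smallest subnormals it indeed does.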

Fixed Point to Floating Point and Backwards

Submitted by 柔情痞子 on 2019-12-11 14:46:02
Question: Is converting fixed-point (with a fixed n bits for the fraction) to IEEE double safe? That is, can the IEEE double format represent every number a fixed-point format can represent? The test: a number goes to floating-point format and then back to its original fixed-point format.

Answer 1: Assuming your fixed-point numbers are stored as 32-bit integers, yes, IEEE double precision can represent any value representable in fixed point. This is because double has a 53-bit mantissa and your fixed-point values have only 32 bits of…
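The round-trip test described in the question can be sketched concretely. The Q16.16 layout below is my own assumption (the question only says "n fraction bits"); the argument is the same for any n with at most 53 total bits, since dividing by a power of two is exact whenever the raw integer fits in the double's 53-bit significand:

```python
import random

FRAC_BITS = 16  # assumed Q16.16 fixed-point layout

def fixed_to_double(raw: int) -> float:
    # Exact: any 32-bit integer fits in a double's 53-bit significand,
    # and dividing by 2**16 only adjusts the exponent.
    return raw / (1 << FRAC_BITS)

def double_to_fixed(x: float) -> int:
    return round(x * (1 << FRAC_BITS))

random.seed(0)
for _ in range(10000):
    raw = random.getrandbits(32)
    assert double_to_fixed(fixed_to_double(raw)) == raw
print("sampled 32-bit fixed-point values round-tripped exactly")
```

The same reasoning shows where safety ends: a 64-bit fixed-point format would not round-trip, because values above 2^53 no longer convert to double exactly.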

Floating point arithmetic varies between g++ and clang++?

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 13:40:37
Question: I have come across a bug that seems to be platform dependent. I am getting different results from clang++ and g++, but only on my 32-bit Debian machine. I was always under the impression that IEEE 754 was standardized and that all compilers abiding by the standard would show the same behavior. Please let me know if I am wrong; I am just very confused about this. Also, I realize that depending on floating-point comparison is generally not a good idea.

```cpp
#define DEBUG(line) std::cout << "\t\t" << …
```
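One reason identical-looking source can legitimately differ between compilers is that IEEE 754 pins down each individual operation, not how a compiler orders or widens a whole expression (32-bit x86 builds, for instance, traditionally evaluate in 80-bit x87 registers). Since rounding happens per operation, floating-point addition is not associative, which any language can demonstrate; here is a small Python sketch of my own:

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (0.0) + 1.0 = 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the 1.0 is lost
print(left, right)   # 1.0 0.0
```

If two compilers reassociate or widen this expression differently, both can still be "IEEE-conforming per operation" yet print different answers, which matches the symptom described in the question.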

Half precision conversion

Submitted by 笑着哭i on 2019-12-11 12:04:17
Question: How come 0 11110 1111111111 equals the half-precision value 1.1111111111 × 2^15? Both should be 65504. The sign bit here is 0. The exponent would be 11101 and the fractional part 1111111111. But that doesn't look like 1.1111111111 × 2^15 at all. Can someone explain that to me?

Answer 1: Here is the layout of your half-precision number: the exponent field's value is 11110 in binary, which is 30 in decimal. Half-precision numbers have an exponent bias of 15, so we need to subtract 15 from 30 to…
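The decode can be verified mechanically (my sketch): the pattern 0 11110 1111111111 is 0x7BFF, and with bias 15 and the implicit leading 1 it evaluates to (1 + 1023/1024) × 2^(30−15) = 65504. Python's `struct` has supported the binary16 `"e"` format since 3.6, which gives an independent cross-check:

```python
import struct

bits = 0b0_11110_1111111111     # sign = 0, exponent = 11110, fraction = 1111111111
assert bits == 0x7BFF

# Decode the fields by hand.
exponent = (bits >> 10) & 0x1F  # 0b11110 = 30
fraction = bits & 0x3FF         # 0b1111111111 = 1023
value = (1 + fraction / 1024) * 2 ** (exponent - 15)  # bias is 15
print(value)  # 65504.0

# Cross-check with the IEEE-754 binary16 codec in struct.
(decoded,) = struct.unpack("<e", struct.pack("<H", bits))
print(decoded)  # 65504.0
```

The asker's confusion comes from forgetting the implicit leading 1: the stored fraction 1111111111 becomes the significand 1.1111111111, which is why the two notations describe the same number.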

Sign bit of a NaN in the IEEE 754 standard

Submitted by 与世无争的帅哥 on 2019-12-11 11:28:49
Question: I want to perform a single-precision floating-point addition where A = +infinity (0x7F800000) and B = −infinity (0xFF800000). Will the result A + B be +NaN or −NaN?

Another related question: we get a qNaN when a NaN propagates through an arithmetic operation, whereas an sNaN represents an invalid-operation exception. So, the operation above will result in an sNaN. Is my understanding correct?

Answer 1: The IEEE 754 standard does not specify which representation of NaN you get when you apply an…
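The behavior is easy to observe directly (my Python sketch, not from the answer): inf + (−inf) is an invalid operation, which by default delivers a quiet NaN rather than trapping; the sign bit of that NaN is whatever the platform chose, since the standard leaves it unspecified:

```python
import math
import struct

a = float("inf")   # single-precision bit pattern 0x7F800000
b = float("-inf")  # single-precision bit pattern 0xFF800000

result = a + b             # invalid operation -> quiet NaN by default
print(math.isnan(result))  # True

# Inspect the single-precision bit pattern this platform produced:
# exponent all ones, nonzero fraction; the sign bit is unspecified.
bits = struct.unpack("<I", struct.pack("<f", result))[0]
print(hex(bits))  # e.g. 0x7fc00000 or 0xffc00000
```

This also answers the second question: a default (non-trapping) invalid operation produces a qNaN; sNaNs are not generated by arithmetic, they only arise if something deliberately constructs one.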