ieee-754

Does any floating-point-intensive code produce bit-exact results on any x86-based architecture?

我的未来我决定 submitted on 2019-11-28 00:16:48
I would like to know whether code in C or C++ that uses floating-point arithmetic would produce bit-exact results on any x86-based architecture, regardless of the complexity of the code. To my knowledge, every x86 architecture since the Intel 8087 has an FPU designed to handle IEEE-754 floating-point numbers, and I cannot see any reason why the result would differ between architectures. However, if the results were different (namely due to a different compiler or a different optimization level), would there be some way to produce bit-exact results just by configuring the compiler?
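
One quick check, as a minimal sketch: C and C++ expose FLT_EVAL_METHOD, which reports whether intermediate results are kept at the declared precision (0, typical of SSE2 code generation) or at x87 80-bit extended precision (2), one common source of result differences between builds. The compiler flags in the comment are the usual GCC/Clang spellings and are illustrative, not a guarantee of reproducibility.

    #include <cfloat>   // FLT_EVAL_METHOD
    #include <cstdio>

    int main() {
        //  0: operations evaluate at the declared type (SSE/SSE2 codegen)
        //  2: intermediates use x87 80-bit extended precision
        // -1: indeterminable
        std::printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
        // For more reproducible results across builds, compile with e.g.
        //   gcc -msse2 -mfpmath=sse -ffp-contract=off
        // (flag names vary by compiler; shown for illustration only)
        return 0;
    }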

Portable serialisation of IEEE754 floating-point values

北城以北 submitted on 2019-11-27 23:36:31
Question: I've recently been working on a system that needs to store and load large quantities of data, including single-precision floating-point values. I decided to standardise on network byte order for integers, and also decided to store floating-point values in big-endian format, i.e.:

    |-- Byte 0 --| |-- Byte 1 --| |-- Byte 2 --| |-- Byte 3 --|
    #  #######     #  #######     ########       ########
    Sign = 1b, Exponent = 8b (MSB first), Mantissa = 23b (MSB first)

Ideally, I want to provide functions like htonl() and ntohl(), since I have …
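
A minimal sketch of the kind of helper being asked for; htonf and ntohf are hypothetical names (POSIX does not define them), and the code assumes float is a 32-bit IEEE 754 type. Writing the bytes out by shifting makes the result independent of host endianness:

    #include <cstdint>
    #include <cstring>

    // Hypothetical htonf: serialize a 32-bit IEEE 754 float as big-endian bytes.
    void htonf(float f, unsigned char out[4]) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);   // type-pun safely via memcpy
        out[0] = (unsigned char)(bits >> 24);  // sign + high exponent bits first
        out[1] = (unsigned char)(bits >> 16);
        out[2] = (unsigned char)(bits >> 8);
        out[3] = (unsigned char)(bits);
    }

    // Hypothetical ntohf: the inverse.
    float ntohf(const unsigned char in[4]) {
        uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                      | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }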

Floating Point to Binary Value (C++)

♀尐吖头ヾ submitted on 2019-11-27 21:24:37
I want to take a floating-point number in C++, like 2.25125, and an int array filled with the binary value that is used to store the float in memory (IEEE 754). So I could take a number and end up with an int num[32] array holding the binary value of the float: num[0] would be 1, num[1] would be 1, num[2] would be 0, num[3] would be 1, and so on... Putting an int into an array isn't difficult; it's the process of getting the binary value of the float where I'm stuck. Can you just read the binary in the memory where the float variable is stored? If not, how could I go about doing this in C++? EDIT: The reason …
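
One common approach, sketched under the assumption that float is 32-bit IEEE 754: copy the object representation into an integer with memcpy (reading it through a pointer cast is undefined behavior in C++) and peel off the bits one by one:

    #include <cstdint>
    #include <cstring>

    // Fill num[0..31] with the bits of f, most significant (sign) bit first.
    void float_to_bits(float f, int num[32]) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);   // read the raw representation
        for (int i = 0; i < 32; ++i)
            num[i] = (bits >> (31 - i)) & 1;   // num[0] = sign bit
    }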

Convert IEEE 754 float to hex with C - printf

让人想犯罪 __ submitted on 2019-11-27 20:58:27
Question: Ideally the following code would take a float in IEEE 754 representation and convert it into hexadecimal:

    void convert() // gets the float input from user and turns it into hexadecimal
    {
        float f;
        printf("Enter float: ");
        scanf("%f", &f);
        printf("hex is %x", f);
    }

I'm not too sure what's going wrong. It's converting the number into a hexadecimal number, but a very wrong one: 123.1443 gives 40000000, 43.3 gives 60000000, and 8 gives 0. So it's doing something, I'm just not too sure what. Help would be appreciated.
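
For reference, what goes wrong: %x expects an unsigned int, but a float passed through varargs is promoted to double, so printf reads the wrong bits (or a different register entirely on some ABIs); the behavior is undefined. A hedged fix is to copy the float's bytes into an integer first:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    void convert() {
        float f;
        std::printf("Enter float: ");
        if (std::scanf("%f", &f) != 1) return;
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);         // reinterpret, don't convert
        std::printf("hex is %08x\n", (unsigned)bits); // e.g. 8 prints 41000000
    }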

Converting Int to Float or Float to Int using Bitwise operations (software floating point)

谁说胖子不能爱 submitted on 2019-11-27 20:48:57
I was wondering if you could help explain the process of converting an integer to a float, or a float to an integer. For my class, we are to do this using only bitwise operators, but I think a firm understanding of the casting from type to type will help me more at this stage. From what I know so far, for int to float you have to convert the integer into binary, normalize the value by finding the significand, exponent, and fraction, and then output the value as a float from there. As for float to int, you have to separate the value into the significand, exponent, and fraction …
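
A sketch of the int-to-float direction under the usual assignment constraints (32-bit int, IEEE 754 single precision); it returns the bit pattern as an unsigned integer so no floating-point arithmetic is involved, and it rounds to nearest even as the hardware conversion would:

    #include <cstdint>

    // Convert a 32-bit signed int to the bit pattern of the nearest
    // IEEE 754 single (round-to-nearest-even), using integer ops only.
    uint32_t int_to_float_bits(int32_t x) {
        if (x == 0) return 0;                        // +0.0f
        uint32_t sign = (uint32_t)x & 0x80000000u;
        uint32_t mag  = sign ? (uint32_t)(-(int64_t)x) : (uint32_t)x;  // |x|
        int msb = 31;                                // locate the leading 1 bit
        while (!(mag & (1u << msb))) --msb;
        uint32_t exp  = (uint32_t)(msb + 127);       // biased exponent
        uint32_t mant;                               // 24 bits incl. implicit 1
        if (msb <= 23) {
            mant = mag << (23 - msb);                // fits exactly, no rounding
        } else {
            int shift     = msb - 23;
            mant          = mag >> shift;
            uint32_t rest = mag & ((1u << shift) - 1);
            uint32_t half = 1u << (shift - 1);
            if (rest > half || (rest == half && (mant & 1)))
                ++mant;                              // round to nearest even
        }
        // Drop the implicit bit; a rounding carry out of the mantissa
        // correctly bumps the exponent because we add rather than OR.
        return sign | ((exp << 23) + (mant - (1u << 23)));
    }

As a check, int_to_float_bits(1) yields 0x3F800000 (1.0f), and 16777217 (2^24 + 1) rounds to 0x4B800000 (16777216.0f), matching a hardware cast.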

Is it safe to assume floating point is represented using IEEE754 floats in C?

别说谁变了你拦得住时间么 submitted on 2019-11-27 20:31:18
Floating point is implementation-defined in C, so there are no guarantees. Our code needs to be portable; we are discussing whether or not it is acceptable to use IEEE 754 floats in our protocol. For performance reasons it would be nice if we didn't have to convert back and forth to a fixed-point format when sending or receiving data. I know that there can be differences between platforms and architectures regarding the size of long or wchar_t, but I can't seem to find anything specific about float and double. What I have found so far is that the byte order may be reversed on big-endian …
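
One portable precaution, as a sketch: rather than assuming, assert the property at compile time so the build fails loudly on the rare platform where it does not hold:

    #include <limits>

    // Fails to compile where float/double are not IEC 60559 (the ISO
    // name for IEEE 754) formats.
    static_assert(std::numeric_limits<float>::is_iec559,
                  "float is not IEEE 754 single precision");
    static_assert(std::numeric_limits<double>::is_iec559,
                  "double is not IEEE 754 double precision");

    // In C, the analogous check is the __STDC_IEC_559__ predefined macro.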

Difference between Java's `Double.MIN_NORMAL` and `Double.MIN_VALUE`?

拈花ヽ惹草 submitted on 2019-11-27 20:22:12
What's the difference between Double.MIN_NORMAL (introduced in Java 1.6) and Double.MIN_VALUE? The answer can be found in the IEEE specification of floating-point representation:

For the single format, the difference between a normal number and a subnormal number is that the leading bit of the significand (the bit to the left of the binary point) of a normal number is 1, whereas the leading bit of the significand of a subnormal number is 0. Single-format subnormal numbers were called single-format denormalized numbers in IEEE Standard 754.

In other words, Double.MIN_NORMAL is the smallest positive normal value, while Double.MIN_VALUE is the smallest positive value overall (a subnormal).
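
For readers coming from C++, a sketch of the same two constants there (std::numeric_limits<double>::min() corresponds to Double.MIN_NORMAL and denorm_min() to Double.MIN_VALUE, assuming IEEE 754 doubles):

    #include <cstdio>
    #include <limits>

    int main() {
        // Smallest positive normal double: 2.2250738585072014e-308
        std::printf("min (normal) = %g\n", std::numeric_limits<double>::min());
        // Smallest positive subnormal double: 4.9e-324
        std::printf("denorm_min   = %g\n", std::numeric_limits<double>::denorm_min());
        return 0;
    }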

Extracting the exponent and mantissa of a Javascript Number

放肆的年华 submitted on 2019-11-27 20:12:44
Is there a reasonably fast way to extract the exponent and mantissa from a Number in Javascript? AFAIK there's no way to get at the bits behind a Number in Javascript, which makes it seem to me that I'm looking at a factorization problem: finding m and n such that 2^n * m = k for a given k. Since integer factorization is in NP, I can only assume that this would be a fairly hard problem. I'm implementing a GHC plugin for generating Javascript and need to implement the decodeFloat_Int# and decodeDouble_2Int# primitive operations; I guess I could just rewrite the parts of the base library that …
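
Worth noting: this is not a factorization problem. The number was constructed as m * 2^n, so n falls out of the representation itself with no search involved. A sketch of the decode in C++ terms, using frexp, valid for normal (non-subnormal) doubles:

    #include <cmath>
    #include <cstdio>

    // Decompose d as m * 2^n, with |m| in [2^52, 2^53) for normal doubles
    // (the shape decodeFloat-style primitives return).
    void decode_double(double d, long long* m, int* n) {
        int e;
        double frac = std::frexp(d, &e);        // d = frac * 2^e, |frac| in [0.5, 1)
        *m = (long long)std::ldexp(frac, 53);   // scale fraction to a 53-bit integer
        *n = e - 53;
    }

    int main() {
        long long m; int n;
        decode_double(0.75, &m, &n);
        std::printf("0.75 = %lld * 2^%d\n", m, n);  // 6755399441055744 * 2^-53
        return 0;
    }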

Are IEEE floats valid key types for std::map and std::set?

邮差的信 submitted on 2019-11-27 20:07:19
Background: The requirement for a comparator on the key type of an associative container (for example std::map) is that it imposes a strict weak order on the elements of the key type. For a given comparator comp(x, y) we define equiv(x, y) = !comp(x, y) && !comp(y, x). The requirements for comp(x, y) being a strict weak order are:

- Irreflexivity: !comp(x, x) for all x
- Transitivity of the ordering: if comp(a, b) and comp(b, c), then comp(a, c)
- Transitivity of equivalence: if equiv(a, b) and equiv(b, c), then equiv(a, c)

std::less<float> (the default comparator) uses operator<, which does not yield a strict weak order once NaN is involved: comp(NaN, x) and comp(x, NaN) are both false, so NaN is "equivalent" to every value, and transitivity of equivalence fails.
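
A hedged sketch of one way out: a comparator that places all NaNs in a single equivalence class ordered after every ordinary value, restoring a strict weak order (NaN-free IEEE floats are already fine with std::less):

    #include <cmath>
    #include <map>
    #include <string>

    // Strict weak order on float: ordinary values by <, all NaNs in one
    // equivalence class that compares greater than everything else.
    struct NanAwareLess {
        bool operator()(float a, float b) const {
            const bool na = std::isnan(a), nb = std::isnan(b);
            if (na || nb) return !na && nb;  // non-NaN < NaN; NaNs equivalent
            return a < b;
        }
    };

    // Usage: a map that tolerates NaN keys (all NaNs share one slot).
    std::map<float, std::string, NanAwareLess> table;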

Are the bit patterns of NaNs really hardware-dependent?

偶尔善良 submitted on 2019-11-27 19:34:52
I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format:

    seee eeee emmm mmmm mmmm mmmm mmmm mmmm

s is the sign bit, the e's are the exponent bits, and the m's are the mantissa bits. A NaN value is encoded as an exponent of all 1s with a mantissa that is not all 0s (an all-zero mantissa would be +/- infinity). This means there are many different possible NaN values (differing in their s and m bits). On this, JLS §4.2.3 says:

IEEE 754 allows multiple distinct NaN values for each of its single and double floating-point formats. While …
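
A sketch for inspecting a NaN's bits directly (assumes a 32-bit IEEE 754 float): copy the representation into an integer and print it. On common hardware (x86, ARM) the default quiet NaN comes out as 0x7FC00000, but the standard permits many other patterns, which is the portability concern here:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <limits>

    int main() {
        float qnan = std::numeric_limits<float>::quiet_NaN();
        uint32_t bits;
        std::memcpy(&bits, &qnan, sizeof bits);
        // seee eeee emmm ...: exponent all 1s, mantissa nonzero => NaN
        std::printf("NaN bits: 0x%08X\n", (unsigned)bits);  // commonly 0x7FC00000
        return 0;
    }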