ieee-754

Are IEEE floats valid key types for std::map and std::set?

Posted by 时间秒杀一切 on 2019-11-27 04:25:42
Question: Background: the requirement for the comparator on the key type of an associative container (for example std::map) is that it imposes a strict weak ordering on the elements of the key type. For a given comparator comp(x, y) we define equiv(x, y) = !comp(x, y) && !comp(y, x). The requirements for comp(x, y) to be a strict weak ordering are: irreflexivity (!comp(x, x) for all x), transitivity of the ordering (if comp(a, b) and comp(b, c) then comp(a, c)), and transitivity of equivalence (if equiv(a, b) and equiv(b, c) then equiv(a, c)). …
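The practical pitfall with float keys is NaN: every ordered comparison involving NaN is false, so the derived equivalence is no longer transitive and the strict-weak-ordering requirement breaks. A minimal sketch of that failure (assuming IEEE-754 floats and the default std::less-style comparison; nothing below is from the original question beyond its definition of equiv):

#include <iostream>
#include <limits>
#include <map>

// equiv as defined in the question: neither element orders before the other.
static bool equiv(float x, float y) { return !(x < y) && !(y < x); }

int main() {
    const float nan = std::numeric_limits<float>::quiet_NaN();

    // All comparisons against NaN are false, so NaN is "equivalent" to everything,
    // which violates transitivity of equivalence.
    std::cout << std::boolalpha
              << equiv(1.0f, nan) << ' '    // true
              << equiv(nan, 2.0f) << ' '    // true
              << equiv(1.0f, 2.0f) << '\n'; // false -> not transitive

    // Ordinary (non-NaN) floats satisfy the requirements and work fine as keys.
    std::map<float, int> m;
    m[1.5f] = 1;
    m[2.5f] = 2;
    std::cout << m.size() << '\n'; // 2
}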

Read/Write bytes of float in JS

Posted by 梦想的初衷 on 2019-11-27 04:23:51
Is there any way I can read the bytes of a float value in JS? What I need is to write a raw FLOAT or DOUBLE value into some binary format I have to produce, so is there any way to get a byte-by-byte IEEE 754 representation? And the same question for writing, of course. KooiInc: Would this snippet help? @Kevin Gadd:

var parser = new BinaryParser,
    forty = parser.encodeFloat(40.0, 2, 8),
    twenty = parser.encodeFloat(20.0, 2, 8);
console.log(parser.decodeFloat(forty, 2, 8).toFixed(1));  //=> 40.0
console.log(parser.decodeFloat(twenty, 2, 8).toFixed(1)); //=> 20.0

You can do it with typed arrays: var buffer = new …
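For reference alongside the C and C++ entries on this page, the same byte-by-byte IEEE-754 view can be sketched in C++ with memcpy (an illustration of the underlying idea only; it is not part of the original JavaScript answer):

#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = 40.0f;

    // Copy the object representation into a same-sized integer to read the bits.
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::printf("%08" PRIX32 "\n", bits); // 42200000 for 40.0f

    // Writing works the same way in reverse.
    std::uint32_t raw = 0x42200000u;
    std::memcpy(&f, &raw, sizeof f);
    std::printf("%g\n", f); // 40
}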

How computer does floating point arithmetic?

Posted by 柔情痞子 on 2019-11-27 04:21:47
I have seen long articles explaining how floating point numbers are stored and how arithmetic on them is done, but please briefly explain why, when I write cout << 1.0 / 3.0 << endl; I see 0.333333, yet when I write cout << 1.0 / 3.0 + 1.0 / 3.0 + 1.0 / 3.0 << endl; I see 1. How does the computer do this? Please explain just this simple example; it is enough for me. The problem is that the floating point format represents fractions in base 2. The first fraction bit is ½, the second ¼, and it goes on as 1/2^n. And the problem with that is that not every rational number can be represented exactly this way. …
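A short sketch of what those two cout statements are actually printing (assuming IEEE-754 doubles and the default 6-significant-digit stream precision):

#include <iomanip>
#include <iostream>

int main() {
    double third = 1.0 / 3.0;
    double sum = third + third + third;

    // Default stream precision is 6 significant digits: 0.333333 and 1.
    std::cout << third << ' ' << sum << '\n';

    // Full precision shows the stored values: "third" is only the closest
    // double to 1/3, and with round-to-nearest the three additions happen
    // to round back to exactly 1.0.
    std::cout << std::setprecision(17) << third << ' ' << sum << '\n';
    std::cout << std::boolalpha << (sum == 1.0) << '\n'; // true
}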

Status of __STDC_IEC_559__ with modern C compilers

Posted by |▌冷眼眸甩不掉的悲伤 on 2019-11-27 04:05:47
Question: C99 added a macro __STDC_IEC_559__ which can be used to test whether a compiler and standard library conform to the ISO/IEC/IEEE 60559 (i.e. IEEE 754) standard. According to the answers for the question how-to-check-that-ieee-754-single-precision-32-bit-floating-point-representation, most C compilers don't set the preprocessor macro __STDC_IEC_559__. According to GCC's documentation it does not define __STDC_IEC_559__. I tested this with GCC 4.9.2 and Clang 3.6.0, both with glibc 2.21, using …
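A quick way to see what a particular toolchain reports is to test the macro directly; a minimal check (compile it with whichever compiler and -std mode you care about):

#include <stdio.h>

int main(void) {
#ifdef __STDC_IEC_559__
    printf("__STDC_IEC_559__ = %ld\n", (long)__STDC_IEC_559__);
#else
    printf("__STDC_IEC_559__ is not defined\n");
#endif
    return 0;
}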

Ranges of floating point datatype in C?

Posted by 随声附和 on 2019-11-27 04:02:39
I am reading a C book, talking about ranges of floating point, and the author gave this table:

Type     Smallest Positive Value    Largest Value       Precision
====     =======================    =============       =========
float    1.17549 x 10^-38           3.40282 x 10^38     6 digits
double   2.22507 x 10^-308          1.79769 x 10^308    15 digits

I don't know where the numbers in the Smallest Positive Value and Largest Value columns come from. These numbers come from the IEEE-754 standard, which defines the standard representation of floating point numbers. The Wikipedia article at the link explains how to arrive at these ranges knowing the number of bits …
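The same limits are also available programmatically from <float.h>; a minimal sketch (FLT_MIN and DBL_MIN are the smallest normalized positive values, which is what the table lists; subnormals go lower still):

#include <float.h>
#include <stdio.h>

int main(void) {
    printf("float : %g .. %g, %d digits\n", FLT_MIN, FLT_MAX, FLT_DIG);
    printf("double: %g .. %g, %d digits\n", DBL_MIN, DBL_MAX, DBL_DIG);
    return 0;
}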

IEEE-754 standard on NVIDIA GPU (sm_13)

Posted by 青春壹個敷衍的年華 on 2019-11-27 03:40:17
Question: If I perform a float (single precision) operation on a host and a device (GPU arch sm_13), will the values be different? Answer 1: A good discussion of this is available in a whitepaper from NVIDIA. Basically: IEEE-754 is implemented by almost everything currently; even between faithful implementations of the standard you can still see differences in results (famously, Intel doing 80-bit internally for double precision), and high optimization settings in your compiler can change results …
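One common source of such differences is fused multiply-add, which the GPU (and, depending on flags, the host compiler) may use to evaluate a*b+c with a single rounding; a minimal host-side sketch, assuming IEEE-754 doubles and a correctly rounded std::fma:

#include <cmath>
#include <cstdio>

int main() {
    // Chosen so that a*a is not exactly representable in a double.
    double a = 1.0 + std::ldexp(1.0, -30);     // 1 + 2^-30

    double separate = a * a - 1.0;             // product rounded, then subtracted
    double fused    = std::fma(a, a, -1.0);    // one rounding at the very end

    // Note: with contraction enabled (e.g. -ffp-contract=fast) the compiler may
    // itself turn the first expression into an fma, which is exactly the kind of
    // optimization-dependent difference the answer mentions.
    std::printf("%.17g\n%.17g\n", separate, fused);
    std::printf("equal: %d\n", separate == fused); // typically 0
}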

Math.pow with negative numbers and non-integer powers

Posted by 醉酒当歌 on 2019-11-27 03:29:26
Question: The ECMAScript specification for Math.pow has the following peculiar rule: if x < 0 and x is finite and y is finite and y is not an integer, the result is NaN. (http://es5.github.com/#x15.8.2.13) As a result, Math.pow(-8, 1 / 3) gives NaN rather than -2. What is the reason for this rule? Is there some sort of broader computer science or IEEE-ish reason for it, or is it just a choice TC39/Eich made once upon a time? Update: Thanks to Amadan's exchanges with me, I think I understand the …
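The same rule exists in C and C++: C99 specifies a domain error for pow when the base is finite and negative and the exponent is finite and not an integer, for the same underlying reason (1.0/3.0 is not exactly one third, and a general real power of a negative number is complex). A minimal sketch, with cbrt as the usual way to get a real cube root:

#include <cmath>
#include <cstdio>

int main() {
    std::printf("%g\n", std::pow(-8.0, 1.0 / 3.0)); // nan (domain error)
    std::printf("%g\n", std::cbrt(-8.0));           // -2
}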

Behaviour of negative zero (-0.0) in comparison with positive zero (+0.0)

Posted by ε祈祈猫儿з on 2019-11-27 03:22:47
Question: In my code,

float f = -0.0;   // negative, compared with negative zero
f == -0.0f        // result will be true

But

float f = 0.0;    // positive, compared with negative zero
f == -0.0f        // result is also true, instead of false

Why is the result true in both cases? Here is an MCVE to test it (live on coliru):

#include <iostream>
int main()
{
    float f = -0.0;
    std::cout << "==== > " << f << std::endl << std::endl;
    if (f == -0.0f) {
        std::cout << "true" << std::endl;
    } else {
        std::cout << "false" << std::endl;
    }
}
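IEEE-754 requires +0.0 and -0.0 to compare equal, so == cannot tell them apart; the sign only shows up through operations that inspect the sign bit. A minimal sketch (assuming IEEE-754 floats):

#include <cmath>
#include <iostream>

int main() {
    float pos = 0.0f, neg = -0.0f;

    std::cout << std::boolalpha
              << (pos == neg) << '\n'       // true: comparison ignores the zero's sign
              << std::signbit(pos) << ' '   // false
              << std::signbit(neg) << '\n'  // true: the bit patterns do differ
              << (1.0f / pos) << ' '        // inf
              << (1.0f / neg) << '\n';      // -inf: the sign resurfaces here
}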

Double vs float on the iPhone

Posted by 女生的网名这么多〃 on 2019-11-27 02:58:13
I have just heard that the iPhone cannot do doubles natively, thereby making them much slower than regular floats. Is this true? Evidence? I am very interested in the issue because my program needs high-precision calculations, and I would have to compromise on speed. The iPhone can do both single and double precision arithmetic in hardware. On the 1176 (original iPhone and iPhone 3G), they operate at approximately the same speed, though you can fit more single-precision data in the caches. On the Cortex-A8 (iPhone 3GS, iPhone 4 and iPad), single-precision arithmetic is done on the NEON unit instead …

how IEEE-754 floating point numbers work

Posted by 别来无恙 on 2019-11-27 02:55:41
Question: Let's say I have this: float i = 1.5. In binary, this float is represented as: 0 01111111 10000000000000000000000. I broke the binary up into the 'sign', 'exponent' and 'fraction' chunks. What I don't understand is how this represents 1.5. The exponent is 0 once you subtract the bias (127 - 127), and the fraction part with the implicit leading one is 1.1. How does 1.1 scaled by nothing = 1.5??? Answer 1: Think first in terms of decimal (base 10): 643.72 is (6 * 10^2) + (4 * 10^1) + (3 * 10^0) + (7 * 10^-1) + (2 * 10^-2). …
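Carrying the same expansion into base 2 answers the question: the first fraction bit after the implicit leading 1 has weight 1/2, so binary 1.1 is 1 + 0.5 = 1.5, and an exponent of 0 leaves it unscaled. A minimal sketch that decodes the quoted bit pattern this way (assuming a 32-bit IEEE-754 float):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // 0 01111111 10000000000000000000000 from the question, packed into 32 bits.
    std::uint32_t bits = 0x3FC00000u;

    std::uint32_t sign     = bits >> 31;           // 0
    std::uint32_t exponent = (bits >> 23) & 0xFF;  // 127 -> unbiased 0
    std::uint32_t fraction = bits & 0x7FFFFF;      // binary 0.100... = 0.5

    // value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + fraction / 8388608.0)          // 2^23 = 8388608
                 * std::ldexp(1.0, (int)exponent - 127);
    std::printf("%g\n", value); // 1.5

    // Cross-check against the hardware's interpretation of the same bits.
    float f;
    std::memcpy(&f, &bits, sizeof f);
    std::printf("%g\n", f);     // 1.5
}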