ieee-754

Representable result of floor() and ceil()

我是研究僧i submitted on 2019-12-01 17:35:48
For an arbitrary value 'v' of a floating-point type (float/double/long double), does C89 guarantee that the mathematically exact integer result of floor(v) and ceil(v) is a representable value of the type of 'v'? Do any of the later C or C++ standards guarantee this? Does IEEE 754 guarantee this?
This is guaranteed by the construction of IEEE-754 numbers. (To be clear: C does not guarantee IEEE-754, but the following analysis holds for all other floating-point formats with which I am familiar as well; the crucial property is that all sufficiently large numbers in the format are integers.)
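The claim in the answer can be spot-checked numerically. The following is a minimal sketch in Python, assuming the interpreter's float is IEEE 754 binary64 (true on mainstream CPython builds); the helper name floor_ceil_representable is made up for illustration. It checks sample values only and is not a proof.

```python
import math
import sys


def floor_ceil_representable(v: float) -> bool:
    """Return True if floor(v) and ceil(v) are exactly representable as floats.

    math.floor and math.ceil return exact Python ints, so converting them
    back to float and comparing detects any loss of precision.
    """
    lo = math.floor(v)  # exact integer, arbitrary precision
    hi = math.ceil(v)   # exact integer, arbitrary precision
    return float(lo) == lo and float(hi) == hi


# Hypothetical sample values: small fractions, values near 2**52 (where the
# spacing between doubles reaches 1), and the extremes of the format.
samples = [0.5, -0.5, 2.0**52 - 0.5, 2.0**53, 1e308, 5e-324, -1e-300,
           sys.float_info.max]

if __name__ == "__main__":
    for v in samples:
        print(v, floor_ceil_representable(v))
```

From 2**52 upward every double is already an integer, which is the "sufficiently large numbers are integers" property the answer relies on.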

Can the floating-point status flag FE_UNDERFLOW set when the result is not sub-normal?

南楼画角 submitted on 2019-12-01 17:29:33
Question: While investigating floating-point exception status flags, I came across the curious case of the status flag FE_UNDERFLOW being set when not expected. This is similar to "When does underflow occur?" yet goes into a corner case that may be a C specification issue or an FP hardware defect.
    // pseudo code
    //                        s bias_expo implied "mantissa"
    w = smallest_normal;   // 0 000...001 (1)     000...000
    x = w * 2;             // 0 000...010 (1)     000...000
    y = next_smaller(x);   // 0 000...001 (1)     111...111
    round_mode(FE_TONEAREST);
    clear

Is there any way to see a number in its 64-bit float IEEE 754 representation

拟墨画扇 submitted on 2019-12-01 17:26:21
JavaScript stores all numbers as double-precision 64-bit format IEEE 754 values, according to the spec:
    The Number type has exactly 18437736874454810627 (that is, 2^64 − 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic
Is there any way to see the number in this form in JavaScript?
You can use typed arrays to examine the raw bytes of a number. Create a Float64Array with one element, and then create a Uint8Array with the same buffer. You can then set the first element of the float array to your
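The same reinterpretation of bytes can be sketched outside the browser. This is a Python analogue, not the JavaScript typed-array code the answer describes: it uses the standard struct module to view the 8 bytes of a double as a 64-bit integer. The function name double_bits is hypothetical.

```python
import struct


def double_bits(x: float) -> str:
    """Return the binary64 pattern of x as 'sign exponent fraction'."""
    # Pack as a big-endian double, then reread the same 8 bytes as an
    # unsigned 64-bit integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{raw:064b}"
    # 1 sign bit, 11 exponent bits, 52 fraction bits.
    return f"{bits[0]} {bits[1:12]} {bits[12:]}"


if __name__ == "__main__":
    print(double_bits(1.0))
    print(double_bits(-2.0))
```

For 1.0 the exponent field is the bias 1023 (01111111111) and the fraction is all zeros.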

Representable result of floor() and ceil()

一世执手 submitted on 2019-12-01 16:23:51
Question: For an arbitrary value 'v' of a floating-point type (float/double/long double), does C89 guarantee that the mathematically exact integer result of floor(v) and ceil(v) is a representable value of the type of 'v'? Do any of the later C or C++ standards guarantee this? Does IEEE 754 guarantee this? Answer 1: This is guaranteed by the construction of IEEE-754 numbers. (To be clear: C does not guarantee IEEE-754, but the following analysis holds for all other floating-point formats with which I am

What is overflow and underflow in floating point

孤街浪徒 submitted on 2019-12-01 12:17:52
I feel I don't really understand the concept of overflow and underflow, so I'm asking this question to clarify it. I need to understand it at its most basic level, with bits. Let's work with a simplified 1-byte floating-point representation with 1 sign bit, 3 exponent bits and 4 mantissa bits:
    0 000 0000
The largest exponent field we can store is 111_2 = 7; subtracting the bias K = 2^2 - 1 = 3 gives 4, and that exponent is reserved for Infinity and NaN. The exponent for the max number is therefore 3, which is 110 in offset binary. So the bit patterns for the max number are:
    0 110 1111 // positive
    1 110 1111 // negative
When the exponent
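To make the 1-byte format concrete, here is a minimal decoding sketch in Python. It assumes IEEE-754-style conventions for this toy format (bias 3, exponent field 111 reserved for Infinity/NaN, exponent field 000 used for zero and subnormals); the excerpt states the bias and the reserved exponent, while the subnormal handling is an assumption. The name decode_minifloat is hypothetical.

```python
def decode_minifloat(byte: int) -> float:
    """Decode a 1-3-4 (sign, exponent, mantissa) byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 4) & 0x7   # 3-bit exponent field
    frac = byte & 0xF         # 4-bit mantissa field
    bias = 3                  # K = 2^2 - 1, as in the question

    if exp == 0b111:          # reserved: Infinity (frac == 0) or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:              # assumed: zero and subnormals, no implied 1
        return sign * (frac / 16) * 2.0 ** (1 - bias)
    return sign * (1 + frac / 16) * 2.0 ** (exp - bias)


if __name__ == "__main__":
    print(decode_minifloat(0b0_110_1111))  # largest finite value
    print(decode_minifloat(0b0_001_0000))  # smallest normal value
    print(decode_minifloat(0b0_000_0001))  # smallest subnormal (assumed)
```

Under these assumptions the largest finite value is 1.1111_2 × 2^3 = 15.5. Overflow then means a result too large in magnitude to be represented as a finite value of the format, and underflow means a nonzero result too small in magnitude to be represented as a normal number.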

Is it possible to make isnan() work in gfortran -O3 -ffast-math?

こ雲淡風輕ζ submitted on 2019-12-01 11:35:16
I would like to compile a program with gfortran with -O3 -ffast-math enabled, since it gives a nice performance boost. I was rather confused that gfortran's isnan() caught some NaNs but not all of them. After reading "Checking if a double (or float) is NaN in C++", "how do I make a portable isnan/isinf function" and "Negative NaN is not a NaN?", I am under the impression that people are able to check for NaNs in C via bit-fiddling even with fast-math enabled. However, this puzzles me, since fast-math can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules
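The "bit-fiddling" test mentioned in the question inspects the encoding rather than doing floating-point comparisons. The sketch below shows the idea in Python for binary64 values; it is not gfortran code, the name is_nan_bits is hypothetical, and whether an equivalent check survives -ffast-math in a given compiler is exactly what the question leaves open.

```python
import struct


def is_nan_bits(x: float) -> bool:
    """Detect NaN from the binary64 encoding alone.

    A NaN has all 11 exponent bits set and a nonzero fraction field;
    the sign bit is irrelevant.
    """
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (raw >> 52) & 0x7FF
    fraction = raw & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0


if __name__ == "__main__":
    for value in (float("nan"), -float("nan"), float("inf"), 0.0, 1.5):
        print(value, is_nan_bits(value))
```

Infinity shares the all-ones exponent but has a zero fraction, so it is not reported as NaN.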

Convert IEEE float hex to decimal?

拟墨画扇 submitted on 2019-12-01 11:31:25
If I have an IEEE float hex 42F6E979, how do I convert it to decimal? I believe the decimal representation is 123.456001.
(Most) assembly language doesn't really enforce types very strongly, so you can just initialize a location with that value, and then treat/use it as a float. The easiest way to convert is usually something like:
    .data
    myfloat dd 042F6E979H
    mydec db 10 dup(?)
    .code
    mov ebx, offset mydec
    fld myfloat
    fbstp [ebx]
This actually produces binary coded decimal, so you have to split each byte into two digits for display. Of course, this is for x86 -- most other architectures make
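For comparison with the x86 route, the conversion can also be done by hand from the three bit fields. This is a small Python sketch, assuming the hex word is an IEEE 754 binary32 value in big-endian digit order and encodes a normal number; both function names are hypothetical.

```python
import struct


def float32_fields(hex_word: str):
    """Split a 32-bit hex word into (sign, biased exponent, fraction)."""
    raw = int(hex_word, 16)
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF   # 8 bits, bias 127
    fraction = raw & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


def hex_to_float32(hex_word: str) -> float:
    """Let the struct module reinterpret the 4 bytes as a float."""
    (value,) = struct.unpack(">f", bytes.fromhex(hex_word))
    return value


if __name__ == "__main__":
    sign, exponent, fraction = float32_fields("42F6E979")
    # Normal number: (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
    manual = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
    print(sign, exponent, hex(fraction))
    print(manual, hex_to_float32("42F6E979"))
```

For 42F6E979 the fields are sign 0, biased exponent 133 (so 2^6) and fraction 0x76E979, which gives a value very close to 123.456, in line with the figure quoted in the question.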

Easiest way to convert a decimal float to bit representation manually based on IEEE 754, without using any library

99封情书 submitted on 2019-12-01 10:32:22
I know there are a number of ways to read every bit of an IEEE 754 float using existing libraries. I don't want that; I want to be able to manually convert a decimal float to its binary representation based on IEEE 754. I understand how IEEE 754 works and I am just trying to apply it. I ask this question here just to see whether my way is normal or stupid, and I am also wondering how a PC does it so quickly. If I am given a decimal float in a string, I need to figure out what the E is and what the M is. Get the two parts out: integer part i and fraction part f. Deal with f. I constantly multiply 2
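One way to do the conversion without any floating-point library is exact rational arithmetic. The sketch below is not the integer-part/fraction-part procedure the question starts to describe; it swaps in a different technique: normalize the exact value into [1, 2), then round the fraction to 23 bits with round-half-to-even. It assumes binary32, handles only zero and normal-range inputs, and the name float32_bits is hypothetical.

```python
from fractions import Fraction


def float32_bits(text: str) -> int:
    """Convert a decimal string to its binary32 bit pattern (normal range only)."""
    x = Fraction(text)            # exact value of the decimal string
    sign = 1 if x < 0 else 0
    x = abs(x)
    if x == 0:
        return sign << 31

    # Find E such that 1 <= x / 2^E < 2.
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1

    # M: 23 fraction bits; round() on a Fraction rounds half to even.
    m = round((x - 1) * 2**23)
    if m == 2**23:                # rounding carried into the next binade
        m = 0
        e += 1
    if not -126 <= e <= 127:
        raise ValueError("outside the normal binary32 range (not handled here)")
    return (sign << 31) | ((e + 127) << 23) | m


if __name__ == "__main__":
    print(f"{float32_bits('123.456'):08X}")
    print(f"{float32_bits('-0.5'):08X}")
```

Subnormals, infinities and NaN are deliberately left out to keep the sketch short.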

Is it possible to make isnan() work in gfortran -O3 -ffast-math?

柔情痞子 submitted on 2019-12-01 08:04:49
Question: I would like to compile a program with gfortran with -O3 -ffast-math enabled, since it gives a nice performance boost. I was rather confused that gfortran's isnan() caught some NaNs but not all of them. After reading "Checking if a double (or float) is NaN in C++", "how do I make a portable isnan/isinf function" and "Negative NaN is not a NaN?", I am under the impression that people are able to check for NaNs in C via bit-fiddling even with fast-math enabled. However, this puzzles me, since fast-math

Large numbers erroneously rounded in JavaScript

[亡魂溺海] submitted on 2019-12-01 07:26:33
Question: See this code:
    <html>
    <head>
    <script src="http://www.json.org/json2.js" type="text/javascript"></script>
    <script type="text/javascript">
    var jsonString = '{"id":714341252076979033,"type":"FUZZY"}';
    var jsonParsed = JSON.parse(jsonString);
    console.log(jsonString, jsonParsed);
    </script>
    </head>
    <body>
    </body>
    </html>
When I see my console in Firefox 3.5, the value of jsonParsed is:
    Object id=714341252076979100 type=FUZZY
I.e., the number is rounded. Tried different values, the same outcome
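The rounding can be reproduced outside the browser. A JavaScript Number is an IEEE 754 double, which represents every integer exactly only up to 2^53; the id in the question is far larger. The sketch below uses Python, whose integers are exact, and converts explicitly to float to mimic what a double can hold; the variable names are made up for illustration.

```python
import json

json_string = '{"id":714341252076979033,"type":"FUZZY"}'

# Python's json module keeps integers exact, unlike a double-based parser.
exact_id = json.loads(json_string)["id"]

# Forcing the value through a binary64 float mimics a JavaScript Number.
as_double = float(exact_id)
nearest_representable = int(as_double)

if __name__ == "__main__":
    print(exact_id)               # the id as written in the JSON text
    print(nearest_representable)  # the closest double to that id
    print(exact_id > 2**53)       # beyond the exactly-representable range
```

Between 2^59 and 2^60 consecutive doubles are 128 apart, so the id lands on the nearest multiple of 128, here 714341252076979072. The 714341252076979100 shown by Firefox is consistent with that double being printed with only as many digits as are needed to identify it.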