ieee-754

Increase a double to the next closest value?

假如想象, submitted on 2019-12-22 08:53:28
Question: This isn't a question for a real-life project; I'm only curious. We can increase an int using the increment operator (i++). You can define this operation as: it replaces the variable with the closest value greater than i, which in this case is simply i + 1. But I was thinking of determining the number of double values available in a specific range according to the IEEE 754-2008 standard. I would then be able to set up a graph which shows these counts for some ranges and see how they decrease. I guess
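The "increment to the closest value" idea can be sketched in Python. This is only an illustration, assuming Python 3.9 or later (for math.nextafter); the helper names next_up and spacing_at are hypothetical and do not come from the question.

```python
import math

def next_up(x: float) -> float:
    """Return the smallest double strictly greater than x."""
    return math.nextafter(x, math.inf)

def spacing_at(x: float) -> float:
    """Gap between x and the next larger double."""
    return next_up(x) - x

if __name__ == "__main__":
    # The gap grows with the magnitude of x, so equal-width ranges
    # hold fewer doubles the further they are from zero.
    for x in (1.0, 1024.0, 1e15):
        print(x, next_up(x), spacing_at(x))
```

The spacing doubles each time x crosses a power of two, which is the decrease in density the question wants to graph.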

Is float16 supported in MATLAB?

给你一囗甜甜゛, submitted on 2019-12-22 08:38:39
Question: Does MATLAB support float16 operations? If so, how do I convert a double matrix to float16? I am doing an arithmetic operation on a large matrix where a 16-bit floating-point representation is sufficient. Representing it with the double datatype takes 4 times more memory.
Answer 1: Is your matrix full? Otherwise, try sparse; it saves a lot of memory if there are lots of zero-valued elements. AFAIK, float16 is not supported. The smallest floating-point datatype you can use is single, which is a 32-bit

Inserting multiple not-a-numbers into a std::unordered_set<double>

时间秒杀一切, submitted on 2019-12-22 07:05:20
Question: One of the consequences of the IEEE 754 standard is the non-intuitive behavior of std::unordered_set<double> when not-a-number elements (NANs) are inserted. Because NAN != NAN, after the following sequence:

    #include <iostream>
    #include <cmath>
    #include <unordered_set>

    int main() {
        std::unordered_set<double> set;
        set.insert(NAN);
        set.insert(NAN);
        std::cout << "Number of elements " << set.size() << "\n"; // there are 2 elements!
    }

there are two elements in the set (see it live): NAN and NAN
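A rough Python analogue of the same effect, offered as a sketch rather than a statement about std::unordered_set itself. Python sets check identity before equality, so the sketch deliberately inserts two distinct NaN objects. The canonical_key helper and its "nan" sentinel are hypothetical names introduced only for illustration.

```python
import math

def canonical_key(x: float):
    """Map every NaN to one shared sentinel so all NaNs collapse to a single key."""
    return "nan" if math.isnan(x) else x

# Two separately created NaNs never compare equal, so both are stored.
raw = set()
raw.add(float("nan"))
raw.add(float("nan"))

# Canonicalizing before insertion keeps at most one NaN entry.
canon = set()
canon.add(canonical_key(float("nan")))
canon.add(canonical_key(float("nan")))

if __name__ == "__main__":
    print(len(raw), len(canon))
```

The same idea (normalize NaN before hashing and comparing) is one possible way to get a single NaN entry in other languages as well, under the assumption that all NaNs should be treated as one value.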

Negative zero literal in golang

£可爱£侵袭症+, submitted on 2019-12-22 04:08:12
Question: IEEE 754 supports negative zero. But this code

    a := -0.0
    fmt.Println(a, 1/a)

outputs

    0 +Inf

where I would have expected

    -0 -Inf

Other languages whose float format is based on IEEE 754 let you create negative zero literals.

Java:

    float a = -0f;
    System.out.printf("%f %f", a, 1/a); // outputs "-0,000000 -Infinity"

C#:

    var a = -0d;
    Console.WriteLine(1/a); // outputs "-Infinity"

JavaScript:

    var a = -0;
    console.log(a, 1/a); // logs "0 -Infinity"

But I couldn't find the equivalent in Go. How
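For comparison, a small Python sketch of creating and detecting a negative zero. It illustrates the IEEE 754 behavior only and is not an answer taken from the thread; is_negative_zero is a hypothetical helper name. Go's math package has a Copysign function of the same shape as the math.copysign call used here.

```python
import math

neg_zero = -0.0                            # in Python this literal is a negative zero
also_neg_zero = math.copysign(0.0, -1.0)   # build one explicitly from a sign

def is_negative_zero(x: float) -> bool:
    """True only for -0.0; plain equality cannot tell -0.0 from 0.0."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0

if __name__ == "__main__":
    print(neg_zero, is_negative_zero(neg_zero), is_negative_zero(0.0))
```

Python raises ZeroDivisionError for 1 / -0.0, so the sketch inspects the sign bit with copysign instead of dividing as the quoted snippets do.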

Number of floats between two floats

丶灬走出姿态, submitted on 2019-12-22 03:54:28
Question: Say I have two Python floats a and b. Is there an easy way to find out how many representable real numbers are between the two in IEEE-754 representation (or whatever representation the machine is using)?
Answer 1: I don't know what you will be using this for, but if both floats have the same exponent, it should be possible. As the exponent is kept in the high-order bits, loading the float bytes (8 bytes in this case) as an integer and subtracting one from the other should give the number you
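The integer-reinterpretation idea from the answer can be sketched as follows, assuming 64-bit IEEE 754 doubles (which CPython uses on common platforms). The function names are hypothetical. The sketch also remaps negative values so the subtraction works across different exponents and signs for finite inputs, which goes a little beyond the same-exponent case the answer describes.

```python
import struct

def ordered_int(x: float) -> int:
    """Reinterpret a double as an integer whose ordering matches the float ordering."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]  # signed 64-bit view
    # Doubles are sign-magnitude: for negatives, negate the magnitude bits.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable steps from a to b (finite inputs only)."""
    return abs(ordered_int(b) - ordered_int(a))

if __name__ == "__main__":
    print(ulp_distance(1.0, 2.0))
```

ulp_distance counts steps, so the number of doubles strictly between a and b is one less than the result when a and b differ.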

C IEEE-Floats inf equal inf

梦想的初衷, submitted on 2019-12-21 22:22:03
Question: In C, on an implementation with IEEE-754 floats, when I compare two floating-point numbers which are NaN, the comparison returns 0, or "false". But why do two floating-point numbers which are both inf count as equal? This program prints "equal: ..." (at least under Linux AMD64 with gcc), and in my opinion it should print "different: ...".

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        volatile double a = 1e200; //use volatile to suppress compiler warnings
        volatile double b = 3e200;
        volatile double c
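The same comparison behavior can be observed in Python, which uses IEEE 754 doubles for float on common platforms. This is a sketch with made-up operands (the original program is truncated above), so the variable names and values are assumptions.

```python
import math

big = 1e200 * 3e200       # overflows to +inf
bigger = 1e200 * 1e201    # a mathematically different product, also +inf
diff = big - bigger       # inf - inf is an invalid operation and yields NaN

if __name__ == "__main__":
    print(big == bigger)      # infinities of the same sign compare equal
    print(diff == diff)       # NaN compares unequal to everything, itself included
    print(math.isnan(diff))
```

Once both products overflow, each is the single value +inf and the information about their true magnitudes is gone, which is why the equality test cannot distinguish them.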

How to alter double by its smallest increment

六眼飞鱼酱①, submitted on 2019-12-21 11:11:17
Question: Is something broken, or am I failing to understand what is happening?

    static String getRealBinary(double val) {
        long tmp = Double.doubleToLongBits(val);
        StringBuilder sb = new StringBuilder();
        for (long n = 64; --n > 0; tmp >>= 1)
            if ((tmp & 1) == 0) sb.insert(0, ('0'));
            else sb.insert(0, ('1'));
        sb.insert(0, '[').insert(2, "] [").insert(16, "] [").append(']');
        return sb.toString();
    }

    public static void main(String[] argv) {
        for (int j = 3; --j >= 0;) {
            double d = j;
            for (int i = 3; --i >= 0;) {
                d +

What languages expose IEEE 754 traps to the developer?

谁说胖子不能爱, submitted on 2019-12-21 04:05:13
Question: I'd like to play with those traps for educational purposes. A common problem with the default behavior in numerical computation is that we "miss" the NaN (or ±inf) produced by an invalid operation. The default behavior is propagation through the computation, but some operations (like comparisons) break the chain and lose the NaN, and the rest of the processing continues without acknowledging the singularity in earlier steps of the algorithm. Sometimes we have ways to react to this kind of event:

Is there an open-source c/c++ implementation of IEEE-754 operations? [closed]

余生长醉, submitted on 2019-12-21 03:33:30
Question (closed as off-topic 4 years ago): I am looking for a reference implementation of IEEE-754 operations. Is there such a thing?
Answer 1: I believe the C libraries SoftFloat and fdlibm are suitable for what you are looking for. Others include the math functions of GNU libc (glibc) on Linux or of the *BSD libcs. Finally, CRlibm should also be of interest to you. Ulrich

How many normalized numbers can be represented using IEEE-754 Single Precision?

不羁的心, submitted on 2019-12-20 16:41:37
Question: Based on the IEEE-754 single-precision format, how can I work out how many normalized numbers can be represented if I know the following:

    1 bit for the sign
    8 bits for the exponent
    23 bits for the mantissa

Is there a rule that can be applied to any other floating-point system?
Answer 1: You've identified the number of bits for each portion of the representation, so you're already halfway there. There are: 2^1 = 2 possibilities for the sign; 2^8 = 256 possibilities for the exponent bits, of which
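The counting rule can be sketched as a short Python function. It assumes the usual IEEE 754 binary layout, in which the all-zeros exponent pattern is reserved for zero and subnormals and the all-ones pattern for infinities and NaNs; count_normalized is a hypothetical name, and the completion of the truncated answer is inferred from that layout rather than quoted.

```python
def count_normalized(exponent_bits: int, mantissa_bits: int) -> int:
    """Count normalized values in an IEEE 754-style binary format."""
    signs = 2
    # Exclude the two reserved exponent patterns (all zeros, all ones).
    exponents = 2 ** exponent_bits - 2
    mantissas = 2 ** mantissa_bits
    return signs * exponents * mantissas

if __name__ == "__main__":
    print(count_normalized(8, 23))    # single precision
    print(count_normalized(11, 52))   # double precision
```

For single precision this gives 2 * 254 * 2^23 = 4,261,412,864 normalized numbers, and the same formula applies to any format with the same reserved-exponent convention.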