floating-point-precision

Adding 32 bit floating point numbers.

帅比萌擦擦* submitted on 2019-12-01 06:39:07
Question: I'm learning more than I ever wanted to know about floating-point numbers. Let's say I needed to add:

    1 10000000 00000000000000000000000
    1 01111000 11111000000000000000000

(in 2's complement form). The first bit is the sign, the next 8 bits are the exponent, and the last 23 bits are the mantissa. Without doing a conversion to scientific notation, how do I add these two numbers? Can you walk through it step by step? Any good resources for this stuff? Videos and practice examples would be great.
Answer 1:

Switching between float and double precision at compile time

六月ゝ 毕业季﹏ submitted on 2019-12-01 05:47:24
Question: Where should I look if I want to switch between float and double precision at compile time? If the user wants everything in float instead of double precision, how can I maintain this flexibility? In other words, how should I define a variable that can be either float or double precision conditionally?
Answer 1: If it is OK to make the switch at compile time, a simple typedef would do:

    #ifdef USE_DOUBLES
    typedef double user_data_t;
    #else
    typedef float user_data_t;
    #endif

Use user_data_t

Exact binary representation of a double [duplicate]

大兔子大兔子 submitted on 2019-12-01 05:33:47
Possible Duplicate: Float to binary in C++

I have a very small double variable, and when I print it I get -0 (using C++). To get better precision I tried:

    cout.precision(18); // I think 18 is the max precision I can get
    cout.setf(ios::fixed, ios::floatfield);
    cout << var; // var is a double

but it just writes -0.00000000000... I want to see the exact binary representation of var. In other words, I want to see what binary number is written in the stack memory/register for this variable.

    union myUnion {
        double dValue;
        uint64_t iValue;
    };
    myUnion myValue;
    myValue.dValue = 123.456;
    cout <<

How does floating point error propagate when doing mathematical operations in C++?

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-01 04:37:37
Let's say that we have declared the following variables:

    float a = 1.2291;
    float b = 3.99;

float variables have a precision of 6, which (if I understand correctly) means that the difference between the number the computer actually stores and the number you want will be less than 10^-6. That means both a and b have some error less than 10^-6, so inside the computer a could actually be 1.229100000012123 and b could be 3.9900000191919. Now let's say that you have the following code:

    float c = 0;
    for (int i = 0; i < 1000; i++)
        c += a + b;

My question is, will c's final result

How does floating point error propagate when doing mathematical operations in C++?

馋奶兔 submitted on 2019-12-01 02:05:58
Question: Let's say that we have declared the following variables:

    float a = 1.2291;
    float b = 3.99;

float variables have a precision of 6, which (if I understand correctly) means that the difference between the number the computer actually stores and the number you want will be less than 10^-6. That means both a and b have some error less than 10^-6, so inside the computer a could actually be 1.229100000012123 and b could be 3.9900000191919. Now let's say that you have the

Understanding floating point precision

佐手、 submitted on 2019-11-30 23:46:40
Is it the case that representable floating-point values are densest on the real number line near zero? Do representable floating-point values grow sparser (exponentially?) moving away from zero along the number line? If both are true, does that mean there is less precision farther from zero? Overall question: does precision in some way refer to or depend on the density of numbers you can represent (accurately)?
Answer 1: The term precision usually refers to the number of significant digits (bits) in the represented value. So precision varies with the number of bits (or digits) in the mantissa of

“GL_HALF_FLOAT” with OpenGL Rendering and GLSL

孤者浪人 submitted on 2019-11-30 19:38:16
Question: I am programming an OpenGL renderer in C++. I want it to be as efficient as possible, with each vertex/normal/UV texture coordinate/tangent/etc. taking up as little memory as possible. I am using indexes, line strips, and fans. I was thinking that 32-bit floats are not necessary and 16-bit floats should be fine, at least for some of these, like normals and UVs. I can't seem to find any examples of this anywhere. I can find talk of "GL_HALF_FLOAT", but no real examples. Am I on the right

Understanding floating point precision

拈花ヽ惹草 submitted on 2019-11-30 19:31:38
Question: Is it the case that representable floating-point values are densest on the real number line near zero? Do representable floating-point values grow sparser (exponentially?) moving away from zero along the number line? If both are true, does that mean there is less precision farther from zero? Overall question: does precision in some way refer to or depend on the density of numbers you can represent (accurately)?
Answer 1: The term precision usually refers to the number of significant digits

How to specify floating point decimal precision from variable?

本秂侑毒 submitted on 2019-11-30 18:32:39
I have the following simple code repeated several times, and I would like to make a function for it:

    for i in range(10):
        id = "some id string looked up in dict"
        val = 63.4568900932840928 # some floating point number in dict corresponding to "id"
        tabStr += '%-15s = %6.1f\n' % (id,val)

I want to be able to call this function:

    def printStr(precision)

where it performs the code above and returns tabStr with val to `precision` decimal places. For example, printStr(3) would return 63.457 for val in tabStr. Any ideas how to accomplish this kind of functionality?

    tabStr += '%-15s = %6.*f\n' %

Can I specify a numpy dtype when generating random values?

馋奶兔 submitted on 2019-11-30 17:13:28
I'm creating a numpy array of random values and adding them to an existing array containing 32-bit floats. I'd like to generate the random values using the same dtype as the target array, so that I don't have to convert the dtypes manually. Currently I do this:

    import numpy as np
    x = np.zeros((10, 10), dtype='f')
    x += np.random.randn(*x.shape).astype('f')

What I'd like to do instead of the last line is something like:

    x += np.random.randn(*x.shape, dtype=x.dtype)

but randn (and, in fact, none of the numpy.random methods) accepts a dtype argument. My specific question is, is it possible