I have a rather gnarly bit of code that must more-or-less randomly generate a bunch of percentages, stored as decimal floats. That is, it decides that material one makes up
Since floats are stored in the machine in a binary representation, there are always numbers that can not be precisely represented. If you need to work around this limitation, you must use some math library, that uses custom defined datatypes.
From your code it looks like you're randomly generating planet atmospheres, presumably for some kind of game or something. At any rate, the randomness of it is convincing me it doesn't need to be too accurate.
So i'd suggest you don't use floats, just use int
s and go up to 100. Then you'll get your exact summing. For any maths you want to use them in just cast.
Is this not an option?
If you insist on using floats, then read on...
The problem you have using floats is as follows:
A floating point (in this case a double) is represented like this:
which corresponds to a double
of value:
So,
your number is (1+M) * 2**(E)
(where E
= e-offset
)
1+M
is always in the range 1-2.
So, we have equally spaced numbers inbetween each pair of power of two (positive and negative), and the spacing between the numbers doubles with each increase in the exponent, E
.
Think about this, it means that there is a constant spacing of representable numbers between each of these numbers [(1,2),(2,4),(4,8), etc]
. This also applies to the negative powers of two, so:
0.5 - 1
0.25 - 0.5
0.125 - 0.25
0.0625 - 0.125
etc.
And in each range, there are the same quantity of numbers. This means that if you take a number in the range (0.25,0.5)
and add it to a number in the range (0.5,1)
, then you have a 50% chance that the number cannot be exactly represented.
If you sum two floating point numbers for which the exponents differ by D
, then the chances of the sum being exactly representable are 2-D.
If you then want to represent the range 0-1
, then you'll have to be very careful about which floats you use (i.e. force the last N
bits of the fraction to be zero, where N
is a function of E
).
If you go down this route, then you'll end up with far more floats at the top of the range than the bottom.
The alternative is to decide how close to zero you want to be able to get. Lets say you want to get down to 0.0001.
0.0001 = (1+M) * 2E
log2(0.0001) = -13.28771...
So we'll use -14 as our minimum exponent.
And then to get up to 1, we just leave the exponent as -1.
So now we have 13 ranges, each with twice as many values as the lower one which we can sum without having to worry about precision.
This also means though, that the top range has 213 more values we can use. This obviously isn't okay.
So, after picking a float, then round it to the nearest allowable value - in this case, by round I just mean set the last 13 bits to zero, and just put it all in a function, and apply it to your numbers immediately after you get them out of rand
.
Something like this:
from ctypes import *
def roundf(x,bitsToRound):
i = cast(pointer(c_float(x)), POINTER(c_int32)).contents.value
bits = bin(i)
bits = bits[:-bitsToRound] + "0"*bitsToRound
i = int(bits,2)
y = cast(pointer(c_int32(i)), POINTER(c_float)).contents.value
return y
(images from wikipedia)
floats are are represented by powers of two. From the python docs: "Unfortunately, most decimal fractions cannot be represented exactly as binary fractions"
http://docs.python.org/2/tutorial/floatingpoint.html
EDIT: Maybe instead of actually trying to get to 1.0000000000000000000000 you should determine an acceptable level of error by cutting off anything after the third decimal place. You can be relatively certain that the value added to 1. Using this concept you could accept any answer greater than 0.999 and less than 1.001.
This may not be perfect but it might be a good workaround to get you past your problem.
In the end, it turned out the simplest solution was to change the problem. Rounding the sum to 5 digits of precision with round(x,5)
whenever it was checked gave adequate results.
I understand that you want to pick two floating-point numbers between 0.0 and 1.0 such that they add to 1.0.
Do this:
Then in floating-point, S + L is exactly 1.0.
If for some reason you obtain the smallest number S first in your algorithm, compute L = 1.0 - S and then S0 = 1.0 - L. Then L and S0 add up exactly to 1.0. Consider S0 the “rounded” version of S.
Here is an alternative solution if you are adding N numbers, each between 0.0 and 1.0, and expect the operations X1 + X2 + … and 1.0 - X1 … to behave like they do in math.
Each time you obtain a new number Xi, do: Xi ← 1.0 - (1.0 - Xi). Only use this new value of Xi from that point onwards. This assignment will slightly round Xi so that it behaves well in all sums whose intermediate results are between 0.0 and 1.0.
EDIT: after doing the above for values X1, …, XN-1, compute XN as 1 - X1 - … - XN-1. This floating-point computation will be exact (despite involving floating-point), so that you will have X1 + … + XN = 1 exactly.