Generating random numbers on open-open interval (0,1) efficiently

Backend · open · 4 answers · 819 views
有刺的猬 2021-01-05 21:08

I'm looking for an efficient way to generate random floating-point numbers on the open-open interval (0,1). I currently have an RNG that generates random integers on the closed-closed interval of [0, (2^32)-1]. I've already created a half-open floating point RNG on the interval [0,1) by simply multiplying my result from the integer RNG by 1/((2^32)-1)

4 answers
  • 2021-01-05 21:36

    > I'm looking for an efficient way to generate random floating-point numbers on the open-open interval (0,1). I currently have an RNG that generates random integers on the closed-closed interval of [0, (2^32)-1]. I've already created a half-open floating point RNG on the interval [0,1) by simply multiplying my result from the integer RNG by 1/((2^32)-1)

    This means that your generator 'tries' to produce 2^32 different values. The problem is that the float type is 4 bytes long, so it has fewer than 2^32 distinct values overall. To be precise, there are only 2^23 values on the interval [1/2, 1). Depending on what you need, this may or may not be a problem.

    You may want to use a lagged Fibonacci generator (wiki), with the iteration formula given in the Russian Wikipedia article.
    This already produces numbers in [0,1), given that the initial values belong to that interval, and it may be good enough for your purposes.
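    An additive lagged Fibonacci generator is easy to sketch. The lags (55, 24) and the LCG used to fill the initial state below are illustrative choices, not something specified in the question:

    ```c
    #include <stdint.h>

    /* Additive lagged Fibonacci generator on [0,1):
       x[n] = x[n-55] + x[n-24] (mod 1).
       The lags (55, 24) are a common textbook pair; the seeding LCG is
       just a convenient way to fill the state with values in [0,1). */
    #define LAG_A 55
    #define LAG_B 24

    static double lfg_state[LAG_A];
    static int lfg_pos = 0;

    static void lfg_seed(uint32_t s)
    {
        for (int i = 0; i < LAG_A; i++) {
            s = s * 1664525u + 1013904223u;    /* Numerical Recipes LCG step */
            lfg_state[i] = s / 4294967296.0;   /* scale into [0,1) */
        }
        lfg_pos = 0;
    }

    static double lfg_next(void)
    {
        /* lfg_pos holds x[n-55]; the second lag sits LAG_A-LAG_B slots ahead. */
        double x = lfg_state[lfg_pos] + lfg_state[(lfg_pos + LAG_A - LAG_B) % LAG_A];
        if (x >= 1.0)
            x -= 1.0;                          /* reduce mod 1 */
        lfg_state[lfg_pos] = x;                /* overwrite the oldest value */
        lfg_pos = (lfg_pos + 1) % LAG_A;
        return x;
    }
    ```

    Every value returned stays in [0,1); excluding the exact endpoints would still require one of the tricks from the other answers.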

  • 2021-01-05 21:41

    You are already there.

    The smallest distance between two floats your current generator produces is 1/(2^32).

    So, your generator is effectively producing [0,1-1/(2^32)].

    1/(2^32) is greater than FLT_MIN.

    Thus if you add FLT_MIN to your generator,

    float open_open_flt = FLT_MIN + closed_open_flt;
    

    you'll get [FLT_MIN,1-(1/(2^32))+FLT_MIN], which works as a (0,1) generator.

  • 2021-01-05 21:49

    Since the probability of actually observing 0 is very small, and checking whether a number equals 0 is cheap (compared to an addition or multiplication), I would simply regenerate the random number until it is not equal to 0.
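    A sketch of this redraw idea, with a stand-in xorshift generator in place of the asker's RNG. One adjustment over a naive float scaling: only the top 24 bits are kept, so the product is exact in float and can never round up to 1.0:

    ```c
    #include <stdint.h>

    /* Placeholder xorshift32 standing in for the asker's integer RNG. */
    static uint32_t rng_state = 0x12345678u;

    static uint32_t rng_u32(void)
    {
        uint32_t x = rng_state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        return rng_state = x;
    }

    static float open_open_flt(void)
    {
        uint32_t hi;
        do {
            hi = rng_u32() >> 8;   /* 24 bits: exactly representable in float */
        } while (hi == 0);         /* redraw until nonzero */
        return hi * 0x1p-24f;      /* in [2^-24, 1 - 2^-24], i.e. inside (0,1) */
    }
    ```

    Checking the integer before scaling avoids a floating-point comparison entirely; the loop almost never iterates more than once.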

  • 2021-01-05 21:50

    Given a sample x selected randomly from [0, 2^32), I propose using:

    0x1.fffffep-33 * x + 0x1p-25
    

    Reasoning:

    • These values are such that the highest x produces slightly less than 1-2^-25 before rounding, so it is rounded to the largest float less than 1, which is 1-2^-24. If we made it any larger, some values would round to 1, which we do not want. If we made it smaller, fewer values would round to 1-2^-24, so it would be less represented than we desire (more on this below).
    • The values are such that the lowest x produces 2^-25. This produces some symmetry: The distribution is compelled to stop at the high side 1-2^-25 before rounding, as explained above, so we make it symmetric on the bottom side, stopping at 0+2^-25. To some extent, it is as if we are binning the real number line in bins of width 2^-24 and then removing the bins centered on 0 and 1 (which extend 2^-25 to either side of those numbers).
    • Each bin that we retain contains about the same number of sample values. However, different float values show up in the bins, because the resolution of float varies. It is finer near 0 and coarser near 1. With this arrangement, each bin is about uniformly represented, but the lower bins will have more samples with lower probability each. The overall distribution remains uniform.
    • We could extend the low end so that it is closer to zero. But then, for most d in (0, ½), there would be more samples in (0, d) than in (1-d, 1), so the distribution would be asymmetric.

    As you can see, the floating-point format forces some irregularities in a distribution from 0 to 1. This issue has been raised in other Stack Overflow questions but never thoroughly discussed, to my knowledge. Whether it suits your purposes to leave these irregularities as described above depends on your application.
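    In C the expression can be wrapped as below; the function name is chosen here for illustration. The scale used is 0x1.fffffep-33, i.e. (1-2^-24)·2^-32, which is the value the endpoint analysis above requires. The constants are doubles, so the multiply and add happen in double and only the final conversion to float rounds:

    ```c
    #include <stdint.h>

    /* Map a 32-bit sample to (0,1).  The double constants force the
       multiply/add to be done in double; only the final conversion to
       float rounds, so the endpoint analysis above holds. */
    static float map_open_open(uint32_t x)
    {
        return (float)(0x1.fffffep-33 * x + 0x1p-25);
    }
    ```

    map_open_open(0) yields 2^-25, and map_open_open(0xffffffff) yields 1-2^-24, the largest float below 1.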

    Potential variations:

    • Quantize all the samples so they occur at regularly spaced intervals, 2^-24, rather than being finer where the float format is finer.
    • Allow values closer to 1 before rounding but convert them to 1-2^-24 after rounding, and lower the bottom endpoint to match. This reduces the excluded segments around 0 and around 1 at the expense of increasing the number of values clumped into 1-2^-24, because the resolution is not fine enough for more distinction.
    • Switch to double. Then there is a 1-1 map from original x values to floating-point values, and you can likely get as close to 0 and 1 as desired.
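    For the double variation, one simple exact construction (a sketch, not something from the answer itself): with 53 significand bits, (x + 0.5)·2^-32 is computed exactly for every 32-bit x, giving 2^32 evenly spaced doubles strictly inside (0,1):

    ```c
    #include <stdint.h>

    /* (x + 0.5) needs at most 33 significand bits, so the sum and the
       scaling by a power of two are both exact in double.  Results lie
       in [2^-33, 1 - 2^-33]; the endpoints 0 and 1 never occur. */
    static double map_open_open_dbl(uint32_t x)
    {
        return (x + 0.5) * 0x1p-32;
    }
    ```

    This keeps the one-to-one map from x values to floating-point values while making both endpoints unreachable by construction.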

    Also, contrary to ElKamina’s answer, floating-point comparison (even to zero) is not generally faster than addition. Comparison requires branching on the result, which is an issue in many modern CPUs.
