Generate N random numbers within a range with a constant sum

暗喜 2020-12-03 12:26

I want to generate N random numbers drawn from a specific distribution (e.g. uniform) on [a,b] which sum to a constant C. I have tried a couple of solutions I could

5 answers
  • 2020-12-03 12:45

    Let's try to simplify the problem. By subtracting the lower bound from every number, we can reduce it to finding N numbers in [0,b-a] such that their sum is C-Na.

    Renaming the parameters, we can look for N numbers in [0,m] whose sum is S.

    Now the problem is akin to partitioning a segment of length S into N sub-segments, each with a length in [0,m].
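    The reduction above can be sketched in a few lines. This is only an illustration: the names Reduced, reduce and feasible are made up here, and the only fact used beyond the text is that N numbers in [0,m] can sum to S only when 0 <= S <= N*m.

```cpp
#include <cassert>

// Reduced problem: n numbers in [0, m] whose sum is S.
// (Hypothetical helper type, not part of the original answer.)
struct Reduced {
    int n;
    double m;
    double S;
};

// Shift every number down by the lower bound a.
Reduced reduce(int n, double a, double b, double C) {
    assert(n > 0 && a <= b);
    return Reduced{n, b - a, C - n * a};
}

// A solution can exist only if 0 <= S <= n * m.
bool feasible(const Reduced& r) {
    return r.S >= 0.0 && r.S <= r.n * r.m;
}
```

    For example, reduce(5, 2, 4, 12) gives m = 2 and S = 2, which is feasible, while a target of C = 25 with the same bounds is not.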

    I think the problem is simply not solvable.

    If S=1 and N=1000 with integer values (and any m of at least 1), the only possible partition is one 1 and 999 zeroes, which is nothing like a random spread.

    There is a correlation between N, m and S, and even picking random values will not make it disappear.

    For the most uniform partition, the lengths of the sub-segments will follow a Gaussian curve with a mean value of S/N.

    If you tweak your random numbers differently, you will end up with some other bias, but in the end you will never have both a uniform distribution on [a,b] and a total of C, unless the mean of the interval, (a+b)/2, happens to equal C/N, i.e. b = 2C/N-a.

  • 2020-12-03 12:47

    In case you want the sample to follow a uniform distribution, the problem reduces to generating N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution but can also be computed more easily using the exponential distribution. Here is how:

    1. Take a uniform sample v1 … vN with all vi between 0 and 1.
    2. For all i, 1<=i<=N, define ui := -ln vi (notice that ui > 0).
    3. Normalize the ui as pi := ui/s where s is the sum u1+...+uN.

    The p1..pN are uniformly distributed (in the simplex of dim N-1) and their sum is 1.

    You can now multiply these pi by the constant C you want and shift them by adding some other constant A, like this:

    qi := A + pi*C.
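    Steps 1 to 3 and the final shift can be sketched as follows. This is a minimal sketch, not a reference implementation: the function names simplex_sample and shift_and_scale are made up for illustration, and drawing 1 - u instead of u (to stay away from ln(0)) is my own assumption about how to handle the boundary.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Steps 1-3: returns p_1..p_n >= 0 with p_1 + ... + p_n == 1 (up to rounding).
std::vector<double> simplex_sample(int n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // yields [0, 1)
    std::vector<double> p(n);
    double s = 0.0;
    for (double& u : p) {
        // 1 - unit(rng) lies in (0, 1], so the logarithm is always finite.
        u = -std::log(1.0 - unit(rng));
        s += u;
    }
    for (double& u : p) u /= s;  // normalize by the sum
    return p;
}

// q_i := A + p_i * C, as in the formula above.
std::vector<double> shift_and_scale(const std::vector<double>& p, double A, double C) {
    std::vector<double> q;
    q.reserve(p.size());
    for (double pi : p) q.push_back(A + pi * C);
    return q;
}
```

    Note that the q_i produced this way sum to N*A + C, because the p_i sum to 1.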

    EDIT 3

    In order to address some issues raised in the comments, let me add the following:

    • To ensure that the final random sequence falls in the interval [a,b] choose the constants A and C above as A := a and C := b-a, i.e., take qi = a + pi*(b-a). Since pi is in the range (0,1) all qi will be in the range [a,b].
    • One cannot take the (negative) logarithm -ln(vi) if vi happens to be 0, because ln() is not defined at 0. The probability of such an event is extremely low. However, to ensure that no error is signaled, the generation of v1 ... vN in item 1 above must treat any occurrence of 0 in a special way: consider -ln(0) as +infinity (remember: ln(x) -> -infinity when x->0). Then the sum s = +infinity, which means that the corresponding pi = 1 and all other pj = 0. Without this convention the sequence (0...1...0) would never be generated (many thanks to @Severin Pappadeux for this interesting remark.)
    • As explained in the 4th comment attached to the question by @Neil Slater it is logically impossible to fulfill all the requirements of the original framing. Therefore any solution must relax the constraints to a proper subset of the original ones. Other comments by @Behrooz seem to confirm that this would suffice in this case.

    EDIT 2

    One more issue has been raised in the comments:

    Why does rescaling a uniform sample not suffice?

    In other words, why should I bother to take negative logarithms?

    The reason is that if we just rescale, the resulting sample won't be uniformly distributed over the segment (0,1) (or [a,b] for the final sample.)

    To visualize this let's think 2D, i.e., let's consider the case N=2. A uniform sample (v1,v2) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point dividing it by the sum s=v1+v2 what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):

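    [Figure: points of the unit square projected along lines through the origin onto the line x + y = 1; green lines near the main diagonal, orange lines near the axes, projection line in blue]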

    But given that the green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than the orange ones, which are closer to the x and y axes, the projections tend to accumulate around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copy-pasting a mathematical proof, I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
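    For anyone who wants to try the comparison without plotting, here is a rough sketch for N=2. It counts how often the first normalized coordinate falls in a central band. The function name fraction_in_band and the band [0.4, 0.6] are arbitrary choices for illustration. A uniform distribution on the segment would put about 0.2 of the samples in that band, while plain rescaling puts about 1/3 there.

```cpp
#include <cmath>
#include <random>

// Fraction of N=2 samples whose first normalized coordinate lands in [lo, hi].
// use_log == false: plain rescaling v1 / (v1 + v2)
// use_log == true : negative logarithms first, then rescaling
double fraction_in_band(bool use_log, double lo, double hi, int trials,
                        std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // yields [0, 1)
    int hits = 0;
    for (int t = 0; t < trials; ++t) {
        double x = 1.0 - unit(rng);  // in (0, 1], keeps the logarithm finite
        double y = 1.0 - unit(rng);
        if (use_log) {
            x = -std::log(x);
            y = -std::log(y);
        }
        double p = x / (x + y);  // projection onto the line x + y = 1
        if (p >= lo && p <= hi) ++hits;
    }
    return static_cast<double>(hits) / trials;
}
```

    Calling it with use_log set to false and then true, with a few hundred thousand trials each, should make the accumulation around the center visible in the numbers.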

    (Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)

  • 2020-12-03 12:51

    This is an old topic, but I think I have an idea. Suppose we want N random numbers whose sum is C, each between a and b. We create N holes and prepare C balls (C - N*a balls if a is not 0, since each hole then stands for a number that starts at a). We go around the holes repeatedly and ask each one, "Do you want another ball?" If it says no, we pass to the next hole; otherwise we put a ball into it. Each hole has a cap of b-a balls; once a hole reaches its cap, we always pass to the next one. We stop when no balls are left.

    Example:
    3 random numbers between 0 and 2 which sum is 5.

    simulation result:
    1st run: -+-
    2nd run: ++-
    3rd run: ---
    4th run: +*+
    final:221

    -:refuse ball
    +:accept ball
    *:full pass
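    A rough sketch of the hole-and-ball procedure could look like this. The name fill_holes and the acceptance probability of 0.5 are assumptions made for illustration; the description above does not say how a hole decides. No claim is made that the result follows any particular distribution.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Distributes `balls` unit balls over n holes, each holding at most `cap`.
// Returns the number of balls in each hole.
std::vector<int> fill_holes(int n, int cap, int balls, std::mt19937& rng) {
    assert(n > 0 && cap >= 0 && balls >= 0);
    // With more balls than total capacity the loop could never finish.
    assert(static_cast<long long>(n) * cap >= balls);
    std::bernoulli_distribution accepts(0.5);  // assumed acceptance probability
    std::vector<int> holes(n, 0);
    while (balls > 0) {
        for (int i = 0; i < n && balls > 0; ++i) {
            if (holes[i] == cap) continue;  // '*': full, pass
            if (accepts(rng)) {             // '+': accept a ball
                ++holes[i];
                --balls;
            }                               // '-': refuse
        }
    }
    return holes;
}
```

    For the example above (3 numbers between 0 and 2 with sum 5) the call would be fill_holes(3, 2, 5, rng). For a general lower bound a, add a to every entry afterwards.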

  • 2020-12-03 12:55

    Well, for n=10000, can't we have one small number in there that is not random?

    Maybe generate the sequence until the sum exceeds C-max, and then add one last number that makes up the difference.

    One number in 10000 is just a very small amount of noise in the system.
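    One possible reading of this idea, for numbers in [0, max], is sketched below. The name fill_to_sum is made up, and the sketch assumes a lower bound of 0. Note that it does not produce a fixed count N: the number of values depends on the draws, which may or may not be acceptable.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draws uniform values in [0, max) until the running sum exceeds C - max,
// then appends one non-random value that brings the total to C.
std::vector<double> fill_to_sum(double max, double C, std::mt19937& rng) {
    assert(max > 0.0 && C >= 0.0);
    std::uniform_real_distribution<double> draw(0.0, max);
    std::vector<double> xs;
    double sum = 0.0;
    while (sum <= C - max) {
        double x = draw(rng);
        xs.push_back(x);
        sum += x;
    }
    // Here C - max < sum <= C (up to rounding), so the remainder fits in [0, max).
    xs.push_back(C - sum);
    return xs;
}
```

    Only the last element is not random, which matches the "small noise" argument above.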

  • 2020-12-03 12:57

    For my answer I'll assume that we have a uniform distribution.

    Since we have a uniform distribution, every tuple with sum C has the same probability of occurring. For example, for a = 2, b = 4, C = 12, N = 5 we have 15 possible tuples. Of these, 10 start with 2, 4 start with 3 and 1 starts with 4. This gives the idea of selecting a random number from 1 to 15 in order to choose the first element: from 1 to 10 we select 2, from 11 to 14 we select 3 and for 15 we select 4. Then we continue recursively.

    #include <ctime>
    #include <iostream>
    #include <random>
    
    std::default_random_engine generator(std::time(nullptr));
    int a = 2, b = 4, n = 5, c = 12, numbers[5];
    
    // Count the tuples of n numbers in [a, b] whose sum is c
    int calc_combinations(int n, int c) {
        if (n == 1) return (c >= a) && (c <= b);
        int sum = 0;
        for (int i = a; i <= b; i++) sum += calc_combinations(n - 1, c - i);
        return sum;
    }
    
    // Choose a random array of n elements in [a, b] having sum c.
    // Assumes at least one such array exists.
    void choose(int n, int c, int *numbers) {
        if (n == 1) { numbers[0] = c; return; }
    
        int combinations = calc_combinations(n, c);
        std::uniform_int_distribution<int> distribution(0, combinations - 1);
        int s = distribution(generator);
        int sum = 0;
        // Walk the candidate first elements until the cumulative count passes s
        for (int i = a; i <= b; i++) {
            if ((sum += calc_combinations(n - 1, c - i)) > s) {
                numbers[0] = i;
                choose(n - 1, c - i, numbers + 1);
                return;
            }
        }
    }
    
    int main() {
        choose(n, c, numbers);
        for (int x : numbers) std::cout << x << '\n';  // one number per line
    }
    

    Possible outcome:

    2
    2
    3
    2
    3
    

    This algorithm won't scale well for large N because of overflows in the calculation of combinations (unless we use a big integer library), the time needed for this calculation and the need for arbitrarily large random numbers.
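    One way to reduce the time spent on the repeated recursion is to tabulate the counts once and reuse them. The following is a minimal sketch under some assumptions: a >= 0, and all counts fit in 64 bits (so the overflow caveat still applies for large N). The names count_table and choose_uniform are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

using Table = std::vector<std::vector<std::uint64_t>>;

// ways[k][s] = number of k-tuples with entries in [a, b] and sum s.
Table count_table(int n, int c, int a, int b) {
    assert(n > 0 && c >= 0 && a >= 0 && a <= b);
    Table ways(n + 1, std::vector<std::uint64_t>(c + 1, 0));
    ways[0][0] = 1;  // the empty tuple has sum 0
    for (int k = 1; k <= n; ++k)
        for (int s = 0; s <= c; ++s)
            for (int v = a; v <= b && v <= s; ++v)
                ways[k][s] += ways[k - 1][s - v];
    return ways;
}

// Picks one of the ways[n][c] tuples with equal probability.
std::vector<int> choose_uniform(int n, int c, int a, int b, std::mt19937_64& rng) {
    Table ways = count_table(n, c, a, b);
    assert(ways[n][c] > 0);  // at least one valid tuple must exist
    std::vector<int> out;
    for (int k = n; k >= 1; --k) {
        std::uniform_int_distribution<std::uint64_t> pick(0, ways[k][c] - 1);
        std::uint64_t r = pick(rng);
        // Walk the candidate values until the cumulative count passes r.
        for (int v = a; v <= b && v <= c; ++v) {
            std::uint64_t w = ways[k - 1][c - v];
            if (r < w) {
                out.push_back(v);
                c -= v;
                break;
            }
            r -= w;
        }
    }
    return out;
}
```

    For the example above, count_table(5, 12, 2, 4)[5][12] is 15, matching the 15 tuples counted by hand. The table costs on the order of N * C * (b-a+1) additions instead of an exponential number of recursive calls.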
