Generate N random numbers within a range with a constant sum

I want to generate N random numbers drawn from a specific distribution (e.g. uniform random) in [a,b] which sum to a constant C. I have tried a couple of solutions I could

5 Answers

有刺的猬

2020-12-03 12:45

Let's try to simplify the problem. By subtracting the lower bound from each number, we can reduce it to finding N numbers in [0,b-a] such that their sum is C-Na.

Renaming the parameters, we can look for N numbers in [0,m] whose sum is S.

Now the problem is akin to partitioning a segment of length S into N distinct sub-segments, each with a length in [0,m].

I think the problem is simply not solvable.

If S=1, N=1000 and m is anything above 0, the only possible repartition (with integer values) is one 1 and 999 zeroes, which is nothing like a random spread.

There is a correlation between N, m and S, and even picking random values will not make it disappear.

For the most uniform repartition, the lengths of the sub-segments will follow a Gaussian curve with a mean value of S/N.

If you tweak your random numbers differently, you will end up with some other bias, but in the end you will never have both a uniform [a,b] repartition and a total of C, unless the upper bound b of your interval happens to be 2C/N-a, i.e., unless the mean (a+b)/2 of the interval equals C/N.
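
The compatibility condition can be read off from the expected value of the sum. A short derivation, assuming each of the N values X_i is (marginally) uniform on [a,b]; the symbol X_i is introduced here for illustration and does not appear in the answer:

```latex
\mathbb{E}\left[\sum_{i=1}^{N} X_i\right]
  = \sum_{i=1}^{N} \mathbb{E}[X_i]
  = N \cdot \frac{a+b}{2}
  = C
\quad\Longrightarrow\quad
b = \frac{2C}{N} - a
```

Linearity of expectation holds whether or not the X_i are independent, so the condition is necessary for any scheme with uniform marginals and a fixed sum.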

渐次进展

2020-12-03 12:47
In case you want the sample to follow a uniform distribution, the problem reduces to generating N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution but can also be computed more easily using the Exponential distribution. Here is how:
1. Take a uniform sample v₁ … v_N with all v_i between 0 and 1.
2. For all i, 1<=i<=N, define u_i := -ln v_i (notice that u_i > 0).
3. Normalize the u_i as p_i := u_i/s where s is the sum u₁+...+u_N.
The p₁..p_N are uniformly distributed (in the simplex of dim N-1) and their sum is 1.
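
The three steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the answerer's own code: the function name `sample_simplex` is hypothetical, and the uniform draw is shifted to (0, 1] so that the logarithm stays finite, which is an assumption of this sketch rather than something the answer prescribes.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical helper: returns p_1..p_N, each in [0, 1], summing to 1.
std::vector<double> sample_simplex(int n, std::mt19937 &gen) {
    assert(n >= 1);
    std::uniform_real_distribution<double> unif(0.0, 1.0);  // values in [0, 1)
    std::vector<double> p(n);
    double s = 0.0;
    do {
        s = 0.0;
        for (int i = 0; i < n; ++i) {
            double v = 1.0 - unif(gen);  // step 1, shifted to (0, 1] (assumption)
            p[i] = -std::log(v);         // step 2: u_i = -ln v_i, so u_i >= 0
            s += p[i];
        }
    } while (!(s > 0.0) || !std::isfinite(s));  // redraw in the degenerate case
    for (double &x : p) x /= s;                 // step 3: p_i = u_i / s
    return p;
}
```

Calling `sample_simplex(5, gen)` with a seeded `std::mt19937 gen` yields five non-negative values whose sum is 1 up to rounding.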

You can now multiply these p_i by the constant C you want and translate them by adding some other constant A, like this:

q_i := A + p_i*C.

EDIT 3

In order to address some issues raised in the comments, let me add the following:
- To ensure that the final random sequence falls in the interval [a,b] choose the constants A and C above as A := a and C := b-a, i.e., take q_i = a + p_i*(b-a). Since p_i is in the range (0,1) all q_i will be in the range [a,b].
- One cannot take the (negative) logarithm -ln(v_i) if v_i happens to be 0 because ln() is not defined at 0. The probability of such an event is extremely low. However, in order to ensure that no error is signaled, the generation of v₁ ... v_N in item 1 above must treat any occurrence of 0 in a special way: consider -ln(0) as +infinity (remember: ln(x) -> -infinity when x->0). Thus the sum s = +infinity, which means that p_i = 1 and all other p_j = 0. Without this convention the sequence (0...1...0) would never be generated (many thanks to @Severin Pappadeux for this interesting remark.)
- As explained in the 4th comment attached to the question by @Neil Slater it is logically impossible to fulfill all the requirements of the original framing. Therefore any solution must relax the constraints to a proper subset of the original ones. Other comments by @Behrooz seem to confirm that this would suffice in this case.
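
The first two points can be sketched together in C++. This is only an illustration under stated assumptions: the function names `normalize_neg_log` and `to_interval` are hypothetical, and when several v_i are 0 the sketch arbitrarily puts all the mass on the first one, a case the answer does not spell out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns a uniform sample v (entries in [0, 1]) into p
// with sum 1, treating -ln(0) as +infinity as described in the text.
std::vector<double> normalize_neg_log(const std::vector<double> &v) {
    std::vector<double> p(v.size(), 0.0);
    for (std::size_t i = 0; i < v.size(); ++i) {
        assert(v[i] >= 0.0 && v[i] <= 1.0);
        if (v[i] == 0.0) {  // u_i = +infinity: p_i = 1 and all other p_j = 0
            p[i] = 1.0;     // first zero wins (assumption of this sketch)
            return p;
        }
    }
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        p[i] = -std::log(v[i]);
        s += p[i];
    }
    assert(s > 0.0);  // fails only if every v_i equals 1
    for (double &x : p) x /= s;
    return p;
}

// q_i = a + p_i * (b - a), so every q_i lies in [a, b].
std::vector<double> to_interval(const std::vector<double> &p, double a, double b) {
    std::vector<double> q(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) q[i] = a + p[i] * (b - a);
    return q;
}
```

For instance, the input (0.5, 0, 0.25) maps to p = (0, 1, 0), and with a = 2, b = 4 to q = (2, 4, 2).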
EDIT 2

One more issue has been raised in the comments:

Why does rescaling a uniform sample not suffice?

In other words, why should I bother to take negative logarithms?

The reason is that if we just rescale, the resulting sample won't be uniformly distributed across the segment (0,1) (or [a,b] for the final sample).

To visualize this let's think 2D, i.e., let's consider the case N=2. A uniform sample (v₁,v₂) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point dividing it by the sum s=v₁+v₂ what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):

But given that green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than orange ones, which are closer to the axes x and y, the projections tend to accumulate more around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copypasting a mathematical proof I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
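
As a starting point for that experiment, here is a small C++ sketch for the case N=2. It estimates how often the first normalized coordinate p₁ lands in the central band [0.25, 0.75]. The function names are hypothetical, and the draw is shifted to (0, 1] to avoid ln(0), which is an assumption of the sketch. For a uniform p₁ the fraction should be close to 1/2; for plain rescaling a direct computation of the area of the corresponding region of the unit square gives 2/3, which reflects the accumulation around the center.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Uniform draw in (0, 1]; the shift avoids log(0) (assumption of this sketch).
double draw_unit(std::mt19937 &gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return 1.0 - unif(gen);
}

// Fraction of samples whose first normalized coordinate p_1 = x / (x + y)
// falls in [0.25, 0.75], with or without taking negative logarithms first.
double central_fraction(bool use_neg_log, int samples, std::mt19937 &gen) {
    assert(samples > 0);
    int inside = 0;
    for (int k = 0; k < samples; ++k) {
        double x = draw_unit(gen);
        double y = draw_unit(gen);
        if (use_neg_log) {
            x = -std::log(x);
            y = -std::log(y);
        }
        double p1 = x / (x + y);
        if (p1 >= 0.25 && p1 <= 0.75) ++inside;
    }
    return static_cast<double>(inside) / samples;
}
```

Comparing `central_fraction(false, 200000, gen)` with `central_fraction(true, 200000, gen)` makes the difference between the two schemes visible without any plotting.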

(Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)
暖寄归人

2020-12-03 12:51

Although this is an old topic, I think I have an idea. Suppose we want N random numbers whose sum is C, each between a and b. To solve the problem, we create N holes and prepare C balls. On each pass we ask each hole in turn, "Do you want another ball?" If it says no, we move on to the next hole; otherwise we put a ball into the hole. Each hole has a cap value: b-a. Once a hole reaches the cap value, we always pass on to the next hole.

Example:
3 random numbers between 0 and 2 whose sum is 5.

simulation result:
1st run: -+-
2nd run: ++-
3rd run: ---
4th run: +*+
final:221

-:refuse ball
+:accept ball
*:full pass
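
The procedure can be sketched in C++ for integer values. Several details are assumptions of this sketch rather than part of the answer: the name `fill_holes`, an acceptance probability of 1/2 for each question, and the handling of a nonzero lower bound by starting every hole at a and distributing only C - N*a balls (the example above uses a = 0, where this makes no difference). No claim is made that the result is uniformly distributed over all valid tuples.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: n integers in [a, b] with sum c, built by handing out
// balls one at a time to holes that randomly accept or refuse them.
std::vector<int> fill_holes(int n, int a, int b, int c, std::mt19937 &gen) {
    assert(n >= 1 && a <= b);
    assert(c >= n * a && c <= n * b);  // otherwise no valid assignment exists
    const int cap = b - a;             // cap value of each hole
    int balls = c - n * a;             // balls left to hand out (assumption for a != 0)
    std::vector<int> holes(n, 0);
    std::bernoulli_distribution accepts(0.5);  // acceptance probability: assumption
    while (balls > 0) {                        // one "run" per iteration
        for (int i = 0; i < n && balls > 0; ++i) {
            if (holes[i] == cap) continue;     // '*': hole is full, pass
            if (accepts(gen)) {                // '+': accept a ball
                ++holes[i];
                --balls;
            }                                  // '-': refuse the ball
        }
    }
    for (int &x : holes) x += a;  // shift back into [a, b]
    return holes;
}
```

With the parameters of the example, `fill_holes(3, 0, 2, 5, gen)` returns three values in [0, 2] whose sum is 5, such as the 2 2 1 shown above.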

醉话见心

2020-12-03 12:55

Well, for n=10000, can't we have one number in there that is not random?

Maybe generate the sequence until the sum exceeds C-max, and then just add one plain number to make up the total.

1 in 10000 is more like a very small noise in the system.
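
One possible reading of this idea, for a fixed count N, is to draw N-1 values uniformly from [a,b] and set the last value to whatever remainder makes the sum equal C. The C++ sketch below follows that reading; the name `sample_with_remainder`, the retry loop, and the `max_attempts` limit are assumptions added here, since the remainder can fall outside [a,b] and the answer does not say what to do in that case.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: n-1 uniform draws in [a, b] plus one non-random
// remainder. Returns an empty vector if no attempt produced a remainder
// inside [a, b].
std::vector<double> sample_with_remainder(int n, double a, double b, double c,
                                          std::mt19937 &gen,
                                          int max_attempts = 10000) {
    assert(n >= 2 && a < b);
    std::uniform_real_distribution<double> unif(a, b);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::vector<double> x(n);
        double sum = 0.0;
        for (int i = 0; i + 1 < n; ++i) {
            x[i] = unif(gen);
            sum += x[i];
        }
        x[n - 1] = c - sum;  // the one value that is not random
        if (x[n - 1] >= a && x[n - 1] <= b) return x;
    }
    return {};  // gave up: C is probably too far from N * (a + b) / 2
}
```

The retry loop is only practical when C is close to N(a+b)/2; otherwise the remainder almost never lands in [a,b].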

野趣味

2020-12-03 12:57

For my answer I'll assume that we have a uniform distribution.

Since we have a uniform distribution, every tuple with sum C has the same probability of occurring. For example, for a = 2, b = 4, C = 12, N = 5 we have 15 possible tuples. Of these, 10 start with 2, 4 start with 3 and 1 starts with 4. This gives the idea of selecting a random number from 1 to 15 in order to choose the first element. From 1 to 10 we select 2, from 11 to 14 we select 3 and for 15 we select 4. Then we continue recursively.
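
The counts in this example can be checked by brute force. The sketch below takes a = 2, b = 4, C = 12, N = 5 and groups the tuples by their first element; the function names `count_tuples` and `count_by_first` are hypothetical and the bounds are passed as parameters instead of globals.

```cpp
#include <cassert>
#include <map>

// Number of tuples of length n with entries in [a, b] and sum c.
int count_tuples(int n, int c, int a, int b) {
    assert(n >= 1);
    if (n == 1) return (c >= a && c <= b) ? 1 : 0;
    int total = 0;
    for (int first = a; first <= b; ++first)
        total += count_tuples(n - 1, c - first, a, b);
    return total;
}

// Same count, split by the value of the first element (requires n >= 2).
std::map<int, int> count_by_first(int n, int c, int a, int b) {
    assert(n >= 2);
    std::map<int, int> counts;
    for (int first = a; first <= b; ++first)
        counts[first] = count_tuples(n - 1, c - first, a, b);
    return counts;
}
```

Here `count_tuples(5, 12, 2, 4)` is 15, and `count_by_first(5, 12, 2, 4)` maps 2, 3 and 4 to 10, 4 and 1 respectively.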

#include <cstdio>
#include <ctime>
#include <random>

std::default_random_engine generator(std::time(nullptr));
int a = 2, b = 4, n = 5, c = 12, numbers[5];

// Count the tuples of n numbers in [a, b] whose sum is c
int calc_combinations(int n, int c) {
    if (n == 1) return (c >= a) && (c <= b);
    int sum = 0;
    for (int i = a; i <= b; i++) sum += calc_combinations(n - 1, c - i);
    return sum;
}

// Fills numbers[0..n-1] with a random tuple of n elements having sum c.
// Assumes that at least one valid tuple exists.
void choose(int n, int c, int *numbers) {
    if (n == 1) { numbers[0] = c; return; }

    int combinations = calc_combinations(n, c);
    std::uniform_int_distribution<int> distribution(0, combinations - 1);
    int s = distribution(generator);  // index of the tuple to pick
    int sum = 0;
    for (int i = a; i <= b; i++) {
        // Pick i as first element once the index falls into its block of tuples
        if ((sum += calc_combinations(n - 1, c - i)) > s) {
            numbers[0] = i;
            choose(n - 1, c - i, numbers + 1);
            return;
        }
    }
}

int main() {
    choose(n, c, numbers);
    for (int i = 0; i < n; i++) std::printf("%d ", numbers[i]);
    std::printf("\n");
}

Possible outcome:

This algorithm won't scale well for large N because of overflows in the calculation of combinations (unless we use a big integer library), the time needed for this calculation and the need for arbitrarily large random numbers.
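
The running time, at least, can be reduced by tabulating the counts once instead of recomputing them recursively. The following C++ sketch is not part of the original answer: the name `sample_tuple_dp` and the table `ways` are hypothetical, and the counts are stored in 64-bit integers, so the overflow problem mentioned above remains for large N.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: picks one of the n-tuples in [a, b] with sum c,
// each with equal probability, using a precomputed table of counts.
std::vector<int> sample_tuple_dp(int n, int a, int b, int c, std::mt19937_64 &gen) {
    assert(n >= 1 && a <= b && c >= n * a && c <= n * b);
    const int m = b - a;          // upper bound after subtracting a
    const int total = c - n * a;  // target sum after subtracting a
    // ways[k][s] = number of k-tuples with entries in [0, m] and sum s
    std::vector<std::vector<std::uint64_t>> ways(
        n + 1, std::vector<std::uint64_t>(total + 1, 0));
    ways[0][0] = 1;
    for (int k = 1; k <= n; ++k)
        for (int s = 0; s <= total; ++s)
            for (int v = 0; v <= m && v <= s; ++v)
                ways[k][s] += ways[k - 1][s - v];

    std::vector<int> result(n);
    int remaining = total;
    for (int k = n; k >= 1; --k) {
        // Index of the tuple to pick among those still possible
        std::uniform_int_distribution<std::uint64_t> pick(0, ways[k][remaining] - 1);
        std::uint64_t r = pick(gen);
        for (int v = 0; v <= m && v <= remaining; ++v) {
            std::uint64_t block = ways[k - 1][remaining - v];
            if (r < block) {  // the index falls into the block of tuples starting with v
                result[n - k] = a + v;
                remaining -= v;
                break;
            }
            r -= block;
        }
    }
    return result;
}
```

Building the table takes on the order of N * (C - N*a) * (b - a) additions, after which each element is chosen with a single random draw.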
