I want to generate N random numbers drawn from a specif distribution (e.g uniform random) between [a,b] which sum to a constant C. I have tried a couple of solutions I could
Let's try to simplify the problem. By substracting the lower bound, we can reduce it to finding N numbers in [0,b-a] such that their sum is C-Na.
Renaming the parameters, we can look for N numbers in [0,m] whose sum is S.
Now the problem is akin to partitioning a segment of length S in N distinct sub-segments of length [0,m].
I think the problem is simply not solvable.
if S=1, N=1000 and m anything above 0, the only possible repartition is one 1 and 999 zeroes, which is nothing like a random spread.
There is a correlation between N, m and S, and even picking random values will not make it disappear.
For the most uniform repartition, the length of the sub-segments will follow a gaussian curve with a mean value of S/N.
If you tweak your random numbers differently, you will end up with whatever bias, but in the end you will never have both a uniform [a,b] repartition and a total length of C, unless the length of your [a,b] interval happens to be 2C/N-a.
In case you want the sample to follow a uniform distribution, the problem reduces to generate N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution but can also be computed more easily using the Exponential distribution. Here is how:
The p1..pN are uniformly distributed (in the simplex of dim N-1) and their sum is 1.
You can now multiply these pi by the constant C you want and translate them by summing some other constant A like this
qi := A + pi*C.
EDIT 3
In order to address some issues raised in the comments, let me add the following:
EDIT 2
One more issue has been raised in the comments:
Why rescaling a uniform sample does not suffice?
In other words, why should I bother to take negative logarithms?
The reason is that if we just rescale then the resulting sample won't distribute uniformly across the segment (0,1) (or [a,b] for the final sample.)
To visualize this let's think 2D, i.e., let's consider the case N=2. A uniform sample (v1,v2) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point dividing it by the sum s=v1+v2 what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):
But given that green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than orange ones, which are closer to the axes x and y, the projections tend to accumulate more around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copypasting a mathematical proof I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
(Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)
Although this was old topic but I think I got a idea. Consider we want N random number which sum is C and each random between a and b. To solve problem, we create N holes and prepare C balls, for each time we ask each hole "Do you want another ball?". If no, we pass to next hole, else, we put a ball into the hole. Each hole has a cap value: b-a. If some hole reach the cap value then always pass to next hole.
Example:
3 random numbers between 0 and 2 which sum is 5.
simulation result:
1st run: -+-
2nd run: ++-
3rd run: ---
4th run: +*+
final:221
-:refuse ball
+:accept ball
*:full pass
well, for n=10000 cant we have a small number in there that is not random?
maybe generating sequence till sum > C-max
reached and then just put one simple number to sum it up.
1 in 10000 is more like a very small noise in the system.
For my answer I'll assume that we have a uniform distribution.
Since we have a uniform distribution, every tuple of C
has the same probability to occur. For example for a = 2, b = 2, C = 12, N = 5
we have 15
possible tuples. From them 10
start with 2
, 4
start with 3
and 1
starts with 4
. This gives the idea of selecting a random number from 1
to 15
in order to choose the first element. From 1
to 10
we select 2
, from 11
to 14
we select 3
and for 15
we select 4
. Then we continue recursively.
#include <time.h>
#include <random>
std::default_random_engine generator(time(0));
int a = 2, b = 4, n = 5, c = 12, numbers[5];
// Calculate how many combinations of n numbers have sum c
int calc_combinations(int n, int c) {
if (n == 1) return (c >= a) && (c <= b);
int sum = 0;
for (int i = a; i <= b; i++) sum += calc_combinations(n - 1, c - i);
return sum;
}
// Chooses a random array of n elements having sum c
void choose(int n, int c, int *numbers) {
if (n == 1) { numbers[0] = c; return; }
int combinations = calc_combinations(n, c);
std::uniform_int_distribution<int> distribution(0, combinations - 1);
int s = distribution(generator);
int sum = 0;
for (int i = a; i <= b; i++) {
if ((sum += calc_combinations(n - 1, c - i)) > s) {
numbers[0] = i;
choose(n - 1, c - i, numbers + 1);
return;
}
}
}
int main() { choose(n, c, numbers); }
Possible outcome:
2
2
3
2
3
This algorithm won't scale well for large N
because of overflows in the calculation of combinations (unless we use a big integer library), the time needed for this calculation and the need for arbitrarily large random numbers.