I want to collect the "best" way to generate random numbers on all four types of intervals in one place. I'm sick of Googling this. Search results turn up a lot of crap.
Off the top of my head, I'd just provide all the variants for the different floating point and integer types (bonus points for a templated C++ implementation) and I'd replace rand() with something better (drand48() comes to mind).
First, generate random numbers on [a,b]. To generate random numbers on [a,b), just generate a random number on [a,b], check if it equals b, and if so try again. Similarly for all the other open interval variants.
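A minimal sketch of that retry loop in C, assuming a < b and using rand() only as a stand-in source. The names rand_closed and rand_halfopen are made up for illustration, and the granularity is whatever rand() provides.

```c
#include <stdlib.h>

/* Hypothetical helper: value on the closed interval [a, b],
   at the granularity of rand(). Assumes a < b. */
static double rand_closed(double a, double b) {
    double u = (double)rand() / (double)RAND_MAX;  /* u is in [0, 1] */
    return a + (b - a) * u;
}

/* Value on [a, b): draw on [a, b] and try again whenever the draw
   lands on b. The test is >= rather than == so that a result pushed
   past b by rounding is rejected as well. */
double rand_halfopen(double a, double b) {
    double x;
    do {
        x = rand_closed(a, b);
    } while (x >= b);
    return x;
}
```

The other open variants follow the same pattern: reject a for (a, b], and reject both endpoints for (a, b).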
This question is not ready for answering because the problem has been incompletely specified. In particular, no specification has been stated for how finely the set of values that can be generated should be distributed. For illustration, consider generating values for [0, 1], and consider a floating-point format with representable values:
0, 1/16, 2/16, 3/16, 4/16, 6/16, 8/16, 12/16, 1.
Several distributions over these values might be considered “uniform”:
I doubt the first of these was intended, and I will dismiss it. The second is similar to a suggestion by Steve Jessop, but it is still incompletely specified. Should 0 be selected with a probability proportional to the interval from it to the midpoint to the next point? (This would give a probability of 1/32.) Or should it be associated with an interval centered on it, from -1/32 to 1/32? (This would give it a probability of 1/17, presuming 1 were also allocated an interval extended 1/32 beyond itself.)
You might reason that this is a closed interval, so it should stop at 0 and at 1. But suppose we had, for some application, chopped a distribution over [0, 2] into the intervals [0, 1] and (1, 2]. We would want the union of distributions over the latter two intervals to equal the distribution over the former interval. So our distributions ought to mesh nicely.
The third case has similar issues. Perhaps, if we wish to preserve granularity like this, 0 should be selected with probability 1/8, the three points 1/4, 1/2, and 3/4 with probability 1/4 each, and 1 with probability 1/8.
Aside from these issues of specifying the desired properties of the generators, the code proposed by the questioner has some issues:
Presuming that RAND_MAX+1 is a power of two (and thus dividing by it is “nice” in binary floating-point arithmetic), dividing by RAND_MAX or RAND_MAX+2 may cause some irregularities in the generated values. There may be odd quantizations in them.
When 1/(RAND_MAX+1) ≤ 1/4 ULP(1), RAND_MAX/(RAND_MAX+1) will round up and return 1 when it should not because the interval is [0, 1). (“ULP(1)” means the unit of least precision for the value 1 in the floating-point format being used.) (This will not have been observed in tests with long double where RAND_MAX fits within the bits of the significand, but it will occur, for example, where RAND_MAX is 2147483647 and the floating-point type is float, with its 24-bit significand.)
Multiplying by (b-a) and adding a introduces rounding errors, the consequences of which must be evaluated. There are a number of cases, such as when b-a is small and a is large, when a and b straddle zero (thus causing loss of granularity near b even though finer results are representable), and so on.
The lower bound of the results for (0, 1) is the floating-point value nearest 1/(RAND_MAX+2). This bound has no relationship to the fineness of the floating-point values or the desired distribution; it is simply an artifact of the implementation of rand. Values in (0, 1/(RAND_MAX+2)) are omitted without any cause stemming from the problem specification. A similar artifact may exist on the upper end (depending on the particular floating-point format, the rand implementation, and the interval endpoint, b).
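Two of the issues above can be reproduced deterministically. The sketch below assumes IEEE 754 binary32 and binary64 with round-to-nearest, and hard-codes 2147483647 in place of RAND_MAX so the result does not depend on the local rand. The function names are made up for illustration.

```c
#include <float.h>

/* RAND_MAX/(RAND_MAX+1) in float, with RAND_MAX taken to be 2147483647.
   That value is not representable in a 24-bit significand; the nearest
   float is 2147483648, so the quotient comes out as exactly 1.
   Returns 1 if the quotient equals 1.0f. */
int float_quotient_hits_one(void) {
    volatile float numerator = 2147483647.0f;    /* stored as 2147483648 */
    volatile float denominator = 2147483648.0f;  /* RAND_MAX + 1, exact */
    volatile float q = numerator / denominator;
    return q == 1.0f;
}

/* The same quotient in double is exact and stays below 1.
   Returns 1 if the quotient equals 1.0. */
int double_quotient_hits_one(void) {
    volatile double q = 2147483647.0 / 2147483648.0;
    return q == 1.0;
}

/* The usual scaling step: maps u from the unit interval to a + (b-a)*u.
   The volatile store forces the result to be rounded to double. */
double scale_to_interval(double a, double b, double u) {
    volatile double x = a + (b - a) * u;
    return x;
}
```

With a = 1, b = 2, and u = 1.0 - DBL_EPSILON/2 (the largest double below 1), the exact sum 2 - 2^-53 lies halfway between two doubles and rounds to 2, so scale_to_interval returns b even though u < 1.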
I submit the reason the questioner encountered unsatisfying answers for this “simple” problem is that it is not a simple problem.
The following is the (very crude) test I am using to find basic errors in the numbers being generated. It is not intended to show the generated numbers are good but that they are not bad.
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
int main(int argc, char *argv[]) {
long double x1,x2,x3,x4;
if ( argc!=2 ) {
printf("USAGE: %s [1,2,3,4]\n",argv[0]);
exit(EXIT_SUCCESS);
}
srand((unsigned int)time(NULL));
printf("This program simply generates random numbers in the chosen interval\n"
"and looks for values on the boundary or outside it. When an\n"
"allowable boundary is found, it reports it. Unexpected \"impossible\"\n"
"values will be reported and the program will terminate. Under\n"
"normal circumstances, the program should not terminate. Use ctrl-c.\n\n");
switch ( atoi(argv[1]) ) {
case 1:
/* x1 will be an element of [0,1] */
printf("NOTE: Testing [0,1].\n");
while ( 1 ) {
x1=((long double)rand()/RAND_MAX);
if ( x1==0 ) {
printf("x1=0 ENCOUNTERED.\n");
} else if ( x1==1 ) {
printf("x1=1 ENCOUNTERED.\n");
} else if ( x1 < 0 ) {
printf("x1<0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x1 > 1 ) {
printf("x1>1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
}
}
break;
case 2:
/* x2 will be an element of [0,1) */
printf("NOTE: Testing [0,1).\n");
while ( 1 ) {
x2=((long double)rand()/((long double)RAND_MAX+1));
if ( x2==0 ) {
printf("x2=0 ENCOUNTERED.\n");
} else if ( x2==1 ) {
printf("x2=1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x2 < 0 ) {
printf("x2<0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x2 > 1 ) {
printf("x2>1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
}
}
break;
case 3:
/* x3 will be an element of (0,1] */
printf("NOTE: Testing (0,1].\n");
while ( 1 ) {
x3=(((long double)rand()+1)/((long double)RAND_MAX+1));
if ( x3==1 ) {
printf("x3=1 ENCOUNTERED.\n");
} else if ( x3==0 ) {
printf("x3=0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x3 < 0 ) {
printf("x3<0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x3 > 1 ) {
printf("x3>1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
}
}
break;
case 4:
/* x4 will be an element of (0,1) */
printf("NOTE: Testing (0,1).\n");
while ( 1 ) {
x4=(((long double)rand()+1)/((long double)RAND_MAX+2));
if ( x4==0 ) {
printf("x4=0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x4==1 ) {
printf("x4=1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x4 < 0 ) {
printf("x4<0 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
} else if ( x4 > 1 ) {
printf("x4>1 ENCOUNTERED. Abnormal termination.\n");
exit(EXIT_FAILURE);
}
}
break;
default:
printf("ERROR: invalid argument. Enter 1, 2, 3, or 4 for [0,1], [0,1), (0,1], and (0,1), respectively.\n");
exit(EXIT_FAILURE);
}
exit(EXIT_SUCCESS);
}
If you want every double in the range to be possible, with probability proportional to the difference between it and its adjacent double values, then it's actually really hard.
Consider the range [0, 1000]. There are an absolute bucketload of values in the very tiny first part of the range: at least a million of them between 0 and 1000000*DBL_MIN, and DBL_MIN is about 2 * 10^-308. There are more than 2^32 values in the range altogether, so clearly one call to rand() isn't enough to generate them all. What you'd need to do is generate the mantissa of your double uniformly, and select an exponent with an exponential distribution, and then fudge things a bit to ensure the result is in range.
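A rough sketch of the mantissa-plus-exponent idea, restricted to [0, 1) so that no range fudging is needed. It assumes double is IEEE 754 binary64 and builds the value from its bit pattern. random_bit and rand_fine01 are made-up names, rand() is only a placeholder bit source, and the subnormal range is lumped into 0 for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one random bit, taken from bit 7 of rand().
   RAND_MAX is at least 32767, so that bit always exists. */
static unsigned random_bit(void) {
    return ((unsigned)rand() >> 7) & 1u;
}

/* Value on [0, 1) in which every normal double can occur.
   Assumption: double is IEEE 754 binary64. */
double rand_fine01(void) {
    /* Exponent: [1/2, 1) is chosen with probability 1/2, [1/4, 1/2)
       with probability 1/4, and so on. Each 0 bit moves one binade down. */
    int exponent = -1;
    while (random_bit() == 0u) {
        if (--exponent < -1022) {
            return 0.0;  /* below the normal range: simplified to 0 */
        }
    }

    /* Mantissa: 52 uniformly random fraction bits. */
    uint64_t fraction = 0;
    for (int i = 0; i < 52; i++) {
        fraction = (fraction << 1) | random_bit();
    }

    /* Assemble sign 0, biased exponent, and fraction into a double. */
    uint64_t bits = ((uint64_t)(exponent + 1023) << 52) | fraction;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Each binade gets probability equal to its width, and values inside a binade are equally spaced, so each double is hit with probability proportional to its spacing. Extending this to an arbitrary [a, b] is where the fudging comes in, and it is not attempted here.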
If you don't require every double in the range to be possible, then the difference between open and closed ranges is fairly irrelevant, because in a "true" continuous uniform random distribution, the probability of any exact value occurring is 0 anyway. So you might as well just generate a number in the open range.
All that said: yes, your proposed implementations generate values that are in the ranges you say, and for the closed and half-closed ranges they generate the end-points with probability 1/(RAND_MAX+1) or so. That's good enough for many or most practical purposes.
Your fiddling around with +1 and +2 works provided that RAND_MAX+2 is within the range that double can represent exactly. This is true for IEEE double precision and 32 bit int, but it's not actually guaranteed by the C standard.
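If you want to check that condition rather than assume it, something like the following would do. It is a conservative sketch: the function name is made up, and it answers 0 whenever it cannot tell (for example when FLT_RADIX is not 2).

```c
#include <float.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 1 if every integer from 0 to RAND_MAX+2 is exactly representable
   as a double. With FLT_RADIX == 2, all integers up to 2^DBL_MANT_DIG are. */
int rand_max_plus_two_is_exact_in_double(void) {
    if (FLT_RADIX != 2) {
        return 0;  /* not analysed here, so answer conservatively */
    }
    if (DBL_MANT_DIG >= (int)(sizeof(unsigned long long) * CHAR_BIT)) {
        return 1;  /* the significand is wider than unsigned long long */
    }
    return (unsigned long long)RAND_MAX + 2ULL <= (1ULL << DBL_MANT_DIG);
}
```

On a platform with a 53-bit double significand and RAND_MAX equal to 2147483647 this returns 1.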
(I'm ignoring your use of long double because it confuses things a bit. It's guaranteed to be at least as big as double, but there are common implementations in which it's exactly the same as double, so the long doesn't add anything except uncertainty.)