Distribution of Random Numbers

。_饼干妹妹 提交于 2019-12-03 00:27:37

My real question is whether option 1 is mathematically valid.

Lets start with option 2. The random number generator used by java.util.Random is specified in the javadoc as follows:

The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.)

and there is more specific detail in the various methods' javadocs.

But the point is that we are using a sequence generated by a linear congruential formula, and such formulae have a significant degree of auto-correlation ... which could be problematic.

Now with option 1, you are using a different Random instance with a new seed each time, and applying one round of the LC formula. Thus you are getting a sequence of numbers that are likely to be autocorrelated with the seeds. However, the seeds are generated in different ways, depending on the Java version.

Java 6 does this:

 public Random() { this(++seedUniquifier + System.nanoTime()); }
 private static volatile long seedUniquifier = 8682522807148012L;

... which is not very random at all. If you created Random instances at a constant interval, the seeds are likely to be closely spaced, and therefore the sequence of random numbers produced by your option #1 are liable to be auto-correlated.

By contrast, Java 7 and 8 do this:

 public Random() {
     this(seedUniquifier() ^ System.nanoTime());
 }

 private static long seedUniquifier() {
     // L'Ecuyer, "Tables of Linear Congruential Generators of
     // Different Sizes and Good Lattice Structure", 1999
     for (;;) {
         long current = seedUniquifier.get();
         long next = current * 181783497276652981L;
         if (seedUniquifier.compareAndSet(current, next))
             return next;
     }
 }

 private static final AtomicLong seedUniquifier
     = new AtomicLong(8682522807148012L);

The sequence of seeds produced by the above are likely be a much better approximation to (true) randomness. That probably makes your option #1 superior to option #2.

The downside of your option #1 in Java 6 through 8 is that the System.nanoTime() probably call involves a system call. That is relatively expensive.


So the short answer is that it is Java version specific which of option #1 and option #2 produces better quality "random" numbers ... from a mathematical perspective.

In both cases, the distribution of numbers will be uniform over a large enough sample size, though I'm not sure it is meaningful to talk about probability distributions when the process is deterministic.

However, neither approach would be suitable as a "crypto strength" random number generator.

Sheepy

Quick Code:

// For occasional tasks that just need an average quality random number
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.execute( () -> {
  ThreadLocalRandom.current().nextInt(); // Fast and unique!
} );


// For SecureRandom, high quality random number
final Random r = new SecureRandom();
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.execute( () -> {
  r.nextInt(); // sun.security.provider.NativePRNG uses singleton.  Can't dodge contention.
} );


// Apache Common Math - Mersenne Twister - decent and non-singleton
int cpu = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool( cpu );
Map<Thread, RandomGenerator> random = new WeakHashMap<>( cpu, 1.0f );

executor.execute( ()-> {
   RandomGenerator r;
   synchronized ( random ) { // Get or create generator.
      r = random.get( Thread.currentThread() );
      if ( r == null ) random.put( Thread.currentThread(), r = new MersenneTwister() );
   }
   r.nextInt( 1000 );
} );

Explanation:

  1. Two Random of same seed will yield same numbers.
    1. Thus we will focus on whether we can guarantee different seeds.
  2. In theory, new Random() in each thread does not guarantee different seed.

    1. new Random is seeded by nanoTime and a "unique" number.
    2. The number is not guaranteed unique because its calculation is not synchronized.
    3. As for nanoTime, it guarantees to be "at least at good as currentTimeMillis"
    4. currentTimeMillis does not guarantee anything and can be pretty coarse.
    5. In real life, the two times are the same only on old linux systems and Win 98.
  3. In practice, new Random() in each thread basically always get different seeds.

    1. Creating thread is expensive. Mine creates 1 per 50,000ns. And that's not slow.
    2. 50µs is way above nanoTime's common granularities of up to a few ten ns.
    3. The unique number calculation (1.2) is also fast, so getting same number is very rare.
    4. Use Executors to create a thread pool to avoid the heavy new thread overhead.
  4. zapl suggested ThreadLocalRandom.current().nextInt(). Great idea.

    1. It does not create new Random, but it is also a linear congruential generator.
    2. It generate a new random for each call thread as that thread's seed.
    3. It is built to be very fast in multi-thread. (See notes below.)
    4. It is statically seeded by SecureRandom, which produce better quality random numbers.
  5. "uniformally distributed" is just one small part of randomness tests.

    1. Random is somewhat uniform, and its result can be predicted given just two values.
    2. SecureRandom guarantees this won't happens. (i.e. cryptographically strong)
    3. There is no risk of seed collision if you create a new SecureRandom in each thread.
    4. But currently its source is single thread anyway, no parallel generation.
    5. For a good RNG that supports multi-thread, find external help like Apache Common's MT.

Note: Implementation details deduced from Java 8 source code. Future Java version may change; for example, ThreadLocalRandom is using sun.misc.Unsafe to store the seeds, which may be removed in Java 9 forcing ThreadLocalRandom to find a new way to work without contention.

No.

There are no guarantees on the properties of the distribution of numbers that will be produced by Option 1. As has been made clear in other answers, the implementation of the constructor for java.util.Random depends on the system time. Therefore, in order to make a guarantee on the properties of the distribution of numbers you get with Option 1, you would need to be able to make guarantees about the distribution of numbers produced by the calls your program makes to get the system time on any platform where the program will be running.

With option 2, however, there are mathematical guarantees that can be made about the distribution of numbers that will be produced during one execution of the program. With a linear congruential generator (the pseudorandom number generation algorithm used by java.util.Random) some of the properties of randomness are not quite as good as with other algorithms, but the distribution is guaranteed to be relatively uniform.

This does not necessarily mean that Option 1 cannot serve your purposes. It depends on what you are doing.

majk

Java initializes the random seed with System.nanoTime() and a sequential counter. This gives somewhat a guarantee that the seed will be different for each invocation, although I would refrain from calling it cryptographically secure.

From the performance point of view - do you really expect locking on internal state of Random in option 1 to have a larger performance hit then all of the following:

  • accessing and incrementing volatile long
  • getting the current system time (which is quite expensive)
  • dynamic allocation
  • another object for garbage collection

My suggestion will be to do benchmarks of your real application to find out, but I expect option 1 to be the slowest of all three.

In my experience, the best balance between good distribution and performance is given by using something like the "Messerne Twister" generator (see in Apache Commons) . For an even fancier solution, see this.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!