Why are initial random numbers similar when using similar seeds?

后端 未结 4 1332
说谎
说谎 2020-12-03 09:57

I discovered something strange with the generation of random numbers using Java\'s Random class. Basically, if you create multiple Random objects using close seeds (for exam

相关标签:
4条回答
  • 2020-12-03 10:29

    You'd be best served by downloading and reading the Random source, as well as some papers on pseudo-random generators, but here are some of the relevant parts of the source. To begin with, there are three constant parameters that control the algorithm:

    private final static long multiplier = 0x5DEECE66DL;
    private final static long addend = 0xBL;
    private final static long mask = (1L << 48) - 1;
    

    The multiplier works out to approximately 2^34 and change, the mask 2^48 - 1, and the addend is pretty close to 0 for this analysis.

    When you create a Random with a seed, the constructor calls setSeed:

    synchronized public void setSeed(long seed) {
        seed = (seed ^ multiplier) & mask;
        this.seed.set(seed);
        haveNextNextGaussian = false;
    }
    

    You're providing a seed pretty close to zero, so initial seed value that gets set is dominated by multiplier when the two are OR'ed together. In all your test cases with seeds close to zero, the seed that is used internally is roughly 2^34; but it's easy to see that even if you provided very large seed numbers, similar user-provided seeds will yield similar internal seeds.

    The final piece is the next(int) method, which actually generates a random integer of the requested length based on the current seed, and then updates the seed:

    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int)(nextseed >>> (48 - bits));
    }
    

    This is called a 'linear congruential' pseudo-random generator, meaning that it generates each successive seed by multiplying the current seed by a constant multiplier and then adding a constant addend (and then masking to take the lower 48 bits, in this case). The quality of the generator is determined by the choice of multiplier and addend, but the ouput from all such generators can be easily predicted based on the current input and has a set period before it repeats itself (hence the recommendation not to use them in sensitive applications).

    The reason you're seeing similar initial output from nextDouble given similar seeds is that, because the computation of the next integer only involves a multiplication and addition, the magnitude of the next integer is not much affected by differences in the lower bits. Calculation of the next double involves computing a large integer based on the seed and dividing it by another (constant) large integer, and the magnitude of the result is mostly affected by the magnitude of the integer.

    Repeated calculations of the next seed will magnify the differences in the lower bits of the seed because of the repeated multiplication by the constant multiplier, and because the 48-bit mask throws out the highest bits each time, until eventually you see what looks like an even spread.

    0 讨论(0)
  • 2020-12-03 10:48

    I wouldn't have called this an "issue".

    And also, do you know if this problem only for the first value (or first few values), or if it is more general and using close seeds should be avoided?

    Correlation patterns between successive numbers is a common problem with non-crypto PRNGs, and this is just one manifestation. The correlation (strictly auto-correlation) is inherent in the mathematics underlying the algorithm(s). If you want to understand that, you should probably start by reading the relevant part of Knuth's Art of Computer Programming Chapter 3.

    If you need non-predictability you should use a (true) random seed for Random ... or let the system pick a "pretty random" one for you; e.g. using the no-args constructor. Or better still, use a real random number source or a crypto-quality PRNG instead of Random.


    For the record:

    1. The javadoc (Java 7) does not specify how Random() seeds itself.
    2. The implementation of Random() on Java 7 for Linux, is seeded from the nanosecond clock, XORed with a 'uniquifier' sequence. The 'uniquifier' sequence is LCG which uses different multiplier, and whose state is static. This is intended to avoid auto-correlation of the seeds ...
    0 讨论(0)
  • 2020-12-03 10:49

    This is a fairly typical behaviour for pseudo-random seeds - they aren't required to provide completely different random sequences, they only provide a guarantee that you can get the same sequence again if you use the same seed.

    The behaviour happens because of the mathematical form of the PRNG - the Java one uses a linear congruential generator, so you are just seeing the results running the seed through one round of the linear congruential generator. This isn't enough to completely mix up all the bit patterns, hence you see similar results for similar seeds.

    Your best strategy is probably just to use very different seeds - one option would be to obtain these by hashing the seed values that you are currently using.

    0 讨论(0)
  • 2020-12-03 10:52

    By making random seeds (for instance, using some mathematical functions on System.currentTimeMillis() or System.nanoTime() for seed generation) you can get better random result. Also can look at here for more information

    0 讨论(0)
提交回复
热议问题