What's with 181783497276652981 and 8682522807148012 in Random (Java 7)?

后端 未结 3 1179
天涯浪人
天涯浪人 2021-01-29 21:25

Why were 181783497276652981 and 8682522807148012 chosen in Random.java?

Here\'s the relevant source code from Java SE JDK 1.7:

相关标签:
3条回答
  • 2021-01-29 21:50

    As per the link you provided, they have chosen (after adding the missing 1 :) ) the best yield from 2^64 because long can't have have a number from 2^128

    0 讨论(0)
    1. Was this number simply mis-copied into Java?

      Yes, seems to be a typo.

    2. Does 181783497276652981 have an acceptable merit?

      This could be determined using the evaluation algorithm presented in the paper. But the merit of the "original" number is probably higher.

    3. And why was 8682522807148012 chosen?

      Seems to be random. It could be the result of System.nanoTime() when the code was written.

    4. Could other numbers have been chosen that would have worked as well as these two numbers?

      Not every number would be equally "good". So, no.

    Seeding Strategies

    There are differences in the default-seeding schema between different versions and implementation of the JRE.

    public Random() { this(System.currentTimeMillis()); }
    
    public Random() { this(++seedUniquifier + System.nanoTime()); }
    
    public Random() { this(seedUniquifier() ^ System.nanoTime()); }
    

    The first one is not acceptable if you create multiple RNGs in a row. If their creation times fall in the same millisecond range, they will give completely identical sequences. (same seed => same sequence)

    The second one is not thread safe. Multiple threads can get identical RNGs when initializing at the same time. Additionally, seeds of subsequent initializations tend to be correlated. Depending on the actual timer resolution of the system, the seed sequence could be linearly increasing (n, n+1, n+2, ...). As stated in How different do random seeds need to be? and the referenced paper Common defects in initialization of pseudorandom number generators, correlated seeds can generate correlation among the actual sequences of multiple RNGs.

    The third approach creates randomly distributed and thus uncorrelated seeds, even across threads and subsequent initializations. So the current java docs:

    This constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.

    could be extended by "across threads" and "uncorrelated"

    Seed Sequence Quality

    But the randomness of the seeding sequence is only as good as the underlying RNG. The RNG used for the seed sequence in this java implementation uses a multiplicative linear congruential generator (MLCG) with c=0 and m=2^64. (The modulus 2^64 is implicitly given by the overflow of 64bit long integers) Because of the zero c and the power-of-2-modulus, the "quality" (cycle length, bit-correlation, ...) is limited. As the paper says, besides the overall cycle length, every single bit has an own cycle length, which decreases exponentially for less significant bits. Thus, lower bits have a smaller repetition pattern. (The result of seedUniquifier() should be bit-reversed, before it is truncated to 48-bits in the actual RNG)

    But it is fast! And to avoid unnecessary compare-and-set-loops, the loop body should be fast. This probably explains the usage of this specific MLCG, without addition, without xoring, just one multiplication.

    And the mentioned paper presents a list of good "multipliers" for c=0 and m=2^64, as 1181783497276652981.

    All in all: A for effort @ JRE-developers ;) But there is a typo. (But who knows, unless someone evaluates it, there is the possibility that the missing leading 1 actually improves the seeding RNG.)

    But some multipliers are definitely worse: "1" leads to a constant sequence. "2" leads to a single-bit-moving sequence (somehow correlated) ...

    The inter-sequence-correlation for RNGs is actually relevant for (Monte Carlo) Simulations, where multiple random sequences are instantiated and even parallelized. Thus a good seeding strategy is necessary to get "independent" simulation runs. Therefore the C++11 standard introduces the concept of a Seed Sequence for generating uncorrelated seeds.

    0 讨论(0)
  • 2021-01-29 21:53

    If you consider that the equation used for the random number generator is:

    LCGEquation

    Where X(n+1) is the next number, a is the multipler, X(n) is the current number, c is the increment and m is the modulus.

    If you look further into Random, a, c and m are defined in the header of the class

    private static final long multiplier = 0x5DEECE66DL;   //= 25214903917 -- 'a'
    private static final long addend = 0xBL;               //= 11          -- 'c'
    private static final long mask = (1L << 48) - 1;       //= 2 ^ 48 - 1  -- 'm'
    

    and looking at the method protected int next(int bits) this is were the equation is implemented

    nextseed = (oldseed * multiplier + addend) & mask;
    //X(n+1) =  (X(n)   *      a     +    c  ) mod m
    

    This implies that the method seedUniquifier() is actually getting X(n) or in the first case at initialisation X(0) which is actually 8682522807148012 * 181783497276652981, this value is then modified further by the value of System.nanoTime(). This algorithm is consistent with the equation above but with the following X(0) = 8682522807148012, a = 181783497276652981, m = 2 ^ 64 and c = 0. But as the mod m of is preformed by the long overflow the above equation just becomes

    eq2

    Looking at the paper, the value of a = 1181783497276652981 is for m = 2 ^ 64, c = 0. So it appears to just be a typo and the value 8682522807148012 for X(0) which appears to be a seeming randomly chosen number from legacy code for Random. As seen here. But the merit of these chosen numbers could still be valid but as mentioned by Thomas B. probably not as "good" as the one in the paper.

    EDIT - Below original thoughts have since been clarified so can be disregarded but leaving it for reference

    This leads me the conclusions:

    1. The reference to the paper is not for the value itself but for the methods used to obtain the values due to the different values of a, c and m

    2. It is mere coincidence that the value is otherwise the same other than the leading 1 and the comment is misplaced (still struggling to believe this though)

    OR

    There has been a serious misunderstanding of the tables in the paper and the developers have just chosen a value at random as by the time it is multiplied out what was the point in using the table value in the first place especially as you can just provide your own seed value any way in which case these values are not even taken into account

    So to answer your question

    Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?

    Yes, any number could have been used, in fact if you specify a seed value when you Instantiate Random you are using any other value. This value does not have any effect on the performance of the generator, this is determined by the values of a,c and m which are hard coded within the class.

    0 讨论(0)
提交回复
热议问题