Random number in the range 1 to sys.maxsize is always 1 mod 2^10

被撕碎了的回忆 2021-01-01 08:55

I am trying to find the statistical properties of the PRNGs available in Python (2.7.10) by using the frequency test, runs test and the chi squared test.

For carryin

3 Answers
  • 2021-01-01 09:11

    That depends on a lot of things, such as how exactly the RNG is implemented, how many bits of state it uses, and how exactly the sample function is implemented.

    Here's what the documentation says:

    Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.

    So if sample() does use random() under the hood, you should expect no more than 53 meaningful bits in your result.
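
    As a rough check of the 53-bit claim, the sketch below assumes CPython's random.random(), which returns k / 2**53 for some integer k. The seed and the sample count are arbitrary choices made for this illustration.

```python
import random

# Assumption: CPython's random.random() returns k / 2**53 for an integer k.
rng = random.Random(2021)  # arbitrary seed, used only for repeatability

samples = [rng.random() for _ in range(1000)]

# Multiplying by 2**53 is exact in binary floating point, so every product
# is a whole number if the float carries at most 53 bits of randomness.
all_53_bit = all((x * 2**53).is_integer() for x in samples)
print(all_53_bit)
```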

  • 2021-01-01 09:22

    @roeland hinted at the cause: in Python 2, sample() uses int(random.random() * n) repeatedly. Look at the source code (in your Python's Lib/random.py) for full details. In short, random.random() returns no more than 53 significant (non-zero) leading bits; then int() fills the rest of the low-order bits with zeroes (you're obviously on a machine where sys.maxsize == 2**63 - 1); then indexing your base (xrange(1, sys.maxsize)) by an even integer with "a lot" of low-order 0 bits always returns an odd integer with the same number of low-order 0 bits (except for the last, which becomes 1).
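
    The effect can be imitated in a few lines. This is only a sketch of the int(random.random() * n) step described above, not the actual Lib/random.py code; py2_style_pick is a made-up helper name, and N is hard-coded on the assumption of a 64-bit build.

```python
import random

# Assumption: sys.maxsize == 2**63 - 1, so len(xrange(1, sys.maxsize)) is 2**63 - 2.
N = 2**63 - 2

def py2_style_pick(rng):
    """Hypothetical helper: one pick in the style described above."""
    j = int(rng.random() * N)  # at most 53 random bits stretched over ~63 bits
    return 1 + j               # xrange(1, sys.maxsize)[j] == 1 + j

rng = random.Random(0)  # arbitrary seed
picks = [py2_style_pick(rng) for _ in range(10000)]

# Collect the distinct values of the low 10 bits across all picks.
residues = {p % 2**10 for p in picks}
print(residues)
```

    If the explanation is right, every pick leaves the same remainder modulo 2**10, so the printed set holds a single value.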

    In Python 3 none of that happens - random in Python 3 uses stronger algorithms, and only falls back to random.random() when necessary. For example, here under Python 3.4.3:

    >>> import random
    >>> hex(random.randrange(10**70))
    '0x91fc11ed768be3a454bd66f593c218d8bbfa3b99f6285291e1d9f964a9'
    >>> hex(random.randrange(10**70))
    '0x7b07ff02b6676801e33094fca2fcca7f6e235481c479c521643b1acaf4'
    

    EDIT

    Here's a more directly relevant example, under 3.4.3 on a 64-bit box:

    >>> import random, sys
    >>> sys.maxsize == 2**63 - 1
    True
    >>> for i in random.sample(range(1, sys.maxsize), 6):
    ...    print(bin(i))
    0b10001100101001001111110110011111000100110100111001100000010110
    0b100111100110110100111101001100001100110001110010000101101000101
    0b1100000001110000110100111101101010110001100110101111011100111
    0b111110100001111100101001001001101101100100011001001010100001110
    0b1100110100000011100010000011010010100100110111001111100110100
    0b10011010000110101010101110001000101110111100100001111101110111
    

    Python 3 doesn't invoke random.random() at all in this case, but instead iteratively grabs chunks of 32 bits from the underlying Mersenne Twister (32-bit unsigned ints are "the natural" outputs from this implementation of MT), pasting them together to build a suitable index. So, in Python 3, platform floats have nothing to do with it; in Python 2, quirks of float behavior have everything to do with it.
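
    A minimal sketch of that bit-based approach follows. It assumes the index is drawn with getrandbits() plus rejection of out-of-range values, which is my reading of how Python 3 picks an integer below n; randbelow_sketch is a made-up name, not the library's function.

```python
import random

def randbelow_sketch(rng, n):
    """Hypothetical helper: uniform integer in [0, n) built from raw bits."""
    k = n.bit_length()       # number of bits needed to cover values below n
    r = rng.getrandbits(k)   # raw generator bits, no floats involved
    while r >= n:            # reject out-of-range draws to keep it uniform
        r = rng.getrandbits(k)
    return r

N = 2**63 - 2  # assumption: population size for range(1, 2**63 - 1)
rng = random.Random(0)  # arbitrary seed
picks = [1 + randbelow_sketch(rng, N) for _ in range(10000)]

# Count how many distinct values the low 10 bits take.
residues = {p % 2**10 for p in picks}
print(len(residues))
```

    Because no float is involved, the low-order bits should vary from pick to pick instead of being pinned to a single pattern.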

  • 2021-01-01 09:30

    It certainly looks like a rounding error in random.sample.

    The bottom bits (10 of them, going by the 2^10 in the question title) are always zero after the multiplication by the spread of the range (maxsize - 1); then, when the start of the range (1) is added, the lowest bit is always 1.

    If the multiplication were working exactly, then given that the spread is not a power of two, and given that the random number has only 53 varying bits, I'd expect to see varying values in the rightmost bits too.
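
    One way to see where the rounding could come in is to check what the spread looks like once it is converted to a float, which is what happens when it is multiplied by random.random(). The value of spread below assumes a 64-bit build where maxsize is 2**63 - 1.

```python
# Assumption: 64-bit build, so the spread is maxsize - 1 == 2**63 - 2.
spread = 2**63 - 2

# A double has only 53 significant bits, so the spread cannot be stored exactly.
as_float = float(spread)

rounds_to_power_of_two = (as_float == 2.0**63)
rounding_gap = int(as_float) - spread

print(rounds_to_power_of_two)
print(rounding_gap)
```

    If the float version of the spread is exactly a power of two, the product with a 53-bit fraction is just a bit shift, which would account for the zeroed low-order bits.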
