I am trying to find the statistical properties of the PRNGs available in Python (2.7.10) by using the frequency test, runs test and the chi squared test.
For carryin
That depends on a lot of things, like how exactly the RNG is implemented, how much bits of state it uses, and how exactly the sample
function is implemented.
Here's what the documentation says:
Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.
So if the sample
indeed uses random()
under the hood, then you should only expect 53 bits of meaningful bits in your result.
@roeland hinted at the cause: in Python 2, sample()
uses int(random.random() * n)
repeatedly. Look at the source code (in your Python's Lib/random.py
) for full details. In short, random.random()
returns no more than 53 significant (non-zero) leading bits; then int()
fills the rest of the low-order bits with zeroes (you're obviously on a machine where sys.maxsize == 2**63 - 1
); then indexing your base (xrange(1, sys.maxsize)
) by an even integer with "a lot" of of low-order 0 bits always returns an odd integer with the same number of low-order 0 bits (except for the last).
In Python 3 none of that happens - random
in Python 3 uses stronger algorithms, and only falls back to random.random()
when necessary. For example, here under Python 3.4.3:
>>> hex(random.randrange(10**70))
'0x91fc11ed768be3a454bd66f593c218d8bbfa3b99f6285291e1d9f964a9'
>>> hex(random.randrange(10**70))
'0x7b07ff02b6676801e33094fca2fcca7f6e235481c479c521643b1acaf4'
Here's a more directly relevant example, under 3.4.3 on a 64-bit box:
>>> import random, sys
>>> sys.maxsize == 2**63 - 1
True
>>> for i in random.sample(range(1, sys.maxsize), 6):
... print(bin(i))
0b10001100101001001111110110011111000100110100111001100000010110
0b100111100110110100111101001100001100110001110010000101101000101
0b1100000001110000110100111101101010110001100110101111011100111
0b111110100001111100101001001001101101100100011001001010100001110
0b1100110100000011100010000011010010100100110111001111100110100
0b10011010000110101010101110001000101110111100100001111101110111
Python 3 doesn't invoke random.random()
at all in this case, but instead iteratively grabs chunks of 32 bits from the underlying Mersenne Twister (32-bit unsigned ints are "the natural" outputs from this implementation of MT) , pasting them together to build a suitable index. So, in Python 3, platform floats have nothing to do with it; in Python 2, quirks of float behavior have everything to do with it.
It certainly looks like rounding error in random.sample.
The bottom 4 or so bits are always zero after the multiplication by the spread of the range (maxsize -1
) then when the start of the range (1
) is added they are always 1
if the multiplication was working correctly, given that the spread is not a power of two, and given that the random number only has 53 varying bits I'd expect to see varying values in the rightmost bits too.