Question
I am writing a program compatible with both Python 2.7 and 3.5. Some parts of it rely on stochastic processes. My unit tests use an arbitrary seed, which leads to the same results across executions and versions... except for the code using random.shuffle.
Example in Python 2.7:
In[]: import random
random.seed(42)
print(random.random())
l = list(range(20))
random.shuffle(l)
print(l)
Out[]: 0.639426798458
[6, 8, 9, 15, 7, 3, 17, 14, 11, 16, 2, 19, 18, 1, 13, 10, 12, 4, 5, 0]
Same input in Python 3.5:
In []: import random
random.seed(42)
print(random.random())
l = list(range(20))
random.shuffle(l)
print(l)
Out[]: 0.6394267984578837
[3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]
Note that the pseudo-random number is the same, but the shuffled lists are different. As expected, re-executing the cells does not change their respective outputs.
How could I write the same test code for the two versions of Python?
Answer 1:
In Python 3.2 the random module was refactored a little to make the output uniform across architectures (given the same seed); see issue #7889. The shuffle() method was switched to using Random._randbelow().
However, the _randbelow() method was also adjusted, so simply copying the 3.5 version of shuffle() is not enough to fix this.
That said, if you pass in your own random() function, the implementation in Python 3.5 is unchanged from the 2.7 version, and thus lets you bypass this limitation:
random.shuffle(l, random.random)
Note, however, that you are now subject to the old 32-bit vs 64-bit architecture differences that #7889 tried to solve.
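To make that code path concrete, here is a minimal sketch of the float-based loop that shuffle() runs when it is handed a random function. legacy_shuffle is a hypothetical name of mine, not something in the random module. It can also stand in on Python 3.11 and later, where the optional second argument to random.shuffle() was removed.

```python
import random


def legacy_shuffle(x, rand=random.random):
    """Shuffle list x in place using floats from rand() (hypothetical helper).

    Mirrors the float-based loop: j = int(rand() * (i + 1)).
    """
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(rand() * (i + 1))
        x[i], x[j] = x[j], x[i]


# Same seed, same call sequence: both lists end up in the same order.
random.seed(42)
first = list(range(20))
legacy_shuffle(first)

random.seed(42)
second = list(range(20))
legacy_shuffle(second)
```

Because the index is derived from a float, this variant inherits the same architecture caveat mentioned above.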
Ignoring several optimisations and special cases, if you include _randbelow() the 3.5 version can be backported as:
import random
import sys

if sys.version_info >= (3, 2):
    newshuffle = random.shuffle
else:
    try:
        xrange
    except NameError:
        xrange = range

    def newshuffle(x):
        def _randbelow(n):
            "Return a random int in the range [0,n). Raises ValueError if n==0."
            getrandbits = random.getrandbits
            k = n.bit_length()  # don't use (n-1) here because n can be 1
            r = getrandbits(k)  # 0 <= r < 2**k
            while r >= n:
                r = getrandbits(k)
            return r

        for i in xrange(len(x) - 1, 0, -1):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = _randbelow(i+1)
            x[i], x[j] = x[j], x[i]
which gives you the same output on 2.7 as 3.5:
>>> random.seed(42)
>>> print(random.random())
0.639426798458
>>> l = list(range(20))
>>> newshuffle(l)
>>> print(l)
[3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]
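If you would rather not touch the module-level generator in your tests, the same idea can be written against an explicit random.Random instance. This is only a sketch: randbelow and shuffle_with are hypothetical names, and the code assumes an interpreter where int.bit_length() exists (2.7 or 3.x).

```python
import random


def randbelow(rng, n):
    """Return a random int in [0, n) by rejection sampling on getrandbits()."""
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = rng.getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = rng.getrandbits(k)
    return r


def shuffle_with(rng, x):
    """Shuffle list x in place, drawing swap indices from the given generator."""
    for i in range(len(x) - 1, 0, -1):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(rng, i + 1)
        x[i], x[j] = x[j], x[i]


# A private generator keeps the test independent of random.seed() calls elsewhere.
rng = random.Random(42)
deck = list(range(20))
shuffle_with(rng, deck)
```

On Python 3.2 and later, a quick sanity check is to compare the result with random.Random(42).shuffle() applied to a fresh copy of the same list.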
Answer 2:
Elaborating on Martijn Pieters' excellent answer and comments, and on this discussion, I finally found a workaround, which arguably doesn't answer my exact question, but at the same time doesn't require deep changes. To sum up:
- random.seed actually makes every random function deterministic, but doesn't necessarily produce the same output across versions;
- setting PYTHONHASHSEED to 0 disables hash randomization for dictionaries and sets, which by default introduces a factor of non-determinism in Python 3.
So, in the bash script which launches the Python 3 tests, I added:
export PYTHONHASHSEED=0
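For what it's worth, the effect of that variable can be checked from Python itself. The sketch below (hash_in_child is a hypothetical helper) starts two fresh interpreters with PYTHONHASHSEED=0 and compares the hash of the same string in each; it assumes sys.executable points at a usable interpreter.

```python
import os
import subprocess
import sys


def hash_in_child(hash_seed):
    """Return hash('abc') as reported by a fresh interpreter with PYTHONHASHSEED set."""
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = str(hash_seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env
    )
    return int(out.decode().strip())


# With the hash seed pinned to 0, two separate processes should agree.
run_a = hash_in_child(0)
run_b = hash_in_child(0)
print(run_a == run_b)
```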
Then, I temporarily changed my test functions in order to brute-force my way to an integer seed which would reproduce in Python 3 the results expected in Python 2. Lastly, I reverted my changes and replaced the lines:
seed(42)
with something like this:
seed(42 if sys.version_info.major == 2 else 299)
Nothing to brag about, but as the saying goes, sometimes practicality beats purity ;)
This quick workaround may be useful to somebody who wants to test the same stochastic code across different versions of Python!
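The brute-force step might look roughly like the sketch below. Everything in it is illustrative: find_seed and draw are hypothetical names, draw stands in for the real stochastic code under test, and the expected value is made up rather than taken from my tests. A search like this is only practical when the expected result can take a small number of distinct values.

```python
import random


def find_seed(produce, expected, limit=10000):
    """Return the first seed in [0, limit) for which produce() equals expected, else None."""
    for candidate in range(limit):
        random.seed(candidate)
        if produce() == expected:
            return candidate
    return None


def draw():
    # Stand-in for the stochastic code under test (hypothetical).
    items = list(range(4))
    random.shuffle(items)
    return items


# Value recorded from the other interpreter (made up for this sketch).
expected = [2, 0, 3, 1]
seed = find_seed(draw, expected)
```

The seed found this way is then the one to hard-code for that interpreter version.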
Answer 3:
Someone may correct me if I'm wrong, but it seems that the numpy.random module does not change between Python 2 and 3.
>>> import numpy as np
>>> l = list(range(20))
>>> np.random.RandomState(42).shuffle(l)
>>> l
[0, 17, 15, 1, 8, 5, 11, 3, 18, 16, 13, 2, 9, 19, 4, 12, 7, 10, 14, 6]
I got the same result in both Python 2.7 (with np 1.12.1) and 3.7 (with np 1.14.5).
The documentation also states that generated numbers should be the same between versions:
Compatibility Guarantee A fixed seed and a fixed series of calls to ‘RandomState’ methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.
Source: https://stackoverflow.com/questions/38943038/difference-between-python-2-and-3-for-shuffle-with-a-given-seed