python seed() not keeping same sequence

微笑、不失礼 submitted on 2020-05-27 03:07:49

Question


I'm using random.seed() to try to keep the output of random.sample() the same as I sample more values from a list, but at some point the numbers change. I thought the whole purpose of seed() was to keep the numbers the same.

Here's a test I did to show that it doesn't keep the same numbers:

import random

a = list(range(0, 100))
random.seed(1)
sample10 = random.sample(a, 10)
print(sample10)

Then, if I reset the seed and make the sample much larger, the sequence changes (at least for me it always does):

random.seed(1)
sample40 = random.sample(a, 40)
print(sample40)

I'm something of a newbie, so maybe this is an easy fix, but I would appreciate any help. Thanks!


Answer 1:


If you draw independent values from the generator, you get exactly what you're expecting:

In [1]: import random

In [2]: random.seed(1)

In [3]: [random.randint(0, 99) for _ in range(10)]
Out[3]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2]

In [4]: random.seed(1)

In [5]: [random.randint(0, 99) for _ in range(40)]
Out[5]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43 ...]

As you can see, the first ten numbers are indeed the same.
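To check this prefix property without eyeballing the output, the sketch below draws 10 and then 40 values from identically seeded generators and compares the first ten. It uses a dedicated random.Random instance instead of the module-level functions shown above (a choice of this sketch; the helper name draw_ints is hypothetical), but seeding works the same way.

```python
import random

def draw_ints(seed, count):
    # A fresh generator seeded identically produces the same stream every time.
    rng = random.Random(seed)
    # Each randint call is an independent draw from that stream,
    # so the i-th value does not depend on how many values follow it.
    return [rng.randint(0, 99) for _ in range(count)]

short = draw_ints(1, 10)
longer = draw_ints(1, 40)
# The first ten values of the longer run match the shorter run exactly.
print(longer[:10] == short)  # True
```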

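It is the fact that random.sample() draws samples without replacement that gets in the way. Algorithms for sampling without replacement do not, in general, guarantee that a sample of 10 equals the first 10 items of a sample of 40; reservoir sampling is a good illustration, since later draws can push earlier picks out of the result set. CPython's random.sample in particular switches between internal strategies depending on how large k is relative to the population, so a request for 10 items and a request for 40 items can consume the same random stream in different ways.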
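For illustration, here is a minimal sketch of reservoir sampling (the classic "Algorithm R"). It is not CPython's random.sample; it only shows how a without-replacement algorithm can overwrite earlier picks, so the result for a larger k need not start with the result for a smaller k. The function name reservoir_sample is hypothetical.

```python
import random

def reservoir_sample(iterable, k, seed):
    """Algorithm R: uniform sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i enters with probability k / (i + 1), replacing a random slot.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item  # a later item pushes an earlier one out
    return reservoir

small = reservoir_sample(range(100), 10, seed=1)
large = reservoir_sample(range(100), 40, seed=1)
# Same seed, but the first random draw happens at a different position
# and is compared against a different k, so large[:10] need not equal small.
print(small)
print(large[:10])
```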

One alternative is to shuffle a list of indices and then take the first 10 or the first 40 elements:

In [1]: import random

In [2]: a = list(range(0, 100))

In [3]: random.shuffle(a)

In [4]: a[:10]
Out[4]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80]

In [5]: a[:40]
Out[5]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80, ...]
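The shuffle approach above only becomes reproducible across runs once the shuffle itself is seeded. A minimal sketch, assuming you can afford to copy and shuffle the whole population: seed a dedicated random.Random, shuffle a copy once, and slice. The helper name shuffled_prefix is hypothetical.

```python
import random

def shuffled_prefix(population, k, seed):
    # Copy so the caller's sequence is left untouched.
    pool = list(population)
    # Seeding a dedicated generator makes the shuffle order reproducible.
    random.Random(seed).shuffle(pool)
    # Every k reads from the same shuffled order, so smaller
    # results are always prefixes of larger ones.
    return pool[:k]

first10 = shuffled_prefix(range(100), 10, seed=1)
first40 = shuffled_prefix(range(100), 40, seed=1)
print(first40[:10] == first10)  # True
```

The trade-off is an O(n) shuffle of the full population even when k is small.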



Answer 2:


random.sample is deterministic only if both the seed and the sample size are kept constant. Even if you reset the seed, generating a sample of a different length is not "the same" random operation, and it may give a different initial subsequence than a smaller sample drawn with the same seed. The same random numbers are generated internally, but the way sample uses them to build the result depends on how large a sample you ask for.
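The point about holding both the seed and the sample size constant can be checked directly. This sketch uses separate random.Random instances rather than the global generator (a choice of this example). Identical (seed, k) pairs always reproduce the same sample; the comparison across different k is only printed, because whether the prefixes happen to agree depends on the Python version and implementation.

```python
import random

population = list(range(100))

# Same seed and same k: the sample is reproducible.
a1 = random.Random(1).sample(population, 10)
a2 = random.Random(1).sample(population, 10)
print(a1 == a2)  # True

# Same seed, different k: no guarantee that the first 10 items match.
b = random.Random(1).sample(population, 40)
print(b[:10] == a1)
```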




Answer 3:


You are assuming that random.sample is implemented something like this:

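import random

def samples(lst, k):
    # Naive sampling without replacement: draw random indices one at
    # a time and skip any index that has already been picked.
    n = len(lst)
    indices = []
    while len(indices) < k:
        index = random.randrange(n)
        if index not in indices:
            indices.append(index)
    return [lst[i] for i in indices]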

Which gives:

>>> random.seed(1)
>>> samples(list(range(20)), 5)
[4, 18, 2, 8, 3]
>>> random.seed(1)
>>> samples(list(range(20)), 10)
[4, 18, 2, 8, 3, 15, 14, 12, 6, 0]

However, that isn't how random.sample is actually implemented. random.seed does work the way you think; it's random.sample that doesn't!
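If the goal is to keep sampling more values from a list while the earlier ones stay fixed, one option (a sketch, not a standard-library function) is a generator that performs a Fisher-Yates shuffle lazily, one position at a time. Each yielded item is final once produced, so taking 10 and then 40 items with the same seed gives prefix-consistent results, and unlike the rejection loop in samples() it never has to retry. The name incremental_sample is hypothetical.

```python
import random
from itertools import islice

def incremental_sample(population, seed):
    """Yield items of population in a seeded random order, one at a time."""
    rng = random.Random(seed)
    pool = list(population)
    n = len(pool)
    for i in range(n):
        # Pick uniformly from the not-yet-chosen tail pool[i:],
        # then swap it into position i (one step of Fisher-Yates).
        j = rng.randrange(i, n)
        pool[i], pool[j] = pool[j], pool[i]
        yield pool[i]

ten = list(islice(incremental_sample(range(100), seed=1), 10))
forty = list(islice(incremental_sample(range(100), seed=1), 40))
print(forty[:10] == ten)  # True
```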



Source: https://stackoverflow.com/questions/23066235/python-seed-not-keeping-same-sequence
