Question
To make the random simulations we run reproducible later, my colleagues and I often explicitly seed the random or numpy.random modules' random number generators using the random.seed and np.random.seed methods. Seeding with an arbitrary constant like 42 is fine if we're just using one of those modules in a program, but sometimes we use both random and np.random in the same program. I'm unsure whether there are any best practices I should be following about how to seed the two RNGs together.
In particular, I'm worried that there's some sort of trap we could fall into where the two RNGs together behave in a "non-random" way, such as both generating the exact same sequence of random numbers, or one sequence trailing the other by a few values (e.g. the kth number from random is always the (k+20)th number from np.random), or the two sequences being related to each other in some other mathematical way. (I realise that pseudo-random number generators are all imperfect simulations of true randomness, but I want to avoid exacerbating this with poor seed choices.)
With this objective in mind, are there any particular ways we should or shouldn't seed the two RNGs? I've used, or seen colleagues use, a few different tactics, like:
Using the same arbitrary seed:
random.seed(42)
np.random.seed(42)
Using two different arbitrary seeds:
random.seed(271828)
np.random.seed(314159)
Using a random number from one RNG to seed the other:
random.seed(42)
np.random.seed(random.randint(0, 2**32 - 1))
... and I've never noticed any strange outcomes from any of these approaches... but maybe I've just missed them. Are there any officially blessed approaches to this? And are there any possible traps that I can spot and raise the alarm about in code review?
Answer 1:
I will discuss some guidelines on how multiple pseudorandom number generators (PRNGs) should be seeded. I assume you're not using random numbers for information security purposes (if you are, only a cryptographic RNG is appropriate and this advice doesn't apply).
- In general, if you have to seed multiple PRNGs at once, you should give each one a seed that is unrelated to the other seeds.
- If your application creates multiple processes that need reproducible random numbers for the same purpose, you should assign each process a unique number, give each one the same seed, and generate a new seed in each process by hashing the previous seed and the unique number. See "Seed Generation for Noncryptographic PRNGs" for details. If you can use NumPy 1.17 or later, see also "Parallel Random Number Generation".
- You should avoid seeding PRNGs (especially several at once) with linearly related numbers, sequential counters, or timestamps.
- You mentioned this question in a comment, when I started writing this answer. The advice there is not to seed multiple instances of the same kind of PRNG. This advice, however, doesn't apply as much if the seeds are chosen to be unrelated to each other, or if a PRNG with a very big state (such as Mersenne Twister) is used and allows its whole state to be seeded. The accepted answer there (at the time of this writing) demonstrates what happens when multiple instances of .NET's System.Random, with sequential seeds, are used, but not necessarily what happens with PRNGs of a different design, PRNGs of multiple designs, or PRNGs initialized with unrelated seeds. Moreover, .NET's System.Random is a poor choice for a PRNG precisely because it allows only seeds no more than 32 bits long (so the number of random sequences it can produce is limited), and also because it has implementation bugs (if I understand correctly) that have been preserved for backward compatibility.
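As a rough illustration of the "hash the seed together with a unique number" guideline above, here is a minimal sketch that derives two unrelated 32-bit seeds from one recorded constant. The helper name derive_seed, the label strings, and the choice of SHA-256 are my own assumptions for the example, not something prescribed by either library. The last lines show what I believe is the NumPy 1.17+ route, using SeedSequence.spawn for NumPy generators only.

```python
import hashlib
import random

import numpy as np


def derive_seed(master_seed: int, label: str, bits: int = 32) -> int:
    """Hypothetical helper: hash a master seed plus a unique label into a seed.

    SHA-256 is an arbitrary choice here; any well-mixing hash should do.
    """
    data = f"{master_seed}:{label}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Keep only `bits` bits so the result fits np.random.seed's accepted range.
    return int.from_bytes(digest, "big") % (1 << bits)


MASTER_SEED = 42  # the single arbitrary constant to record for reproducibility

# Each RNG gets its own label, so the two derived seeds are unrelated
# even though they come from the same master seed.
random.seed(derive_seed(MASTER_SEED, "stdlib-random"))
np.random.seed(derive_seed(MASTER_SEED, "numpy-random"))

# NumPy 1.17+ alternative for NumPy generators: spawn independent child
# seed sequences and build one Generator from each.
child_a, child_b = np.random.SeedSequence(MASTER_SEED).spawn(2)
rng_a = np.random.default_rng(child_a)
rng_b = np.random.default_rng(child_b)
```

With this layout only MASTER_SEED needs to be logged to reproduce a run, and adding a third RNG (or a worker process) just means picking another label or process number.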
Source: https://stackoverflow.com/questions/58954557/best-practices-for-seeding-random-and-numpy-random-in-the-same-program