Question
I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched by a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.
Using a simple
import time
setseed(int(time.time()))
is not very good, because I submit the jobs to a queue on a cluster: they remain pending for some minutes and then start, but the start time is unpredictable, and two jobs can start within the same second. So I switched to:
setseed(int(time.time()*100))
but I'm not happy with it. What is the best solution? Maybe I can combine information from the time, machine id, and process id. Or maybe the best solution is to read from /dev/random (Linux machines)?
How to read 4 bytes from /dev/random?
f = open("/dev/random","rb")
f.read(4)
gives me a string, but I want an integer!
Answer 1:
Reading from /dev/random is a good idea. Just convert the 4-byte string into an integer:
f = open("/dev/random","rb")
rnd_str = f.read(4)
Either using struct:
import struct
rand_int = struct.unpack('I', rnd_str)[0]
Update: uppercase I (unsigned int) is needed.
Or multiply and add:
rand_int = 0
# bytearray yields integers on both Python 2 and Python 3
for c in bytearray(rnd_str):
    rand_int <<= 8
    rand_int += c
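As a rough sketch of the same conversion in Python 3, the helper below (the name bytes_to_seed is hypothetical, not part of the original answer) uses the explicit '<I' format so the byte order is fixed as little-endian rather than whatever the machine uses natively. Note that the shift-and-add loop above treats the first byte as the most significant, so the two methods generally give different integers from the same bytes; for a seed this should not matter.

```python
import struct

def bytes_to_seed(raw):
    """Turn exactly 4 bytes into an unsigned 32-bit integer (little-endian)."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes, got %d" % len(raw))
    return struct.unpack("<I", raw)[0]

# int.from_bytes (Python 3) performs the same conversion without struct.
example = b"\x01\x02\x03\x04"
assert bytes_to_seed(example) == int.from_bytes(example, "little")
```

The length check is there because a read from a device file can in principle return fewer bytes than requested.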
Answer 2:
You could simply copy over the four bytes into an integer, that should be the least of your worries.
But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
Take a look at SPRNG, which handles exactly your problem.
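A minimal sketch of the "generate seeds on one machine" idea, assuming Python 3 and a hypothetical helper name make_seeds: the submitting script draws all seeds at once, guaranteed distinct, and hands one to each job. Distinct seeds alone do not guarantee statistically independent streams, which is the harder problem a library such as SPRNG is meant to address.

```python
import random

def make_seeds(n_jobs):
    """Return n_jobs distinct unsigned 32-bit seeds from the OS entropy source."""
    rng = random.SystemRandom()
    # sample() draws without replacement, so no two jobs share a seed.
    return rng.sample(range(2**32), n_jobs)

seeds = make_seeds(8)
# Job number i (for instance its index in a job array) would then
# call setseed(seeds[i]) in its launch script.
```

How the seeds reach the jobs (a command-line argument, an environment variable, a shared file) depends on the cluster setup and is left open here.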
Answer 3:
If this is Linux or a similar OS, you want /dev/urandom -- it always produces data immediately.
/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
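In Python the non-blocking source can be reached without opening the device file, through os.urandom from the standard library. A small sketch, with urandom_seed as a hypothetical helper name:

```python
import os
import struct

def urandom_seed():
    """Read 4 bytes from the OS random source as an unsigned 32-bit integer."""
    # os.urandom(4) returns exactly 4 bytes or raises an exception.
    return struct.unpack("<I", os.urandom(4))[0]

seed = urandom_seed()
# seed can now be passed to setseed(seed)
```

This also avoids leaving a file handle open, and it is not tied to the /dev/urandom path.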
Answer 4:
You can use a random number as the seed, which has the advantage of being operating-system agnostic (no /dev/random needed), with no conversion from string to int:
Why not simply use
random.randrange(-2**31, 2**31)
as the seed of each process? Slightly different starting times give wildly different seeds, this way…
You could alternatively use the random.jumpahead method (Python 2 only; it was removed in Python 3), if you know roughly how many random numbers each process is going to use (the documentation of random.WichmannHill.jumpahead is useful).
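Since setseed in the question takes an unsigned 4-byte integer, a value drawn from the signed range above may need to be mapped into 0 .. 2**32 - 1 first. One way to do that, sketched with a hypothetical helper to_uint32, is to mask off the low 32 bits:

```python
import random

def to_uint32(value):
    """Map any Python integer onto the unsigned 32-bit range by masking."""
    return value & 0xFFFFFFFF

seed = to_uint32(random.randrange(-2**31, 2**31))
# Equivalent in effect: seed = random.randrange(2**32)
```

The mask maps each signed 32-bit value to a different unsigned value, so it does not introduce collisions between seeds.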
Source: https://stackoverflow.com/questions/2396209/best-seed-for-parallel-process