Best seed for parallel process

Submitted by 偶尔善良 on 2020-01-03 16:45:34

Question


I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched by a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.

Using a simple

import time
setseed(int(time.time()))

is not very good. I submit the jobs to a queue on a cluster, where they stay pending for some minutes before they start. The start time is unpredictable, and two jobs can start within the same second, so I switched to:

setseed(int(time.time() * 100) % 2**32)  # reduce modulo 2**32 so the value fits in 4 bytes

but I'm not happy with it. What is the best solution? Maybe I could combine the time, the machine ID and the process ID. Or maybe the best solution is to read from /dev/random (on Linux machines)?
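A minimal sketch of the "combine time, machine ID and process ID" idea, assuming Python 3.7+ (for `time.time_ns`). It hashes the host name, the process ID and a nanosecond timestamp with SHA-256 and keeps 4 bytes. The helper name `combined_seed` is hypothetical, and the result is only very likely, not guaranteed, to differ between jobs.

```python
import hashlib
import os
import socket
import time


def combined_seed():
    """Derive an unsigned 32-bit seed from host name, process ID and time.

    Hypothetical helper: hashing the three values means two jobs that
    differ in any one of them get unrelated seeds.
    """
    ident = "{}-{}-{}".format(socket.gethostname(), os.getpid(), time.time_ns())
    digest = hashlib.sha256(ident.encode("utf-8")).digest()
    # Keep only the first 4 bytes so the value fits a 4-byte unsigned int.
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    print(combined_seed())
```

With only 32 bits of seed, collisions between n jobs remain possible (probability roughly n²/2³³), so for very large job counts handing out seeds from one place is safer.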

How do I read 4 bytes from /dev/random?

with open("/dev/random", "rb") as f:
    raw = f.read(4)

gives me a string, but I want an integer!


Answer 1:


Reading from /dev/random is a good idea. Just convert the 4-byte string into an integer:

with open("/dev/random", "rb") as f:
    rnd_str = f.read(4)

Either using struct:

import struct
rand_int = struct.unpack('I', rnd_str)[0]  # 'I' = unsigned int, native byte order

Update: an uppercase I is needed; a lowercase i unpacks a signed integer, which can be negative.

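Or shift and add (equivalent to multiplying by 256 and adding each byte):

rand_int = 0
for c in bytearray(rnd_str):  # bytearray yields ints on both Python 2 and 3
    rand_int <<= 8
    rand_int += c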
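On Python 3, f.read(4) returns a bytes object, and int.from_bytes does the conversion in one call. The sketch below, which assumes Python 3 and uses fixed example bytes instead of /dev/random, checks that it agrees with the struct and shift-and-add versions when the byte order matches. Note that 'I' uses the machine's native byte order while the loop reads the bytes big-endian; either is fine for a seed as long as the result is unsigned.

```python
import struct

raw = b"\x01\x02\x03\x04"  # fixed example bytes in place of /dev/random output

# Big-endian (">") order, matching the shift-and-add loop.
via_struct = struct.unpack(">I", raw)[0]
via_from_bytes = int.from_bytes(raw, "big")

via_loop = 0
for c in raw:  # iterating over bytes yields ints on Python 3
    via_loop = (via_loop << 8) | c

print(via_struct, via_from_bytes, via_loop)  # 16909060 16909060 16909060

# Signed vs unsigned: lowercase "i" can return negative values.
print(struct.unpack(">i", b"\xff\xff\xff\xff")[0])  # -1
print(struct.unpack(">I", b"\xff\xff\xff\xff")[0])  # 4294967295
```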



Answer 2:


You could simply copy the four bytes into an integer; that should be the least of your worries.

But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
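A minimal sketch of the "generate seeds on one machine and distribute them" approach, using only the Python standard library (this is not SPRNG). random.SystemRandom draws from the OS randomness source, and collecting the draws in a set guarantees the seeds are distinct. The helper make_job_seeds and the montecarlo command line are hypothetical placeholders for however the jobs are actually submitted.

```python
import random


def make_job_seeds(n_jobs):
    """Return n_jobs distinct unsigned 32-bit seeds (hypothetical helper)."""
    rng = random.SystemRandom()  # backed by os.urandom, not by the clock
    seeds = set()
    while len(seeds) < n_jobs:
        seeds.add(rng.getrandbits(32))  # a duplicate draw is simply ignored
    return list(seeds)


if __name__ == "__main__":
    for job_id, seed in enumerate(make_job_seeds(4)):
        # Hypothetical command line: hand each queued job its own seed.
        print("montecarlo --job {} --seed {}".format(job_id, seed))
```

Because the seeds are fixed before the jobs enter the queue, the unpredictable start times no longer matter, and the seed list can be saved to reproduce a run.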

Take a look at SPRNG (the Scalable Parallel Random Number Generators library), which handles exactly your problem.




Answer 3:


If this is Linux or a similar OS, you want /dev/urandom: it always returns data immediately.

/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
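A sketch of the /dev/urandom suggestion, assuming Python 3. The first helper reads four bytes directly and only works on Unix-like systems; the second uses os.urandom, which picks the OS randomness source on any platform (including Windows) and is therefore the more portable choice. Both helper names are hypothetical.

```python
import os


def seed_from_dev_urandom():
    """Read an unsigned 32-bit seed from /dev/urandom (Linux and similar)."""
    with open("/dev/urandom", "rb") as f:
        return int.from_bytes(f.read(4), "little")


def seed_from_os_urandom():
    """Portable variant: os.urandom chooses the OS randomness source."""
    return int.from_bytes(os.urandom(4), "little")


if __name__ == "__main__":
    print(seed_from_os_urandom())
    if os.path.exists("/dev/urandom"):
        print(seed_from_dev_urandom())
```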




Answer 4:


You can use a random number from Python's random module as the seed. This has the advantage of being operating-system agnostic (no /dev/random needed), and no conversion from string to int is required.

Why not simply use

import random
random.randrange(2**32)  # 0 .. 2**32 - 1, fits setseed's unsigned 4-byte argument

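as the seed of each process? This way, even processes that start at almost the same time get very different seeds, because Python seeds the random module from os.urandom() when it is available (and falls back to the current time otherwise).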

Alternatively, you could use the random.jumpahead method if you know roughly how many random numbers each process is going to use (the documentation of random.WichmannHill.jumpahead is useful). Note that both exist only in Python 2; they were removed in Python 3.



Source: https://stackoverflow.com/questions/2396209/best-seed-for-parallel-process
