Is this proper use of numpy seeding for parallel code?


Question


I am running n instances of the same code in parallel and want each instance to use independent random numbers.

For this purpose, before I start the parallel computations I create a list of random states, like this:

import numpy.random as rand
# Reseed the global generator with a fresh random seed, then record the resulting state.
rand_states = [(rand.seed(rand.randint(2**32-1)), rand.get_state())[1] for j in range(n)]

I then pass one element of rand_states to each parallel process, in which I basically do

rand.set_state(rand_state)
data = rand.rand(10,10)

To make things reproducible, I run np.random.seed(0) at the very beginning of everything.

Does this work like I hope it does? Is this the proper way to achieve it?

(I cannot just store the data arrays themselves beforehand, because (i) random numbers are generated in many places in the parallel processes, (ii) that would introduce unnecessary coupling between the parallel code and the managing non-parallel code, and (iii) in reality I run M slices across N < M processors and the data for all M slices is too big to store.)


Answer 1:


numpy.random.set_state and numpy.random.get_state operate on the global instance of the NumPy generator. However, each parallel process should use its own instance of a PRNG instead. NumPy 1.17 and later provide the numpy.random.Generator class for this purpose. (In fact, numpy.random.get_state and the other numpy.random.* functions have been legacy functions since NumPy 1.17; NumPy's new RNG system was the result of a proposal to change the RNG policy.)
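
For illustration, a minimal sketch of the per-instance approach (the variable names and the value of n are my own, not from the question):

import numpy as np

# Instead of calling set_state on the shared global generator, give each
# parallel task its own Generator instance. Distinct integer seeds are
# enough here because default_rng feeds them through SeedSequence.
n = 4
rngs = [np.random.default_rng(seed) for seed in range(n)]

# Inside a task, draw only from the instance that task was given:
data = rngs[0].random((10, 10))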

An excellent way to seed multiple processes is to use so-called "counter-based" PRNGs (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011) and other PRNGs that give each seed its own non-overlapping "stream" of random numbers. An example is the counter-based bit generator numpy.random.Philox, newly added in NumPy 1.17.
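
As an illustrative sketch (assumed names, not from the answer) of the counter-based idea with NumPy's Philox bit generator, where each task's key selects its own non-overlapping stream:

import numpy as np

# A distinct Philox `key` per task yields a distinct stream of random
# numbers, so tasks cannot overlap no matter how many values they draw.
n = 4
rngs = [np.random.Generator(np.random.Philox(key=task_id)) for task_id in range(n)]
data = [rng.random((10, 10)) for rng in rngs]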

There are several other strategies for seeding multiple processes, but almost all of them involve having each process use its own PRNG instance rather than sharing a global PRNG instance (as with the legacy numpy.random.* functions). These strategies are explained in my section "Seeding Multiple Processes" (which is not NumPy-specific) and in the page "Parallel Random Number Generation" in the NumPy documentation.
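
A hedged sketch of how seeding multiple processes might look in practice, assuming a multiprocessing.Pool and an illustrative worker function (the pool size and task count are placeholders, not from the question):

import numpy as np
from multiprocessing import Pool

def worker(seed_seq):
    # Each process builds its own Generator from the child SeedSequence
    # it receives; the global numpy.random state is never touched.
    rng = np.random.default_rng(seed_seq)
    return rng.random((10, 10))

if __name__ == "__main__":
    parent = np.random.SeedSequence(0)   # single reproducible master seed
    child_seeds = parent.spawn(8)        # M = 8 independent slices
    with Pool(processes=4) as pool:      # N = 4 < M processors, as in the question
        results = pool.map(worker, child_seeds)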



Source: https://stackoverflow.com/questions/56009927/is-this-proper-use-of-numpy-seeding-for-parallel-code
