I have very simple cases where the work to be done can be broken up and distributed among workers. I tried a very simple multiprocessing example from here:
This blog post provides an example of good and bad practice when using numpy.random with multiprocessing. The most important thing is to understand when the seed of your pseudo-random number generator (PRNG) is created:
import numpy as np
import pprint
from multiprocessing import Pool
pp = pprint.PrettyPrinter()
def bad_practice(index):
    return np.random.randint(0,10,size=10)
def good_practice(index):
    return np.random.RandomState().randint(0,10,size=10)
p = Pool(5)
pp.pprint("Bad practice: ")
pp.pprint(p.map(bad_practice, range(5)))
pp.pprint("Good practice: ")
pp.pprint(p.map(good_practice, range(5)))
output:
'Bad practice: '
[array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9]),
array([4, 2, 8, 0, 1, 1, 6, 1, 2, 9])]
'Good practice: '
[array([8, 9, 4, 5, 1, 0, 8, 1, 5, 4]),
array([5, 1, 3, 3, 3, 0, 0, 1, 0, 8]),
array([1, 9, 9, 9, 2, 9, 4, 3, 2, 1]),
array([4, 3, 6, 2, 6, 1, 2, 9, 5, 2]),
array([6, 3, 5, 9, 7, 1, 7, 4, 8, 5])]
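In the good practice, a fresh RandomState is created and seeded inside the worker process each time the function is called, while in the bad practice the seed is created only once, when you import the numpy.random module, so every forked worker starts from the same generator state.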
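If you also need the results to be reproducible from run to run, one option is to derive an independent child seed for every task from a single root seed. The following is only a sketch, assuming NumPy 1.17 or newer (for `SeedSequence` and `default_rng`); the names `draw`, `parallel_draws`, and the root seed `12345` are made up for illustration and are not from the blog post.

```python
import numpy as np
from multiprocessing import Pool


def draw(child_seed):
    # Each task builds its own generator from its own child seed,
    # so no task depends on the global numpy.random state.
    rng = np.random.default_rng(child_seed)
    return rng.integers(0, 10, size=10)


def parallel_draws(root_seed, n_tasks, n_workers=2):
    # Spawn one statistically independent child seed per task.
    children = np.random.SeedSequence(root_seed).spawn(n_tasks)
    with Pool(n_workers) as pool:
        return pool.map(draw, children)


if __name__ == "__main__":
    # Hypothetical root seed; the same value gives the same arrays every run.
    for row in parallel_draws(root_seed=12345, n_tasks=5):
        print(row)
```

Because the seeds are passed in as arguments, the result should not depend on whether the workers are forked or spawned, nor on which worker happens to pick up which task.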
I think you'll need to re-seed the random number generator using numpy.random.seed in your do_calculation function.
My guess is that the random number generator (RNG) gets seeded when you import the module. Then, when you use multiprocessing, you fork the current process with the RNG already seeded. Thus, all your processes start from the same RNG state, and so they'll generate the same sequences of numbers.
e.g.:
def do_calculation(data):
    np.random.seed()  # no argument: re-seed from fresh OS entropy
    rand = np.random.randint(10)
    print(data, rand)
    return data * 2
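A variation on the same idea, offered as a sketch rather than a tested recipe: instead of re-seeding on every call, you can re-seed once per worker process through the `initializer` argument of `Pool`. The helper name `reseed` and the pool size are assumptions for the example; `do_calculation` here returns the random value instead of printing it so the effect is easy to inspect.

```python
import numpy as np
from multiprocessing import Pool


def reseed():
    # Runs once in each worker right after it starts; calling seed()
    # with no argument pulls fresh entropy from the operating system.
    np.random.seed()


def do_calculation(data):
    rand = np.random.randint(10)
    return data * 2, rand


if __name__ == "__main__":
    # Hypothetical pool size of 4 workers.
    with Pool(4, initializer=reseed) as pool:
        print(pool.map(do_calculation, range(8)))
```

This keeps the seeding out of the work function, at the cost of the results no longer being reproducible between runs.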