Question
I am using MATLAB and trying to use a computer cluster to perform 100 repetitions of a calculation that is inherently stochastic. Each repetition should run the same code, but with a different random seed. It seems that rng('shuffle'), which the documentation recommends, may not achieve this if all jobs start running at the same time (on different machines). The seed it uses is an integer that appears to be initialized from the clock: it increases monotonically, with what seems to be a resolution of 1/100 of a second.
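To see which seed rng('shuffle') actually picked on a given machine, the generator settings can be read back right after seeding. A minimal sketch; the printed message is only illustrative:

```matlab
% Seed the global generator from the current time.
rng('shuffle');

% rng with no arguments returns the current settings (Type, Seed, State).
s = rng;
fprintf('Generator: %s, seed chosen by shuffle: %d\n', s.Type, s.Seed);
```

Logging this value in each job's output makes it easy to check afterwards whether two jobs ended up with the same seed.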
That resolution seems reasonable, but "collisions" are still very likely when 100 to 1000 instances start at the same time, which would invalidate treating the results as statistically independent.
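How likely a collision is can be estimated with the birthday-problem formula, assuming each job's seed is one of N equally likely clock ticks. The sketch below takes the 1/100 s resolution observed above and a few hypothetical spreads of job start times; both are assumptions, not measurements:

```matlab
% Birthday problem: probability that at least two of n jobs get the same seed
% when each seed is one of N equally likely values:
%   P = 1 - prod_{k=0}^{n-1} (1 - k/N)
n = 100;                        % jobs started together
ticksPerSecond = 100;           % assumed seed resolution (1/100 s)
windowSeconds = [1 10 60];      % hypothetical spreads of the job start times

P = zeros(size(windowSeconds));
for i = 1:numel(windowSeconds)
    N = windowSeconds(i) * ticksPerSecond;   % number of distinct possible seeds
    P(i) = 1 - prod(1 - (0:n-1) / N);
    fprintf('start spread %2d s: N = %4d seeds, P(collision) = %.3f\n', ...
        windowSeconds(i), N, P(i));
end
```

Under these assumptions, even when 100 jobs are spread over a full minute, a collision is still more likely than not.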
Is there any way to avoid such collisions without manually giving each instance an "instance id" to use as its seed?
Answer 1:
Whatever you choose for the seed, it can only take on a 32-bit value, even when it initializes a generator with a bigger state, such as the Mersenne Twister ('twister', 19937 bits). There are certain issues with 32-bit seeds, as discussed in "C++ Seeding Surprises" by M. O'Neill. Presumably, the time-based seeds are likewise 32 bits long. A short seed means that only a limited number of pseudorandom sequences (at most 2^32) can be selected.
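The 32-bit limit can be checked directly: rng accepts seeds up to 2^32 - 1 and, going by its documented seed range (nonnegative integers below 2^32), rejects anything larger. A small sketch, assuming the 'twister' generator:

```matlab
% rng takes nonnegative integer seeds below 2^32, so at most 2^32 distinct
% 'twister' sequences can be selected by seed, although the state is much larger.
rng(2^32 - 1, 'twister');        % largest seed accepted
s = rng;                         % read back the settings just applied

tooLargeRejected = false;
try
    rng(2^32, 'twister');        % one past the limit
catch err
    tooLargeRejected = true;     % rng refused the out-of-range seed
    disp(err.message);
end
```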
It appears that rng doesn't support seeds longer than 32 bits. On the other hand, recent versions of MATLAB support random number streams, which are designed, among other things, for cases where you "want separate sources of randomness in a simulation". For your purposes, choose a generator that supports multiple streams, such as mrg32k3a, and create random number streams as follows (see also "Multiple Streams"):
[stream1, stream2]=RandStream.create('mrg32k3a','NumStreams',2)
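On a cluster, each job can create the same family of streams and keep only the one matching its own index. A sketch, assuming a SLURM job array numbered 1 to 100 (the SLURM_ARRAY_TASK_ID variable is an assumption about your scheduler). This still needs a per-job index, but job-array schedulers set one automatically, so no manual bookkeeping is required:

```matlab
% Hypothetical: the scheduler (assumed SLURM) exports a 1-based job-array index.
taskId = str2double(getenv('SLURM_ARRAY_TASK_ID'));
if isnan(taskId)
    taskId = 1;            % fallback for a local test run (variable not set)
end
numJobs = 100;             % total number of repetitions, one stream each

% Every job describes the same family of streams but keeps only its own one.
s = RandStream.create('mrg32k3a', 'NumStreams', numJobs, 'StreamIndices', taskId);
RandStream.setGlobalStream(s);   % rand/randn/randi now draw from this job's stream

x = rand(1, 5);            % this job's random numbers
```

Because all streams share the default seed (0), rerunning job k reproduces its numbers exactly; passing a different 'Seed' to RandStream.create gives a new family of streams for the next experiment.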
Answer 2:
I usually take a serial number from the machine or the hard disk. On Windows (wmic is a Windows tool), for example:
dos('wmic bios get serialnumber')
or
dos('wmic cpu')
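ProcessorId (e.g. "BFEBFBFF000506E3") is another value that could be used, although it encodes the CPU model and feature flags rather than a per-chip serial, so identical nodes may report the same value. Since the nodes are likely multicore, several instances may share one machine; NumberOfCores (also reported by wmic cpu) could then be used to split the work so that each instance on a node gets a different seed.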
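The identifier string still has to be turned into a seed in the 32-bit range that rng accepts, and combined with a per-node instance index. A minimal sketch: the base-31 hash, the mixing constant 2654435761 and the localIndex variable are illustrative choices of mine, not a MATLAB API, and two different identifiers can still, rarely, hash to the same seed:

```matlab
% Query the BIOS serial number (wmic exists only on Windows).
[status, out] = system('wmic bios get serialnumber');
if status ~= 0
    out = 'unknown';       % fallback when wmic is unavailable
end
idString = strtrim(out);

localIndex = 0;            % hypothetical: this instance's index on the node, 0..NumberOfCores-1

% Simple polynomial hash (base 31) of the characters, kept below 2^32
% so every intermediate value stays exact in double precision.
h = 0;
for c = double(idString)
    h = mod(h * 31 + c, 2^32);
end

% Mix in the local index so instances on the same node get different seeds.
seed = mod(h + localIndex * 2654435761, 2^32);
rng(seed, 'twister');
```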
Source: https://stackoverflow.com/questions/62890097/matlab-different-instances-start-with-the-same-random-seed