I\'m trying to run several instances of a piece of code (2000 instances or so) concurrently in a computing cluster. The way it works is that I submit the jobs and the clust
If uniqueness is important, you need to arrange for each node to know what IDs have been claimed by others. You could do this with a protocol asking "anyone claimed ID x?" or arranging in advance for each node to have a selection of IDs which have not been allocated to others.
(GUIDs use the machine's MAC, so would fall into the "arrange in advance" category.)
Without some form of agreement, you'll risk two nodes climing the same ID.
Assuming you're on a reasonably POSIX-ish system, you should have clock_gettime
. This will give the current time in nanoseconds, which means for all practical purposes it's impossible to ever get the same value twice. (In theory bad implementations could have much lower resolution, e.g. just multiplying milliseconds by 1 million, but even half-decent systems like Linux give real nanosecond results.)
I assume you have some process launching the other processes. Have it pass in the seed to use. Then you can have that master process just pass in a random number for each process to use as its seed. That way there's really only one arbitrary seed chosen... you can use time for that.
If you don't have a master process launching the others, then if each process at least has a unique index, then what you can do is have one process generate a series of random numbers in memory (if shared memory) or in a file (if shared disk) and then have each process pull the index'th random number out to use as their seed.
Nothing will give you a more even distribution of seeds than a series of random numbers from a single seed.
The rdtsc
instruction is a pretty reliable (and random) seed.
In Windows it's accessible via the __rdtsc()
intrinsic.
In GNU C, it's accessible via:
unsigned long long rdtsc(){
unsigned int lo,hi;
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return ((unsigned long long)hi << 32) | lo;
}
The instruction measures the total pseudo-cycles since the processor was powered on. Given the high frequency of today's machines, it's extremely unlikely that two processors will return the same value even if they booted at the same time and are clocked at the same speed.