I have written a method that needs a random number to simulate a Bernoulli distribution. I am using random.nextDouble to generate a number between 0 and 1.
According to this post, the best solution is to create the new scala.util.Random neither inside the map nor completely outside it (i.e. in the driver code), but in an intermediate mapPartitionsWithIndex:
import scala.util.Random

val myAppSeed = 91234
val newRDD = myRDD.mapPartitionsWithIndex { (indx, iter) =>
  val rand = new scala.util.Random(indx + myAppSeed)
  iter.map(x => (x, Array.fill(10)(rand.nextDouble)))
}
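To make the Bernoulli step concrete, here is a minimal sketch of the same per-partition seeding idea that runs without a Spark cluster. Plain Scala collections stand in for the RDD partitions, and the names bernoulli, drawPerPartition and the parameter p are my own assumptions for illustration, not part of the snippet above or of any Spark API.

```scala
import scala.util.Random

object BernoulliPerPartition {
  // Application-level seed, same value as in the snippet above.
  val myAppSeed = 91234

  // One Bernoulli(p) trial: nextDouble is uniform on [0, 1),
  // so the comparison is true with probability p.
  def bernoulli(rand: Random, p: Double): Boolean = rand.nextDouble() < p

  // Stand-in for mapPartitionsWithIndex (assumption: each inner Seq plays
  // the role of one partition). One Random is created per partition and
  // seeded from the partition index, then reused for every element in it.
  def drawPerPartition(partitions: Seq[Seq[Int]], p: Double): Seq[Seq[(Int, Boolean)]] =
    partitions.zipWithIndex.map { case (part, indx) =>
      val rand = new Random(indx + myAppSeed)
      part.map(x => (x, bernoulli(rand, p)))
    }
}
```

Because each partition's generator is seeded only from its index and the application seed, repeating the call with the same input should give the same draws. In actual Spark code, the body of the inner map would presumably go inside the mapPartitionsWithIndex closure in the same way as rand.nextDouble does above.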