Spark - Random Number Generation

旧时难觅i 2021-01-02 00:01

I have written a method that needs a random number to simulate a Bernoulli distribution. I am using random.nextDouble to generate a number between 0 and

4 answers
  •  醉梦人生
    2021-01-02 00:51

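    According to this post, the best solution is to create the scala.util.Random neither inside the map (that builds a new generator for every element) nor completely outside it (i.e. in the driver code, where a single generator is serialized with the closure and every task starts from a copy in the same state), but at an intermediate level, inside mapPartitionsWithIndex: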

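    import scala.util.Random

    // Base seed for the whole application; each partition derives its own seed from it.
    val myAppSeed = 91234
    val newRDD = myRDD.mapPartitionsWithIndex { (indx, iter) =>
      // One generator per partition, seeded with the partition index plus the base seed.
      val rand = new Random(indx + myAppSeed)
      // Pair every element with 10 uniform draws from [0.0, 1.0).
      iter.map(x => (x, Array.fill(10)(rand.nextDouble())))
    }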
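
    Since the question is about Bernoulli draws, the same per-partition pattern could be adapted roughly as below. This is only a sketch under assumptions: the object name BernoulliPerPartition, the helper bernoulliPartition and the probability p = 0.3 are hypothetical and do not come from the question. The helper is plain Scala, so it can be tried without a cluster.

    ```scala
    import scala.util.Random

    object BernoulliPerPartition {
      // Base seed taken from the snippet above; p is an assumed success probability.
      val myAppSeed = 91234
      val p = 0.3

      // Pairs every element of one partition with a Bernoulli(p) outcome.
      // The generator is created once per partition and seeded from the
      // partition index, so the same index always yields the same draws.
      def bernoulliPartition[T](indx: Int, iter: Iterator[T]): Iterator[(T, Boolean)] = {
        val rand = new Random(indx + myAppSeed)
        // nextDouble() is uniform on [0.0, 1.0), so the comparison is true with probability p.
        iter.map(x => (x, rand.nextDouble() < p))
      }
    }
    ```

    With an RDD it would be plugged in the same way as in the snippet above:

        val flagged = myRDD.mapPartitionsWithIndex { (indx, iter) =>
          BernoulliPerPartition.bernoulliPartition(indx, iter)
        }

    The draws are only reproducible as long as the elements land in the same partitions, in the same order, on every run.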
    
