Weighted sampling with replacement in Java

Submitted by 守給你的承諾、 on 2019-12-10 20:53:53

Question


Is there a function in Java, or in a library such as Apache Commons Math, that is equivalent to the MATLAB function randsample? More specifically, I want a function randSample that returns a vector of independent and identically distributed random variables drawn from a probability distribution that I specify. For example:

int[] a = randSample(new int[]{0, 1, 2}, 5, new double[]{0.2, 0.3, 0.5});
//        { 0 w.p. 0.2
// a[i] = { 1 w.p. 0.3
//        { 2 w.p. 0.5

The output should be distributed the same way as the output of the MATLAB code randsample([0 1 2], 5, true, [0.2 0.3 0.5]), where true means sampling with replacement.

If such a function does not exist, how do I write one?

Note: I know that a similar question has been asked on Stack Overflow but unfortunately it has not been answered.


Answer 1:


I'm pretty sure one doesn't exist, but it's pretty easy to write a function that produces samples like that. First off, Java comes with a random number generator: java.util.Random has an instance method, nextDouble(), that returns uniformly distributed doubles from 0.0 (inclusive) to 1.0 (exclusive).

import java.util.Random;

Random random = new Random();
double someRandomDouble = random.nextDouble();
     // nextDouble() is an instance method, so it needs a Random object.
     // This will be a uniformly distributed
     // random variable in [0.0, 1.0).

When you sample with replacement, every draw is independent of the others. So if you convert the pdf you have as input into a cdf, you can use the random doubles Java provides to create a random data set: each sample is the value whose slice of the cdf the random double falls into. For the pdf {0.2, 0.3, 0.5}, the cdf is {0.2, 0.5, 1.0}. So first you need to convert the pdf into a cdf.

int [] randsample(int[] values, int numsamples, 
        boolean withReplacement, double [] pdf) {

    if(withReplacement) {
        double[] cdf = new double[pdf.length];
        cdf[0] = pdf[0];
        for(int i=1; i<pdf.length; i++) {
            cdf[i] = cdf[i-1] + pdf[i];
        }

Then you make the properly-sized array of ints to store the result and start finding the random results:

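        int[] results = new int[numsamples];
        Random random = new Random();
        for(int i=0; i<numsamples; i++) {
            double randomValue = random.nextDouble(); // Uniform in [0.0, 1.0).
            int currentPosition = 0;

            // Test the bound first so cdf[currentPosition] is never read out of range.
            // Using >= means an entry with probability 0 is never selected.
            while(currentPosition < cdf.length && randomValue >= cdf[currentPosition]) {
                currentPosition++; //Check the next one.
            }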

            if(currentPosition < cdf.length) { //It worked!
                results[i] = values[currentPosition];
            } else { //Ran past the end, e.g. rounding left the last cdf entry below 1.0.
                results[i] = values[cdf.length-1]; 
                     // Fail gracefully and assign it the last value.
            }
        }

        //Now we're done and can return the results!
        return results;
    } else { //Without replacement.
        // Unchecked, so the method needs no "throws" clause.
        throw new UnsupportedOperationException("This is unimplemented!");
    }
}
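Putting the pieces above together, here is a minimal self-contained sketch of the with-replacement case. The class name WeightedSampler and the extra Random parameter are assumptions made for illustration, not part of the original answer. The sketch also assumes that values and pdf are non-empty, have the same length, and that pdf sums to roughly 1.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical class name, used only for this sketch.
class WeightedSampler {

    /**
     * Draws numSamples values from values, with replacement,
     * choosing values[k] with probability pdf[k].
     */
    static int[] randSample(int[] values, int numSamples, double[] pdf, Random random) {
        // Running sum of the probabilities: cdf[k] = pdf[0] + ... + pdf[k].
        double[] cdf = new double[pdf.length];
        double runningTotal = 0.0;
        for (int k = 0; k < pdf.length; k++) {
            runningTotal += pdf[k];
            cdf[k] = runningTotal;
        }

        int[] results = new int[numSamples];
        for (int i = 0; i < numSamples; i++) {
            double u = random.nextDouble(); // uniform in [0.0, 1.0)
            int position = 0;
            // Stop at the last index, so rounding error in the cdf
            // can never push the position out of range.
            while (position < cdf.length - 1 && u >= cdf[position]) {
                position++;
            }
            results[i] = values[position];
        }
        return results;
    }

    public static void main(String[] args) {
        int[] a = randSample(new int[]{0, 1, 2}, 5,
                new double[]{0.2, 0.3, 0.5}, new Random());
        System.out.println(Arrays.toString(a)); // five values, each 0, 1 or 2
    }
}
```

Passing the Random in, rather than creating it inside the method, lets a caller supply a seeded generator when reproducible samples are needed.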

There's still some error checking to add (for example, making sure the values array and the pdf array are the same size), and you can overload this method to provide the other variants, but hopefully this is enough to get you started. Cheers!
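The error checking mentioned above could look something like the following sketch. The class name SamplingInputs, the method name validate, and the tolerance of 1e-9 on the sum of the probabilities are all assumptions chosen for illustration; pick whatever tolerance suits your data.

```java
// Hypothetical helper, not part of the original answer.
class SamplingInputs {

    // Assumed tolerance for how far the probabilities may be from summing to 1.
    static final double SUM_TOLERANCE = 1e-9;

    /** Throws IllegalArgumentException if the inputs cannot describe a distribution. */
    static void validate(int[] values, double[] pdf) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (values.length != pdf.length) {
            throw new IllegalArgumentException(
                    "values and pdf must be the same size");
        }
        double total = 0.0;
        for (double p : pdf) {
            if (Double.isNaN(p) || p < 0.0) {
                throw new IllegalArgumentException(
                        "probabilities must be non-negative numbers");
            }
            total += p;
        }
        if (Math.abs(total - 1.0) > SUM_TOLERANCE) {
            throw new IllegalArgumentException(
                    "probabilities must sum to 1, but sum to " + total);
        }
    }
}
```

Calling a check like this at the top of the sampling method means bad input fails immediately with a clear message, instead of silently producing samples from the wrong distribution.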



Source: https://stackoverflow.com/questions/20863638/weighted-sampling-with-replacement-in-java
