Question
I have a list of 10 probabilities (assume they are sorted in descending order): <p1, p2, ..., p10>. I want to sample (without replacement) 10 elements such that the probability of selecting the i-th index is p_i.
Is there a ready-to-use Java method in common libraries (like Random) that I could use for this?
Example: 5-element list: <0.4,0.3,0.2,0.1,0.0>
Select 5 indexes (no duplicates) such that the probability of selecting each index is given by the probability at that position in the list above. So index 0 would be selected with probability 0.4, index 1 with probability 0.3, and so on.
I have written my own method to do this, but I feel it would be better to use an existing one. If you know of such a method, please let me know.
Answer 1:
This is how this is typically done:
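static int sample(double[] pdf) {
    // Transform the probabilities into a cumulative distribution
    double[] cdf = new double[pdf.length];
    cdf[0] = pdf[0];
    for (int i = 1; i < pdf.length; i++)
        cdf[i] = pdf[i] + cdf[i - 1];
    // Let r be a uniform random number in [0, 1)
    // (`random` is a static java.util.Random field, as in the test class below)
    double r = random.nextDouble();
    // Search for the bin corresponding to that quantile;
    // a miss returns -(insertionPoint) - 1, so recover the insertion point
    int k = Arrays.binarySearch(cdf, r);
    k = k >= 0 ? k : (-k - 1);
    // Guard against rounding: the last cdf entry may be slightly below 1.0
    k = Math.min(k, pdf.length - 1);
    return k;
}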
If you want to return the probability of the selected index instead, end with:
    return pdf[k];
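The first version rebuilds the cumulative distribution on every call, so each draw is O(n) even though the lookup itself is a binary search. A minimal sketch, assuming the same probabilities are reused for many draws, builds the CDF once and then does only the search per draw. The class name `CdfSampler` is hypothetical, and it uses a hand-written "first entry greater than r" search instead of `Arrays.binarySearch`, so zero-probability entries (repeated CDF values) are never returned.

```java
import java.util.Random;

// Hypothetical helper: precomputes the CDF once, then each draw is O(log n).
public class CdfSampler {
    private final double[] cdf;
    private final Random random;

    public CdfSampler(double[] pdf, Random random) {
        if (pdf.length == 0) throw new IllegalArgumentException("empty pdf");
        this.cdf = new double[pdf.length];
        double sum = 0.0;
        for (int i = 0; i < pdf.length; i++) {
            sum += pdf[i];
            cdf[i] = sum; // running total of the probabilities
        }
        this.random = random;
    }

    // Returns index i with probability pdf[i] / sum(pdf).
    public int sample() {
        // Scale by the total so inputs that do not sum exactly to 1 still work
        double r = random.nextDouble() * cdf[cdf.length - 1];
        // Find the first index whose cumulative value is strictly greater than r;
        // falls back to the last index if rounding leaves none
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > r) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    public static void main(String[] args) {
        CdfSampler sampler = new CdfSampler(new double[] {0.3, 0.4, 0.2, 0.1}, new Random(42));
        int[] counts = new int[4];
        final int tests = 1_000_000;
        for (int i = 0; i < tests; i++) counts[sampler.sample()]++;
        for (int c : counts) System.out.println(c / (double) tests);
    }
}
```

The printed frequencies should be close to 0.3, 0.4, 0.2 and 0.1; the exact digits depend on the seed.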
EDIT: I just noticed that the title says sampling without replacement. That is not trivial to do fast (I can give you some code I have for it). In any case, the question does not really make sense then: you cannot sample without replacement from a probability distribution alone; you need absolute frequencies.
For example, say I have a box of orange and blue balls in the proportions 20% and 80%. Without knowing how many balls of each colour there are in absolute terms, I cannot tell you how many of each you will have after a few draws.
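One common way to give "without replacement" a concrete meaning for a list of probabilities is successive sampling: draw an index with probability proportional to its weight, remove it, and repeat over the remaining indices with the weights renormalized. This is a named alternative, not something the answer above proposes. With this scheme the first pick has exactly probability p_i, but later picks are conditioned on what was already removed, so the inclusion probabilities are not p_i in general (when drawing all 10 of 10 items, every index is included and only the order is random). The method name `sampleWithoutReplacement` is hypothetical, and the weights are assumed to be non-negative.

```java
import java.util.Arrays;
import java.util.Random;

public class SuccessiveSampling {

    // Hypothetical helper: returns k distinct indices. Each draw picks an
    // untaken index with probability proportional to its weight among the
    // untaken ones. Weights are assumed non-negative.
    static int[] sampleWithoutReplacement(double[] weights, int k, Random random) {
        if (k < 0 || k > weights.length) throw new IllegalArgumentException("invalid k");
        boolean[] taken = new boolean[weights.length];
        int[] result = new int[k];
        for (int draw = 0; draw < k; draw++) {
            // Total weight still available (recomputed each draw to avoid drift)
            double total = 0.0;
            for (int i = 0; i < weights.length; i++)
                if (!taken[i]) total += weights[i];
            double r = random.nextDouble() * total;
            int chosen = -1, fallback = -1;
            for (int i = 0; i < weights.length; i++) {
                if (taken[i]) continue;
                // Fallback: last positive-weight candidate, else the first untaken one
                if (fallback < 0 || weights[i] > 0) fallback = i;
                if (r < weights[i]) { chosen = i; break; }
                r -= weights[i];
            }
            if (chosen < 0) chosen = fallback; // rounding, or only zero weights left
            taken[chosen] = true;
            result[draw] = chosen;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] p = {0.4, 0.3, 0.2, 0.1, 0.0};
        int[] order = sampleWithoutReplacement(p, 5, new Random());
        // A random permutation of 0..4; index 4 (weight 0.0) always comes last
        System.out.println(Arrays.toString(order));
    }
}
```

Each draw is a linear scan, so drawing k items costs O(n·k); that is fine for a 10-element list but not for large inputs.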
EDIT2: A faster version. This is not how it is typically done, but I found this suggestion on the web and have used it in my own projects as well.
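static int sample(double[] pdf) {
    // `random` is a static java.util.Random field, as in the test class below
    double r = random.nextDouble();
    for (int i = 0; i < pdf.length; i++) {
        if (r < pdf[i])
            return i;     // r falls inside bin i
        r -= pdf[i];      // skip bin i and shift r
    }
    return pdf.length - 1; // reached only if rounding makes the probabilities sum to slightly less than 1
}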
To test this:
// javac Test.java && java Test
import java.util.Arrays;
import java.util.Random;
class Test
{
    static Random random = new Random();
    // Put the body of either sample() version above here
    static int sample(double[] pdf) {
        ...
    }
    public static void main(String[] args) {
        double[] pdf = new double[] { 0.3, 0.4, 0.2, 0.1 };
        int[] counts = new int[pdf.length];
        final int tests = 1000000;
        for (int i = 0; i < tests; i++)
            counts[sample(pdf)]++;
        for (int i = 0; i < counts.length; i++)
            System.out.println(counts[i] / (double) tests);
    }
}
The output is very close to the PDF that was used:
0.3001356
0.399643
0.2001143
0.1001071
These are the times I get when running each version:
- 1st version: 0m0.680s
- 2nd version: 0m0.296s
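To see how the second version maps the random number r to an index, the sketch below is the same linear scan with r passed in as a parameter, so the result can be checked by hand. The name `pickIndex` is hypothetical.

```java
public class LinearScanTrace {
    // Same loop as the second sample() version, but r is supplied by the caller
    static int pickIndex(double[] pdf, double r) {
        for (int i = 0; i < pdf.length; i++) {
            if (r < pdf[i]) return i; // r falls inside bin i
            r -= pdf[i];              // skip bin i and shift r
        }
        return pdf.length - 1;        // only reached through rounding
    }

    public static void main(String[] args) {
        double[] pdf = {0.3, 0.4, 0.2, 0.1};
        System.out.println(pickIndex(pdf, 0.10)); // 0: 0.10 < 0.3
        System.out.println(pickIndex(pdf, 0.65)); // 1: 0.65 - 0.3 = 0.35 < 0.4
        System.out.println(pickIndex(pdf, 0.85)); // 2: 0.85 - 0.3 - 0.4 = 0.15 < 0.2
        System.out.println(pickIndex(pdf, 0.95)); // 3: 0.95 - 0.3 - 0.4 - 0.2 = 0.05 < 0.1
    }
}
```

Because the scan stops at the first bin that contains r, the expected number of iterations is the sum of (i + 1) * p_i. That sum is smallest when the probabilities are sorted in descending order, as in the question. Together with not allocating and filling a CDF array on every call, this plausibly explains why the second version is faster here.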
Source: https://stackoverflow.com/questions/29480842/sample-without-replacement-in-java-with-probabilities