Weighted sampling without replacement

没有蜡笔的小新 2021-01-16 14:23

I have a population p of indices and corresponding weights in vector w. I want to get k samples from this population without repla

5 answers

囚心锁ツ (original poster)

2021-01-16 15:05

If you want to select a large fraction of the columns (i.e., k is not very much smaller than n), or if the weights are very skewed, you can use this refinement of Jeff's solution, which ensures that each call to randsample produces samples distinct from the previous ones.

Moreover, it returns the samples in the order in which true sampling without replacement would return them, rather than sorted.

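    function I = randsample_noreplace(n, k, w)
    % First draw: k weighted samples WITH replacement. Forcing a row vector
    % keeps the horizontal concatenations below valid whatever orientation
    % randsample returns.
    I = reshape(randsample(n, k, true, w), 1, []);
    while true
        [II, idx] = sort(I);
        Idup = [false, diff(II) == 0];   % duplicates, in sorted order
        if ~any(Idup)
            break
        else
            w(I) = 0;            %% Don't replace samples
            Idup(idx) = Idup;    %% find duplicates in original list
            % Keep first occurrences in draw order and redraw the rest.
            I = [I(~Idup), reshape(randsample(n, sum(Idup), true, w), 1, [])];
        end
    end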
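For comparison, the order that "true sampling without replacement" would produce corresponds to drawing one index at a time and removing it from the pool before the next draw. The following is only a minimal sketch of that sequential procedure, not part of the answer's method: the name sequential_noreplace is hypothetical, it uses base MATLAB (cumsum and rand) instead of randsample, and it assumes that at least k of the weights are positive. Run it as a script (local functions in scripts need R2016b or later).

```matlab
% Illustrative inputs (assumptions, not taken from the answer).
n = 30;                      % population size, indices 1..n
k = 29;                      % number of samples to draw
w = ones(1, n);              % uniform weights

I = sequential_noreplace(n, k, w);

assert(numel(I) == k);               % k samples returned
assert(numel(unique(I)) == k);       % all of them distinct
assert(all(I >= 1 & I <= n));        % all are valid indices

function I = sequential_noreplace(n, k, w)
% Draw k distinct indices from 1..n, one at a time, with probability
% proportional to the remaining weights.
assert(numel(w) == n, 'w must hold one weight per index');
w = reshape(w, 1, []);               % work on a row vector
I = zeros(1, k);
for j = 1:k
    c = cumsum(w);                   % running total of remaining weights
    u = rand * c(end);               % uniform point in (0, total weight)
    I(j) = find(c > u, 1, 'first');  % first index whose cumulative weight exceeds u
    w(I(j)) = 0;                     % remove the chosen index from the pool
end
end
```

This takes k passes over the weights, which is the cost that the batch redraw in randsample_noreplace is meant to reduce.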

When selecting 29 out of 30 values with uniform weights (the case that gives the least benefit), randsample_noreplace takes 3 or 4 iterations, compared with 26 without the additional line (w(I) = 0). If the weights are instead drawn uniformly at random, it still takes 3 to 5 iterations, compared with around 80 without the additional line.

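Also, the number of iterations is bounded by k, however skewed the distribution is: because the weights of indices already drawn are set to zero, every redraw contributes at least one new distinct sample.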
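One way to check these iteration counts on your own inputs is to instrument the loop. The sketch below is a hypothetical copy of the function (named noreplace_count here) that also returns the number of redraw passes; whether that count matches the "iterations" quoted above depends on how they were counted, so treat the output as indicative only. It assumes randsample from the Statistics and Machine Learning Toolbox is available, and it should be run as a script (R2016b or later).

```matlab
% Illustrative case from the text: 29 of 30 with uniform weights.
n = 30;
k = 29;
w = ones(1, n);

[I, nPass] = noreplace_count(n, k, w);
fprintf('distinct samples: %d, redraw passes: %d\n', numel(unique(I)), nPass);

assert(numel(unique(I)) == k);   % no duplicates remain
assert(nPass <= k);              % the bound stated above

function [I, nPass] = noreplace_count(n, k, w)
% Same loop as randsample_noreplace, plus a counter of redraw passes.
I = reshape(randsample(n, k, true, w), 1, []);   % initial draw, as a row
nPass = 0;
while true
    [II, idx] = sort(I);
    Idup = [false, diff(II) == 0];   % duplicates, in sorted order
    if ~any(Idup)
        break
    end
    nPass = nPass + 1;               % one more redraw is needed
    w(I) = 0;                        % exclude everything drawn so far
    Idup(idx) = Idup;                % map duplicate flags back to draw order
    nNew = sum(Idup);                % how many samples must be redrawn
    I = [I(~Idup), reshape(randsample(n, nNew, true, w), 1, [])];
end
end
```

Replacing w with rand(1, n) gives the non-uniform case; results vary from run to run, so repeat the call a few times before drawing conclusions.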
