random-sample

Simple Random Samples from a SQL database

Deadly, submitted on 2019-11-26 03:08:59
Question: How do I take an efficient simple random sample in SQL? The database in question is running MySQL; my table is at least 200,000 rows, and I want a simple random sample of about 10,000. The "obvious" answer is: SELECT * FROM table ORDER BY RAND() LIMIT 10000. For large tables, that's too slow: it calls RAND() for every row (which already puts it at O(n)) and then sorts them, making it O(n lg n) at best. Is there a way to do this faster than O(n)? Note: As Andrew Mao points out in the
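
To make the cost argument in the question concrete, here is a minimal Python sketch of what the ORDER BY RAND() LIMIT query does conceptually: attach one random key per row, sort by that key, and keep the first k rows. The function name `sample_by_random_sort` and the `seed` parameter are hypothetical, introduced only for illustration; this models the complexity and is not a description of how MySQL executes the query internally.

```python
import random


def sample_by_random_sort(rows, k, seed=None):
    """Conceptual model of ORDER BY RAND() LIMIT k (hypothetical helper)."""
    rng = random.Random(seed)
    # One random key per row: this pass alone is O(n).
    keyed = [(rng.random(), i) for i in range(len(rows))]
    # Sorting every key is the O(n log n) step the question objects to.
    keyed.sort()
    # Keep the k rows with the smallest keys, like LIMIT k.
    return [rows[i] for _, i in keyed[:k]]


# Sizes taken from the question: 200,000 rows, sample of 10,000.
rows = list(range(200_000))
sample = sample_by_random_sort(rows, 10_000, seed=0)
```

Every row receives a key and takes part in the sort even though only 5% of the rows are kept, which is why the question asks for something cheaper.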

Weighted random selection with and without replacement

久未见, submitted on 2019-11-26 01:39:21
Question: Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well-known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the reservoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the reservoir method, as I was selecting a significant fraction of the list, which is small enough to hold in
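
As a point of reference for the "with replacement" case, one commonly used approach is to build cumulative weights once and binary-search them for each draw. The sketch below assumes the list fits in memory, as the question suggests, and that weights are non-negative; the function name `weighted_sample_with_replacement` and its parameters are hypothetical. It is not necessarily the method the answers to this question settle on.

```python
import bisect
import itertools
import random


def weighted_sample_with_replacement(items, weights, k, seed=None):
    """Draw k items with replacement, with probability proportional to weight."""
    if not items or len(items) != len(weights):
        raise ValueError("items and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    rng = random.Random(seed)
    # Running totals of the weights: O(n) setup, done once.
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    picks = []
    for _ in range(k):
        r = rng.random() * total  # uniform in [0, total)
        # First index whose running total exceeds r: O(log n) per draw.
        idx = bisect.bisect_right(cumulative, r)
        picks.append(items[idx])
    return picks


picks = weighted_sample_with_replacement(["a", "b", "c"], [0, 1, 3], 1000, seed=42)
```

Setup costs O(n) and each draw costs O(log n), so k draws cost O(n + k log n). Python's standard library offers `random.choices(population, weights=..., k=...)` for the same task.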