Generating Bootstrapped Samples in T-SQL

[亡魂溺海] submitted on 2019-12-24 12:49:16

Question


Does anyone have some ideas on how to generate a bootstrapped sample of rows in T-SQL? The problem with using something like NEWID() or RAND() then just sorting is that you end up generating a random sample without replacement. Bootstrapping should be done by sampling with replacement.


Answer 1:


Here's one solution. It starts by assigning a row identifier to each row in the population table.

Next, a uniform random integer between 1 and the row count of the population, inclusive, is generated for each row using a call to RAND() via a view and the RandMod() function, as described in Question 3531450.
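The view-plus-function pattern mentioned above might look roughly like the sketch below. This is an assumed definition, not the one quoted in Question 3531450: the view name vRand and column name RandValue are hypothetical, and RandMod() is assumed to return a float in [0, 1). The view is needed because SQL Server rejects a direct call to RAND() inside a scalar user-defined function.

```sql
-- Hypothetical helper view: exposes RAND() so a function can read it.
CREATE VIEW dbo.vRand
AS
SELECT RAND() AS RandValue;
GO  -- batch separator used by SSMS/sqlcmd; CREATE FUNCTION must start a batch

-- Assumed shape of RandMod(): returns a float in [0, 1).
CREATE FUNCTION dbo.RandMod()
RETURNS float
AS
BEGIN
    RETURN (SELECT RandValue FROM dbo.vRand);
END;
GO
```

With a function like this, CAST(1 + N * dbo.RandMod() AS int) truncates to an integer from 1 to N. It is worth checking on your own instance that the function is evaluated once per row rather than once per query.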

These two queries are JOINed together to create a random sample with replacement.

To get a bootstrapped sample of size K < N, sort on NEWID() and grab the top K rows.

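SELECT TOP (@K)              -- @K: sample size, declared by the caller
    t.[key]
    , NEWID() AS sort_key
FROM 
(
    -- One random row number in 1..N for each row of the population
    SELECT CAST(1 + (SELECT COUNT(*) FROM [table]) * dbo.RandMod() AS int) AS rowchoice
    FROM [table]
) AS r
LEFT OUTER JOIN
(
    -- Population rows numbered 1..N
    SELECT [key]
        , ROW_NUMBER() OVER(ORDER BY [key]) AS rowid
    FROM [table]
) AS t
ON r.rowchoice = t.rowid
ORDER BY 2                   -- shuffle on NEWID() so TOP keeps a random K draws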
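For experimenting without creating the view and function, the same idea can be sketched in a single batch. This version swaps in ABS(CHECKSUM(NEWID()) % @n) as the random source instead of RandMod(), which is a different technique from the one in the answer. The table variables, column names, and the value of @k are all hypothetical. The draws are stored in @picks first, an assumption-driven precaution: the optimizer is free to re-evaluate a NEWID() expression that sits inside a joined derived table, and materializing the draws means each one is computed exactly once.

```sql
-- Hypothetical five-row population; the names are illustrative only.
DECLARE @population TABLE (id int PRIMARY KEY, val varchar(10));
INSERT INTO @population (id, val)
VALUES (10, 'a'), (20, 'b'), (30, 'c'), (40, 'd'), (50, 'e');

DECLARE @n int = (SELECT COUNT(*) FROM @population);
DECLARE @k int = 3;  -- assumed sample size, k <= n

-- Draw one row number in 1..@n per population row and store the draws.
DECLARE @picks TABLE (rowchoice int NOT NULL);
INSERT INTO @picks (rowchoice)
SELECT 1 + ABS(CHECKSUM(NEWID()) % @n)
FROM @population;

-- Join the draws back to the numbered population: a row number drawn
-- twice returns the same population row twice (sampling with replacement).
SELECT TOP (@k) t.id, t.val
FROM @picks AS r
JOIN (
    SELECT id, val, ROW_NUMBER() OVER (ORDER BY id) AS rowid
    FROM @population
) AS t
    ON r.rowchoice = t.rowid
ORDER BY NEWID();
```

Running the batch several times should occasionally show the same id more than once in a result, which is the behavior that a plain ORDER BY NEWID() sample cannot produce.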


Source: https://stackoverflow.com/questions/25451034/generating-bootstrapped-samples-in-t-sql
