问题
Does anyone have some ideas on how to generate a bootstrapped sample of rows in T-SQL? The problem with using something like NEWID()
or RAND()
then just sorting is that you end up generating a random sample without replacement. Bootstrapping should be done by sampling with replacement.
回答1:
Here's one solution. It starts by assigning a row identifier to each row in the population table.
Next, a uniform random integer between 1 and the rowcount of the population, inclusive, is generated using a call to RAND() via a view and the RandMod() function, as described in Question 3531450.
These two queries are JOINed together to create a random sample with replacement.
To get a bootstrapped sample of size K < N, sort on NEWID() and grab the top K rows.
SELECT TOP K
key
, NEWID()
FROM
(
SELECT CAST(1 + (SELECT COUNT(*) FROM table))*RandMod() AS int) AS rowchoice
FROM table
) AS r
LEFT OUTER JOIN
(
SELECT key
, ROW_NUMBER() OVER(ORDER BY key) AS rowid
FROM table
) AS t
ON rowchoice = rowid
ORDER BY 2
来源:https://stackoverflow.com/questions/25451034/generating-bootstrapped-samples-in-t-sql