I\'ve got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I\'ve thought of a complicated way, creating a temp table wi
Didn't quite see this variation in the answers yet. I had an additional constraint where I needed, given an initial seed, to select the same set of rows each time.
For MS SQL:
Minimum example:
select top 10 percent *
from table_name
order by rand(checksum(*))
Normalized execution time: 1.00
NewId() example:
select top 10 percent *
from table_name
order by newid()
Normalized execution time: 1.02
NewId()
is insignificantly slower than rand(checksum(*))
, so you may not want to use it against large record sets.
Selection with Initial Seed:
declare @seed int
set @seed = Year(getdate()) * month(getdate()) /* any other initial seed here */
select top 10 percent *
from table_name
order by rand(checksum(*) % @seed) /* any other math function here */
If you need to select the same set given a seed, this seems to work.