I\'ve got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I\'ve thought of a complicated way, creating a temp table wi
If you (unlike the OP) need a specific number of records (which makes the CHECKSUM approach difficult) and desire a more random sample than TABLESAMPLE provides by itself, and also want better speed than CHECKSUM, you may make do with a merger of the TABLESAMPLE and NEWID() methods, like this:
DECLARE @sampleCount int = 50
SET STATISTICS TIME ON
SELECT TOP (@sampleCount) *
FROM [yourtable] TABLESAMPLE(10 PERCENT)
ORDER BY NEWID()
SET STATISTICS TIME OFF
In my case this is the most straightforward compromise between randomness (it's not really, I know) and speed. Vary the TABLESAMPLE percentage (or rows) as appropriate - the higher the percentage, the more random the sample, but expect a linear drop off in speed. (Note that TABLESAMPLE will not accept a variable)
It appears newid() can't be used in where clause, so this solution requires an inner query:
SELECT *
FROM (
SELECT *, ABS(CHECKSUM(NEWID())) AS Rnd
FROM MyTable
) vw
WHERE Rnd % 100 < 10 --10%
In MySQL you can do this:
SELECT `PRIMARY_KEY`, rand() FROM table ORDER BY rand() LIMIT 5000;
Try this:
SELECT TOP 10 Field1, ..., FieldN
FROM Table1
ORDER BY NEWID()