Select n random rows from SQL Server table

陌清茗 2020-11-22 10:54

I've got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I've thought of a complicated way, creating a temp table wi

16 Answers
  • 2020-11-22 11:44

    If you (unlike the OP) need a specific number of records (which makes the CHECKSUM approach difficult) and desire a more random sample than TABLESAMPLE provides by itself, and also want better speed than CHECKSUM, you may make do with a merger of the TABLESAMPLE and NEWID() methods, like this:

    DECLARE @sampleCount int = 50
    SET STATISTICS TIME ON
    
    SELECT TOP (@sampleCount) * 
    FROM [yourtable] TABLESAMPLE(10 PERCENT)
    ORDER BY NEWID()
    
    SET STATISTICS TIME OFF
    

    In my case this is the most straightforward compromise between randomness (it's not truly random, I know) and speed. Vary the TABLESAMPLE percentage (or rows) as appropriate: the higher the percentage, the more random the sample, but expect a linear drop-off in speed. (Note that TABLESAMPLE will not accept a variable.)
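
    As a rough sketch of how this might look for the question's numbers (5,000 rows out of about 50,000), the percentage can be chosen so that the page-level sample is comfortably larger than the target. The table name dbo.YourTable and the 20 percent figure are assumptions for illustration only. Because TABLESAMPLE returns an approximate number of rows, the sketch also checks whether fewer rows than requested came back:

    ```sql
    -- dbo.YourTable is a placeholder; substitute your own table.
    DECLARE @sampleCount int = 5000;

    -- Roughly 20% of ~50,000 rows gives ~10,000 candidates (approximate, since
    -- TABLESAMPLE picks whole data pages rather than individual rows).
    SELECT TOP (@sampleCount) *
    FROM dbo.YourTable TABLESAMPLE (20 PERCENT)
    ORDER BY NEWID();   -- shuffle the candidates and keep the first @sampleCount

    -- @@ROWCOUNT holds the number of rows returned by the previous statement.
    IF @@ROWCOUNT < @sampleCount
        PRINT 'Fewer rows than requested; rerun with a higher TABLESAMPLE percentage.';
    ```

    On very small tables TABLESAMPLE can return no rows at all, so a check like this is worth keeping if the exact count matters.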

  • 2020-11-22 11:45

    It appears NEWID() can't be used in a WHERE clause, so this solution requires an inner query:

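    SELECT *
    FROM (
        -- Take the modulo before ABS so ABS never receives the minimum int value, which would overflow
        SELECT *, ABS(CHECKSUM(NEWID()) % 100) AS Rnd
        FROM MyTable
    ) vw
    WHERE Rnd < 10        -- keeps roughly 10% of rows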
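
    Note that a per-row filter like this keeps each row independently with about a 10% chance, so the number of rows returned changes from run to run. A minimal way to see this, assuming the same placeholder table MyTable, is to count the kept rows a few times. The modulo is taken before ABS here so that ABS never sees the minimum int value:

    ```sql
    -- Run this several times: KeptRows should land near 10% of the table,
    -- but rarely on exactly the same value twice.
    SELECT COUNT(*) AS KeptRows
    FROM (
        SELECT ABS(CHECKSUM(NEWID()) % 100) AS Rnd   -- a value from 0 to 99 per row
        FROM MyTable
    ) vw
    WHERE Rnd < 10;
    ```

    If an exact row count is required, this filter alone is not enough; it needs to be combined with TOP and an ORDER BY.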
    
  • 2020-11-22 11:49

    In MySQL you can do this:

    -- `yourtable` is a placeholder; the bare word table is reserved in MySQL
    SELECT `PRIMARY_KEY`, rand() FROM `yourtable` ORDER BY rand() LIMIT 5000;
    
  • 2020-11-22 11:52

    Try this:

    SELECT TOP 10 Field1, ..., FieldN
    FROM Table1
    ORDER BY NEWID()
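
    Applied to the numbers in the question, this might look like the sketch below. The table name dbo.MyTable is a placeholder, and the second query is an alternative for when a fraction of the table is wanted rather than a fixed count:

    ```sql
    -- dbo.MyTable is a placeholder table name.
    -- Exactly 5,000 rows, chosen by sorting every row on a fresh GUID:
    SELECT TOP (5000) *
    FROM dbo.MyTable
    ORDER BY NEWID();

    -- Or 10 percent of the table, whatever its current size:
    SELECT TOP (10) PERCENT *
    FROM dbo.MyTable
    ORDER BY NEWID();
    ```

    Both queries generate a GUID for every row and sort the whole table, which should be fine at around 50,000 rows but gets expensive as the table grows.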
    