Select n random rows from SQL Server table

前端 未结 16 872
陌清茗
陌清茗 2020-11-22 10:54

I\'ve got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I\'ve thought of a complicated way, creating a temp table wi

16条回答
  •  难免孤独
    2020-11-22 11:39

    This link have a interesting comparison between Orderby(NEWID()) and other methods for tables with 1, 7, and 13 millions of rows.

    Often, when questions about how to select random rows are asked in discussion groups, the NEWID query is proposed; it is simple and works very well for small tables.

    SELECT TOP 10 PERCENT *
      FROM Table1
      ORDER BY NEWID()
    

    However, the NEWID query has a big drawback when you use it for large tables. The ORDER BY clause causes all of the rows in the table to be copied into the tempdb database, where they are sorted. This causes two problems:

    1. The sorting operation usually has a high cost associated with it. Sorting can use a lot of disk I/O and can run for a long time.
    2. In the worst-case scenario, tempdb can run out of space. In the best-case scenario, tempdb can take up a large amount of disk space that never will be reclaimed without a manual shrink command.

    What you need is a way to select rows randomly that will not use tempdb and will not get much slower as the table gets larger. Here is a new idea on how to do that:

    SELECT * FROM Table1
      WHERE (ABS(CAST(
      (BINARY_CHECKSUM(*) *
      RAND()) as int)) % 100) < 10
    

    The basic idea behind this query is that we want to generate a random number between 0 and 99 for each row in the table, and then choose all of those rows whose random number is less than the value of the specified percent. In this example, we want approximately 10 percent of the rows selected randomly; therefore, we choose all of the rows whose random number is less than 10.

    Please read the full article in MSDN.

提交回复
热议问题