How to request a random row in SQL?

孤城傲影 2020-11-21 06:45

How can I request a random row (or as close to truly random as is possible) in pure SQL?

29 answers
  • 2020-11-21 07:27

    The best way is to store a random value in a new column just for that purpose, and then use something like this (pseudocode + SQL):

    randomNo = random()
    execSql("SELECT TOP 1 * FROM MyTable WHERE MyTable.Randomness > $randomNo ORDER BY MyTable.Randomness")
    

    This is the solution employed by the MediaWiki code. Of course, there is some bias against smaller values, but they found that it was sufficient to wrap the random value around to zero when no rows are fetched.

    The newid() solution may require a full table scan so that each row can be assigned a new GUID, which will be much less performant.

    The rand() solution may not work at all (e.g. in MSSQL), because the function is evaluated just once per query and every row is assigned the same "random" number.
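    The random-column approach described at the top of this answer can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module rather than SQL Server (so LIMIT 1 stands in for TOP 1), and the table `my_table`, its columns, and the function names are hypothetical. The lookup wraps around to zero when no row lies above the random number.

```python
import random
import sqlite3


def build_table(conn, n_rows=1000, seed=0):
    """Create a hypothetical table with a pre-computed random column."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE my_table ("
        "id INTEGER PRIMARY KEY, name TEXT, randomness REAL)"
    )
    conn.executemany(
        "INSERT INTO my_table (name, randomness) VALUES (?, ?)",
        [(f"row-{i}", rng.random()) for i in range(n_rows)],
    )
    # The index lets the lookup seek to the first matching row
    # instead of scanning the whole table.
    conn.execute("CREATE INDEX idx_my_table_randomness ON my_table (randomness)")


def pick_random_row(conn, rng=random):
    """Return the row with the smallest randomness >= a fresh random number.

    Returns None only when the table is empty.
    """
    query = (
        "SELECT id, name, randomness FROM my_table "
        "WHERE randomness >= ? ORDER BY randomness LIMIT 1"
    )
    row = conn.execute(query, (rng.random(),)).fetchone()
    if row is None:
        # No row lies above the random number: wrap around to zero.
        row = conn.execute(query, (0.0,)).fetchone()
    return row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_table(conn)
    print(pick_random_row(conn))
```

    Rows that follow a large gap in the stored random values are picked more often than others, which is the bias mentioned above.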

  • 2020-11-21 07:30

    In SQL Server you can combine TABLESAMPLE with NEWID() to get pretty good randomness and still have speed. This is especially useful if you really only want one row, or a small number of rows.

    SELECT TOP 1 * FROM [table] 
    TABLESAMPLE (500 ROWS) 
    ORDER BY NEWID()
    
  • 2020-11-21 07:30

    It seems that many of the ideas listed still use ordering.

    However, if you use a temporary table, you are able to assign a random index (like many of the solutions have suggested), and then grab the first one that is greater than an arbitrary number between 0 and 1.

    For example (for DB2):

    WITH TEMP AS (
    SELECT COLUMN, RAND() AS IDX FROM TABLE)
    SELECT COLUMN FROM TEMP WHERE IDX > .5
    FETCH FIRST 1 ROW ONLY
    
  • 2020-11-21 07:30

    There is a better solution for Oracle than ordering by dbms_random.value, which requires a full scan to order the rows and is quite slow for large tables.

    Use this instead:

    SELECT *
    FROM employee sample(1)
    WHERE rownum=1
    
  • 2020-11-21 07:31

    Most of the solutions here aim to avoid sorting, but they still need to make a sequential scan over a table.

    There is also a way to avoid the sequential scan by switching to an index scan. If you know the index value of your random row, you can get the result almost instantly. The problem is how to guess an index value.

    The following solution works on PostgreSQL 8.4:

    explain analyze select * from cms_refs where rec_id in 
      (select (random()*(select last_value from cms_refs_rec_id_seq))::bigint 
       from generate_series(1,10))
      limit 1;
    

    In the above solution you guess 10 random index values from the range 0 .. [last value of id].

    The number 10 is arbitrary - you may use 100 or 1000 as it (amazingly) doesn't have a big impact on the response time.

    There is also one problem: if you have sparse ids, you might miss. The solution is to have a backup plan :) In this case, a plain old order by random() query. When combined, it looks like this:

    explain analyze select * from cms_refs where rec_id in 
        (select (random()*(select last_value from cms_refs_rec_id_seq))::bigint 
         from generate_series(1,10))
        union all (select * from cms_refs order by random() limit 1)
        limit 1;
    

    Note the UNION ALL clause. In this case, if the first part returns any data, the second one is NEVER executed!
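    The guess-then-fall-back idea can also be sketched outside PostgreSQL. The version below is a rough illustration with several assumptions: it uses Python's sqlite3 module, takes max(rec_id) in place of the sequence's last_value (SQLite has no sequences), runs the fallback as a second query instead of a UNION ALL, and the `title` column and the function name are hypothetical.

```python
import random
import sqlite3


def random_row_by_id_guess(conn, guesses=10, rng=random):
    """Guess a few random ids; fall back to ORDER BY random() on a miss."""
    max_id = conn.execute("SELECT max(rec_id) FROM cms_refs").fetchone()[0]
    if max_id is None:
        return None  # empty table

    # Draw the candidate ids in the host language and look them up
    # through the primary-key index.
    candidates = [rng.randint(1, max_id) for _ in range(guesses)]
    placeholders = ",".join(["?"] * guesses)
    row = conn.execute(
        "SELECT rec_id, title FROM cms_refs "
        f"WHERE rec_id IN ({placeholders}) LIMIT 1",
        candidates,
    ).fetchone()

    if row is None:
        # Every guess hit a gap in the ids: use the slow but safe query.
        row = conn.execute(
            "SELECT rec_id, title FROM cms_refs ORDER BY random() LIMIT 1"
        ).fetchone()
    return row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cms_refs (rec_id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO cms_refs (rec_id, title) VALUES (?, ?)",
        [(i, f"ref-{i}") for i in range(1, 1001, 3)],  # sparse ids
    )
    print(random_row_by_id_guess(conn))
```

    With sparse ids, more guesses make the fallback less likely, at the cost of a longer IN list.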

  • 2020-11-21 07:31

    You may also try using the NEWID() function.

    Just write your query and order by NEWID(). It is quite random.
