MySQL select 10 random rows from 600K rows fast

Backend · Unresolved


粉色の甜心

How can I best write a query that selects 10 rows randomly from a total of 600k?

26 Answers

栀梦

2020-11-21 05:43
If you have just one read request

Combine @redsio's answer with a helper table of densely numbered ids (600K rows is not that much):
```
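DROP TABLE IF EXISTS tmp_randorder;
-- A regular table rather than a TEMPORARY one: the query below refers to it twice,
-- and MySQL cannot refer to a TEMPORARY table more than once in the same query.
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;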
```
Then use a version of @redsio's answer:
```
SELECT dt.*
FROM
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM tmp_randorder)) AS id)
        AS rnd
 INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
 INNER JOIN datatable AS dt on dt.id = rndo.data_id
 ORDER BY abs(rndo.id - rnd.id)
 LIMIT 1;
```
If the table is very large, you can thin out the first step by copying only a random ~1% of the ids:
```
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
```
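The core trick here is that tmp_randorder renumbers the existing ids densely, so a random position always hits a real row even when datatable has holes. Below is a minimal Python model of that idea. It assumes the ids fit in memory, `dense_random_pick` is a hypothetical name, and it draws k distinct positions directly instead of one position plus LIMIT 1 as the SQL above does:

```python
import random


def dense_random_pick(ids, k, rng=random):
    """Pick k distinct ids uniformly at random, even when the ids have gaps.

    idlist plays the role of tmp_randorder: list position i corresponds to the
    auto_increment id i + 1, and idlist[i] is the data_id it points to.
    """
    idlist = sorted(ids)                           # dense positions 0..n-1, no holes
    positions = rng.sample(range(len(idlist)), k)  # k distinct random positions
    return [idlist[p] for p in positions]


ids_with_gaps = [3, 7, 8, 120, 121, 5000, 5001, 90000]
print(dense_random_pick(ids_with_gaps, 3))
```

Because every position maps to an existing row, each id has the same chance of being chosen regardless of how large the gaps in the original ids are.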
If you have many read requests
1. Version: You could keep the table tmp_randorder persistent and call it datatable_idlist. Recreate that table at regular intervals (daily, hourly), since it will also develop holes as rows are deleted. If your table gets really big, you could instead refill only the holes; this query lists the ids in datatable_idlist whose row no longer exists:
```
select l.data_id as hole
from datatable_idlist l
left join datatable dt on dt.id = l.data_id
where dt.id is null;
```
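2. Version: Give your dataset a random_sortorder column, either directly in datatable or in a persistent extra table datatable_sortorder. Index that column. Generate a random value in your application (I'll call it $rand).
```
-- Row whose random_sortorder is closest to $rand (ascending distance, so the nearest row comes first).
-- Note: ORDER BY on an abs() expression cannot use the index on random_sortorder.
select l.*
from datatable l
order by abs(random_sortorder - $rand)
limit 1;
```
This solution does not treat the 'edge rows' (those with the highest and the lowest random_sortorder) evenly, so regenerate the random values at regular intervals (e.g. once a day).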
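The nearest-value lookup can be modelled in Python with a sorted list standing in for the index. This is a simulation under stated assumptions, not MySQL's behaviour: MySQL cannot use an index for `ORDER BY abs(...)`, whereas the sketch uses `bisect` to inspect only the two neighbours of $rand (in SQL that would roughly correspond to one `>=` and one `<=` range query). The names `build_sortorder` and `nearest_row` are hypothetical:

```python
import bisect
import random


def build_sortorder(ids, rng=random):
    """Give every id a random_sortorder value and keep (value, id) pairs sorted,
    which is roughly what an index on the random_sortorder column provides."""
    pairs = sorted((rng.random(), row_id) for row_id in ids)
    keys = [value for value, _ in pairs]
    return keys, pairs


def nearest_row(keys, pairs, rand):
    """Return the id whose random_sortorder is closest to rand, i.e. the row
    that ORDER BY abs(random_sortorder - rand) LIMIT 1 returns."""
    pos = bisect.bisect_left(keys, rand)  # index of the first value >= rand
    # The nearest value is either just below rand or at/just above it.
    candidates = [p for p in (pos - 1, pos) if 0 <= p < len(keys)]
    best = min(candidates, key=lambda p: abs(keys[p] - rand))
    return pairs[best][1]


keys, pairs = build_sortorder(range(1, 1001))
print(nearest_row(keys, pairs, random.random()))
```

Each row's chance of being picked is proportional to the stretch of [0, 1) in which it is the nearest value, so rows next to large gaps, and the edge rows, are picked more or less often than others. Regenerating the values periodically moves that unevenness around rather than removing it.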
后悔当初

2020-11-21 05:45
```
SELECT column FROM table
ORDER BY RAND()
LIMIT 10
```
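Not an efficient solution, but it works. MySQL has to evaluate RAND() for every row in the table before it can apply the LIMIT, so the cost grows with the table size.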
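The cost comes from the random key having to be produced for every row before any rows can be discarded. A small Python model of that behaviour follows; the function name is hypothetical, and `heapq.nsmallest` stands in for the sort-then-LIMIT step:

```python
import heapq
import random


def order_by_rand_limit(rows, k, rng=random):
    """Conceptual equivalent of ORDER BY RAND() LIMIT k: draw one random key per
    row (so every row is visited), then keep the k rows with the smallest keys."""
    keyed = ((rng.random(), i) for i in range(len(rows)))  # index breaks ties
    return [rows[i] for _, i in heapq.nsmallest(k, keyed)]


print(order_by_rand_limit(list(range(600_000)), 10))
```

The result is a uniform random sample without duplicates, which is why the query is correct; the full pass over all 600K rows is why it is slow.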

被撕碎了的回忆

2020-11-21 05:47

Another simple solution is to number the rows and start reading at a random position. With this approach you don't need any id-based column in the table.

```
SELECT d.* FROM (
    SELECT t.*, @rownum := @rownum + 1 AS rnk  -- not "rank": RANK is a reserved word in MySQL 8.0+
    FROM mytable AS t,
        (SELECT @rownum := 0) AS r,
        (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rnk >= @cnt LIMIT 10;
```

You can change the LIMIT value to fetch as many rows as you need, but the rows you get back are consecutive, and if the random start falls within the last few rows you get fewer rows than requested.
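The same logic fits in a few lines of Python, assuming the rows are already materialised in the order MySQL would number them (`random_window` is a hypothetical name). It makes the trade-off visible: one random number, then a contiguous slice:

```python
import random


def random_window(rows, k, rng=random):
    """Model of the @rownum/@cnt query: choose a random starting rank
    (like RAND() * COUNT(*)) and return up to k consecutive rows from there."""
    start = int(rng.random() * len(rows))
    return rows[start:start + k]  # shorter than k if the start is near the end


print(random_window(list(range(600_000)), 10))
```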

However, if you don't want consecutive rows, you can fetch a bigger sample and pick randomly from it, something like this:

```
SELECT * FROM (
    SELECT d.* FROM (
        SELECT c.*, @rownum := @rownum + 1 AS rnk
        FROM mytable AS c,
            (SELECT @rownum := 0) AS r,
            (SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS rnd
    ) d
    WHERE rnk >= @cnt LIMIT 10000
) t ORDER BY RAND() LIMIT 10;
```
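A Python model of the nested query, with hypothetical names and the same assumption that rows are in numbering order: take a window of consecutive rows from a random start, then sample from that window:

```python
import random


def window_then_sample(rows, k, window=10_000, rng=random):
    """Model of the nested query: take up to `window` consecutive rows from a
    random starting rank, then pick k of them at random (the outer
    ORDER BY RAND() LIMIT k)."""
    start = int(rng.random() * len(rows))
    pool = rows[start:start + window]
    return rng.sample(pool, min(k, len(pool)))


print(window_then_sample(list(range(600_000)), 10))
```

All k rows still come from a single window of `window` consecutive rows, so rows that are far apart in the table never appear together in one result.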


情歌与酒

2020-11-21 05:47
The following should be fast, unbiased, and independent of the id column. However, it does not guarantee that the number of rows returned will match the number of rows requested.
```
SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)
```
Explanation: if you want 10 rows out of 100, each row needs a 1/10 probability of being selected, which WHERE RAND() < 0.1 achieves. This approach does not guarantee exactly 10 rows, but over many executions the average number of rows returned will be about 10, and every row in the table has the same chance of being selected.
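This is Bernoulli sampling: the number of rows kept follows a binomial distribution with mean k. A Python model with a hypothetical function name makes the "about 10 on average" claim easy to check empirically:

```python
import random


def bernoulli_sample(rows, k, rng=random):
    """Model of WHERE RAND() < k / COUNT(*): keep every row independently with
    probability k / N, so the result size varies around k."""
    if not rows:
        return []
    p = k / len(rows)
    return [row for row in rows if rng.random() < p]


sizes = [len(bernoulli_sample(range(100_000), 10)) for _ in range(5)]
print(sizes)  # five result sizes, each scattered around 10
```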
被撕碎了的回忆

2020-11-21 05:48
I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.

If you want utmost simplicity and speed at a minor cost, it seems to make sense to store a random number against each row in the DB. Just create an extra column, random_number, set its default to RAND() (MySQL 8.0.13+ accepts this as an expression default written in parentheses: DEFAULT (RAND())), and create an index on this column.

Then, when you want to retrieve a row, generate a random number in your code (PHP, Perl, whatever) and compare it to the column, ordering by the column so the index returns the next value up:
```
SELECT * FROM tbl WHERE random_number >= :random ORDER BY random_number LIMIT 1
```
Although this is very neat for a single row, for ten rows like the OP asked you'd have to run it ten separate times (or come up with a clever tweak that escapes me right now). Also, if :random is larger than every stored value the query returns nothing, so you have to retry or wrap around to the smallest value.
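A Python sketch of both pieces, with a sorted list standing in for the index on random_number. The names `pick_by_random_number` and `pick_k` are hypothetical, and the wrap-around when the random number exceeds every stored value is an assumption added here, not part of the answer's query. Repeating the lookup with fresh random numbers and skipping duplicates is one straightforward way to collect ten rows:

```python
import bisect
import random


def pick_by_random_number(keys, ids, rand):
    """Row with the smallest random_number >= rand, i.e.
    WHERE random_number >= :random ORDER BY random_number LIMIT 1.
    keys must be sorted ascending and ids[i] must belong to keys[i]."""
    pos = bisect.bisect_left(keys, rand)
    if pos == len(keys):
        pos = 0  # assumed wrap-around: nothing is >= rand, so use the smallest value
    return ids[pos]


def pick_k(keys, ids, k, rng=random, max_tries=1000):
    """Run the single-row lookup repeatedly until k distinct rows are found
    (or max_tries lookups have been made)."""
    found = []
    for _ in range(max_tries):
        if len(found) >= k:
            break
        row = pick_by_random_number(keys, ids, rng.random())
        if row not in found:
            found.append(row)
    return found


pairs = sorted((random.random(), row_id) for row_id in range(1, 1001))
keys = [value for value, _ in pairs]
ids = [row_id for _, row_id in pairs]
print(pick_k(keys, ids, 10))
```

A row's chance of being picked depends on the gap between its value and the next smaller one, so the selection is close to, but not exactly, uniform.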
感情败类

2020-11-21 05:49
It's a very simple, single-line query.
```
SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;
```