How can I best write a query that selects 10 rows randomly from a total of 600k?
Combine the answer of @redsio with a temp-table (600K is not that much):
DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;
And then take a version of @redsios Answer:
SELECT dt.*
FROM
(SELECT (RAND() *
(SELECT MAX(id)
FROM tmp_randorder)) AS id)
AS rnd
INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
INNER JOIN datatable AS dt on dt.id = rndo.data_id
ORDER BY abs(rndo.id - rnd.id)
LIMIT 1;
If the table is big, you can sieve on the first part:
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
Version: You could keep the table tmp_randorder
persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes
select l.data_id as whole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null;
Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder
. Index that column. Generate a Random-Value in your Application (I'll call it $rand
).
select l.*
from datatable l
order by abs(random_sortorder - $rand) desc
limit 1;
This solution discriminates the 'edge rows' with the highest and the lowest random_sortorder, so rearrange them in intervals (once a day).
SELECT column FROM table
ORDER BY RAND()
LIMIT 10
Not the efficient solution but works
Another simple solution would be ranking the rows and fetch one of them randomly and with this solution you won't need to have any 'Id' based column in the table.
SELECT d.* FROM (
SELECT t.*, @rownum := @rownum + 1 AS rank
FROM mytable AS t,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM mytable))) AS n
) d WHERE rank >= @cnt LIMIT 10;
You can change the limit value as per your need to access as many rows as you want but that would mostly be consecutive values.
However, if you don't want consecutive random values then you can fetch a bigger sample and select randomly from it. something like ...
SELECT * FROM (
SELECT d.* FROM (
SELECT c.*, @rownum := @rownum + 1 AS rank
FROM buildbrain.`commits` AS c,
(SELECT @rownum := 0) AS r,
(SELECT @cnt := (SELECT RAND() * (SELECT COUNT(*) FROM buildbrain.`commits`))) AS rnd
) d
WHERE rank >= @cnt LIMIT 10000
) t ORDER BY RAND() LIMIT 10;
The following should be fast, unbiased and independent of id column. However it does not guarantee that the number of rows returned will match the number of rows requested.
SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)
Explanation: assuming you want 10 rows out of 100 then each row has 1/10 probability of getting SELECTed which could be achieved by WHERE RAND() < 0.1
. This approach does not guarantee 10 rows; but if the query is run enough times the average number of rows per execution will be around 10 and each row in the table will be selected evenly.
I've looked through all of the answers, and I don't think anyone mentions this possibility at all, and I'm not sure why.
If you want utmost simplicity and speed, at a minor cost, then to me it seems to make sense to store a random number against each row in the DB. Just create an extra column, random_number
, and set it's default to RAND()
. Create an index on this column.
Then when you want to retrieve a row generate a random number in your code (PHP, Perl, whatever) and compare that to the column.
SELECT FROM tbl WHERE random_number >= :random LIMIT 1
I guess although it's very neat for a single row, for ten rows like the OP asked you'd have to call it ten separate times (or come up with a clever tweak that escapes me immediately)
Its very simple and single line query.
SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;