How can I best write a query that selects 10 rows randomly from a total of 600k?
Combine the answer of @redsio with a temp-table (600K is not that much):
DROP TEMPORARY TABLE IF EXISTS tmp_randorder;
CREATE TABLE tmp_randorder (id int(11) not null auto_increment primary key, data_id int(11));
INSERT INTO tmp_randorder (data_id) select id from datatable;
And then take a version of @redsios Answer:
SELECT dt.*
FROM
(SELECT (RAND() *
(SELECT MAX(id)
FROM tmp_randorder)) AS id)
AS rnd
INNER JOIN tmp_randorder rndo on rndo.id between rnd.id - 10 and rnd.id + 10
INNER JOIN datatable AS dt on dt.id = rndo.data_id
ORDER BY abs(rndo.id - rnd.id)
LIMIT 1;
If the table is big, you can sieve on the first part:
INSERT INTO tmp_randorder (data_id) select id from datatable where rand() < 0.01;
Version: You could keep the table tmp_randorder
persistent, call it datatable_idlist. Recreate that table in certain intervals (day, hour), since it also will get holes. If your table gets really big, you could also refill holes
select l.data_id as whole from datatable_idlist l left join datatable dt on dt.id = l.data_id where dt.id is null;
Version: Give your Dataset a random_sortorder column either directly in datatable or in a persistent extra table datatable_sortorder
. Index that column. Generate a Random-Value in your Application (I'll call it $rand
).
select l.*
from datatable l
order by abs(random_sortorder - $rand) desc
limit 1;
This solution discriminates the 'edge rows' with the highest and the lowest random_sortorder, so rearrange them in intervals (once a day).