How can I best write a query that selects 10 rows randomly from a total of 600k?
One way that i find pretty good if there's an autogenerated id is to use the modulo operator '%'. For Example, if you need 10,000 random records out 70,000, you could simplify this by saying you need 1 out of every 7 rows. This can be simplified in this query:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0;
If the result of dividing target rows by total available is not an integer, you will have some extra rows than what you asked for, so you should add a LIMIT clause to help you trim the result set like this:
SELECT * FROM
table
WHERE
id %
FLOOR(
(SELECT count(1) FROM table)
/ 10000
) = 0
LIMIT 10000;
This does require a full scan, but it is faster than ORDER BY RAND, and in my opinion simpler to understand than other options mentioned in this thread. Also if the system that writes to the DB creates sets of rows in batches you might not get such a random result as you where expecting.