How can I best write a query that selects 10 rows randomly from a total of 600k?
The following should be fast, unbiased and independent of id column. However it does not guarantee that the number of rows returned will match the number of rows requested.
SELECT *
FROM t
WHERE RAND() < (SELECT 10 / COUNT(*) FROM t)
Explanation: assuming you want 10 rows out of 100 then each row has 1/10 probability of getting SELECTed which could be achieved by WHERE RAND() < 0.1
. This approach does not guarantee 10 rows; but if the query is run enough times the average number of rows per execution will be around 10 and each row in the table will be selected evenly.