What is a fast way to select a random row from a large MySQL table?
I'm working in PHP, but I'm interested in any solution, even if it's in another language.
I'm a bit new to SQL, but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr ORDER BY primary_key LIMIT 1
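Below is a minimal sketch of this idea in Python. The standard-library sqlite3 module stands in for MySQL, which is an assumption since the question allows other languages. Drawing the random number between MIN and MAX of the key is also an illustrative assumption. The demo table has a large hole between 3 and 100, which shows the skew that holes cause.

```python
import random
import sqlite3


def make_table(keys):
    """Build an in-memory stand-in for the_table with the given primary keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (primary_key INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO the_table (primary_key, payload) VALUES (?, ?)",
        [(k, f"row {k}") for k in keys],
    )
    return conn


def random_row_ge(conn):
    """Draw a random key in [MIN, MAX] and return the first row at or above it."""
    lo, hi = conn.execute(
        "SELECT MIN(primary_key), MAX(primary_key) FROM the_table"
    ).fetchone()
    if lo is None:  # empty table
        return None
    rand_nr = random.randint(lo, hi)
    return conn.execute(
        "SELECT * FROM the_table WHERE primary_key >= ? ORDER BY primary_key LIMIT 1",
        (rand_nr,),
    ).fetchone()


conn = make_table([1, 2, 3, 100])  # keys 4..99 are a hole
counts = {}
for _ in range(10_000):
    key = random_row_ge(conn)[0]
    counts[key] = counts.get(key, 0) + 1
print(counts)  # key 100 wins far more often: every draw in 4..100 lands on it
```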
This doesn't solve the problem of holes in the table, though.
But here's a twist on lassevk's suggestion:
SELECT primary_key FROM the_table
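Use mysql_num_rows() in PHP (mysqli_num_rows() in PHP 7 and later, where the old mysql_* functions were removed) to create a random index based on the above result. Take the primary key at that position and fetch that row:
SELECT * FROM the_table WHERE primary_key = $randKey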
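The same twist is sketched below in Python, with sqlite3 standing in for MySQL and PHP (an assumption). Calling random.choice on the fetched key list plays the role of the mysql_num_rows()-based random index. The table contents are made up for the demo. Because the row is fetched by a key that actually exists, holes no longer skew the result. The first query still transfers every key, though, which is O(n).

```python
import random
import sqlite3


def random_row_via_key_list(conn):
    # 1) Fetch every primary key; this is the O(n) step.
    keys = [row[0] for row in conn.execute("SELECT primary_key FROM the_table")]
    if not keys:
        return None
    # 2) Pick one of the existing keys uniformly at random.
    chosen = random.choice(keys)
    # 3) Fetch the full row by that key; holes between keys don't matter.
    return conn.execute(
        "SELECT * FROM the_table WHERE primary_key = ?", (chosen,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (primary_key INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)", [(k, f"row {k}") for k in (1, 2, 3, 100)]
)

counts = {}
for _ in range(4_000):
    key = random_row_via_key_list(conn)[0]
    counts[key] = counts.get(key, 0) + 1
print(counts)  # each of the four keys comes up about equally often
```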
On a side note, just how slow is SELECT * FROM the_table, followed by creating a random number based on mysql_num_rows() and then moving the data pointer to that point with mysql_data_seek()? Just how slow will this be on large tables with, say, a million rows?
In my case, my table has an id as its primary key, auto-incremented with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.
I wrote this script to find the fastest operation:
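// Assumes $db is an open mysqli connection; prints each query's time in ms.
$t = microtime(true);
$db->query("SELECT COUNT(id) FROM tbl");
echo (microtime(true) - $t) * 1000, " ms\n";
$t = microtime(true);
$db->query("SELECT MAX(id) FROM tbl");
echo (microtime(true) - $t) * 1000, " ms\n";
$t = microtime(true);
$db->query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
echo (microtime(true) - $t) * 1000, " ms\n";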
The results are:
COUNT(id): 36.8418693542479 ms
MAX(id): 0.241041183472 ms
ORDER BY id DESC LIMIT 1: 0.216960906982 ms
Answer with the ORDER BY method (adding 1 so that the result covers 1 to MAX(id) instead of 0 to MAX(id) - 1):
SELECT FLOOR(1 + RAND() * (
SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) AS n;
...
SELECT * FROM tbl WHERE id = $result;
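Here is a Python sketch of the MAX(id) approach, using sqlite3 in place of MySQL (an assumption). It relies on the same premise as the answer: ids start at 1 and have no gaps. Under that premise, random.randint(1, max_id), which is inclusive on both ends, matches FLOOR(1 + RAND() * MAX(id)).

```python
import random
import sqlite3


def random_row_by_max_id(conn):
    (max_id,) = conn.execute("SELECT MAX(id) FROM tbl").fetchone()
    if max_id is None:  # empty table
        return None
    n = random.randint(1, max_id)  # inclusive: 1..max_id
    return conn.execute("SELECT * FROM tbl WHERE id = ?", (n,)).fetchone()


conn = sqlite3.connect(":memory:")
# AUTOINCREMENT on a fresh table assigns ids 1, 2, 3, ... with no gaps.
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO tbl (name) VALUES (?)", [(f"item {i}",) for i in range(50)])

print(random_row_by_max_id(conn))
```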
To select random rows from a table, don't use ORDER BY RAND(). It forces MySQL to do a full filesort and only then retrieve the number of rows given by LIMIT. To avoid the filesort, use RAND() only in the WHERE clause. The scan then stops as soon as it has found the required number of rows. See http://www.rndblog.com/how-to-select-random-rows-in-mysql/
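Below is a sketch of the WHERE-clause variant in Python with sqlite3 (an assumption). SQLite's built-in RANDOM() returns a 64-bit integer, so the example uses create_function to register a MySQL-style rand() that returns a float in [0, 1). The table t and the fraction value are illustrative.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# MySQL-style RAND(): non-deterministic, so SQLite evaluates it once per row.
conn.create_function("rand", 0, random.random)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 10_001)])


def random_rows_where(conn, n, fraction):
    # Keep each row with probability `fraction`; LIMIT stops the scan after n hits.
    return conn.execute(
        "SELECT id FROM t WHERE rand() < ? LIMIT ?", (fraction, n)
    ).fetchall()


rows = random_rows_where(conn, 5, 0.01)
print(rows)
```

The scan stops once LIMIT rows have matched, so rows early in the scan order are more likely to be picked. A fraction that is too small can also return fewer rows than requested. Choosing a fraction somewhat above n divided by the row count keeps the scan short while usually filling the LIMIT.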
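SET @COUNTER = (SELECT COUNT(*) FROM your_table);
SET @offset = CAST(FLOOR(RAND() * @COUNTER) AS UNSIGNED);
-- LIMIT/OFFSET must be constants or ? placeholders in MySQL, hence PREPARE/EXECUTE
PREPARE stmt FROM 'SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET ?';
EXECUTE stmt USING @offset;
DEALLOCATE PREPARE stmt;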
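The count-then-offset idea is sketched below in Python with sqlite3 (an assumption). SQLite accepts a bound parameter in OFFSET directly. random.randrange(counter) yields 0 to counter - 1, which is exactly the valid offset range. The key values are made up to include gaps.

```python
import random
import sqlite3


def random_key_by_offset(conn):
    (counter,) = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()
    if counter == 0:
        return None
    offset = random.randrange(counter)  # 0 .. counter - 1
    (key,) = conn.execute(
        "SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()
    return key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (PrimaryKey INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO your_table VALUES (?)", [(k,) for k in (5, 17, 42, 99, 1000)])

counts = {}
for _ in range(5_000):
    k = random_key_by_offset(conn)
    counts[k] = counts.get(k, 0) + 1
print(counts)  # gaps between keys don't skew the result
```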
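The first query is O(1) for MyISAM tables, which store the exact row count. On InnoDB, COUNT(*) has to scan an index, so it is O(n).
The second query involves a table scan, because MySQL has to read and discard OFFSET rows first. Complexity = O(n).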
Keep a separate table for this purpose only, and insert a matching row into it whenever you insert into the original table. Assumption: no DELETEs.
CREATE TABLE Aux(
MyPK INT AUTO_INCREMENT PRIMARY KEY,
PrimaryKey INT
);
SET @MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET @RandPK = FLOOR(1 + RAND() * @MaxPK);
SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = @RandPK);
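Here is a Python/sqlite3 sketch of the auxiliary-table idea, with SQLite standing in for MySQL (an assumption). main_table and insert_row are hypothetical names. insert_row mirrors every insert into Aux so that MyPK stays a dense 1..N sequence, which only holds while nothing is deleted.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (PrimaryKey INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE Aux (MyPK INTEGER PRIMARY KEY AUTOINCREMENT, PrimaryKey INTEGER)")


def insert_row(conn, pk, payload):
    # Hypothetical helper: every insert into main_table is mirrored into Aux.
    conn.execute("INSERT INTO main_table VALUES (?, ?)", (pk, payload))
    conn.execute("INSERT INTO Aux (PrimaryKey) VALUES (?)", (pk,))


def random_primary_key(conn):
    (max_pk,) = conn.execute("SELECT MAX(MyPK) FROM Aux").fetchone()
    if max_pk is None:
        return None
    rand_pk = random.randint(1, max_pk)  # MyPK is dense, so every draw hits a row
    (key,) = conn.execute(
        "SELECT PrimaryKey FROM Aux WHERE MyPK = ?", (rand_pk,)
    ).fetchone()
    return key


for pk in (10, 20, 3000):  # sparse keys in the main table
    insert_row(conn, pk, f"row {pk}")

print(random_primary_key(conn))
```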
If DELETEs are allowed, search a window around the random position instead:
SET @delta = FLOOR(@RandPK / 10);
SET @PrimaryKey = (SELECT PrimaryKey
FROM Aux
WHERE MyPK BETWEEN @RandPK - @delta AND @RandPK + @delta
LIMIT 1);
The overall complexity is O(1).
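Below is a Python/sqlite3 sketch of the DELETE-tolerant lookup, with SQLite standing in for MySQL (an assumption); the table contents are made up for the demo. It follows the answer's ±10% window. The window can come back empty, in which case the function returns None. Keys right after a run of deleted rows can also be picked more often. The small retry wrapper is a hypothetical addition, not part of the answer.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Aux (MyPK INTEGER PRIMARY KEY AUTOINCREMENT, PrimaryKey INTEGER)")
conn.executemany("INSERT INTO Aux (PrimaryKey) VALUES (?)", [(1000 + i,) for i in range(1, 101)])
conn.execute("DELETE FROM Aux WHERE MyPK % 2 = 0")  # simulate deletes: every even MyPK is gone


def random_primary_key_with_deletes(conn):
    (max_pk,) = conn.execute("SELECT MAX(MyPK) FROM Aux").fetchone()
    if max_pk is None:
        return None
    rand_pk = random.randint(1, max_pk)
    delta = rand_pk // 10  # window of +/-10% around the random position
    row = conn.execute(
        "SELECT PrimaryKey FROM Aux WHERE MyPK BETWEEN ? AND ? LIMIT 1",
        (rand_pk - delta, rand_pk + delta),
    ).fetchone()
    return row[0] if row else None  # None: the window held only deleted rows


def random_primary_key_retry(conn, attempts=10):
    # Hypothetical wrapper: retry, since a small window can miss every surviving row.
    for _ in range(attempts):
        key = random_primary_key_with_deletes(conn)
        if key is not None:
            return key
    return None


print(random_primary_key_retry(conn))
```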
There is another way to produce random rows using only a query and without ORDER BY RAND(). It involves user-defined variables. See how to produce random rows from a table.
With ORDER BY you will do a full table scan. It's better to do a SELECT COUNT(*) first and then fetch the row at a random row number between 0 and COUNT(*) - 1.