Getting random results from large tables

Posted by 不问归期 on 2019-12-05 14:19:58

You could of course use the RAND() function (ORDER BY RAND()) in a query with a WHERE (for the category) and a LIMIT. However, as you pointed out, that entails a scan of the table, which takes time, especially in your case due to the volume of data.

Your other alternative, again as you pointed out, is to store id/category_id in another table. That might prove a bit faster, but it still needs a WHERE and a LIMIT, and that table will contain the same number of records as the master table.

A different approach (if applicable) would be to have one table per category and store the IDs in it. If your categories are fixed or do not change often, you should be able to use this approach. You effectively remove the WHERE clause from the query, and a RAND() with a LIMIT on each category table should be faster, since each category table contains only a subset of the records in your main table.
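A minimal sketch of the table-per-category idea, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The names `items`, `category_id`, `cat_<n>_ids` and the helper functions are made up for illustration, and SQLite spells the function RANDOM() where MySQL uses RAND().

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory main table: 30 rows spread over 3 categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER)")
    conn.executemany(
        "INSERT INTO items (id, category_id) VALUES (?, ?)",
        [(i, i % 3) for i in range(1, 31)],
    )
    return conn


def build_category_tables(conn, categories):
    """Create one ID table per category and fill it from the main table."""
    for cat in categories:
        table = f"cat_{int(cat)}_ids"  # int() keeps the generated table name safe
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (item_id INTEGER PRIMARY KEY)")
        conn.execute(f"DELETE FROM {table}")
        conn.execute(
            f"INSERT INTO {table} (item_id) SELECT id FROM items WHERE category_id = ?",
            (cat,),
        )


def random_ids(conn, category, n):
    """Pick n random IDs from one category table; no WHERE clause is needed."""
    table = f"cat_{int(category)}_ids"
    rows = conn.execute(
        f"SELECT item_id FROM {table} ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    return [r[0] for r in rows]
```

The sort still touches every row of the category table, so the gain only comes from that table being smaller than the main one. The category tables also have to be kept in sync whenever the main table changes.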

Another alternative would be to use a key/value store just for that operation. MongoDB or the Google App Engine datastore can help with that and can be very fast for this kind of lookup.

You could also go with a master/slave replication setup in MySQL. The slave replicates content in near real time, and when you need to perform the expensive query you run it against the slave instead of the master, thus passing the load to a different machine.

Finally, you could go with Sphinx, which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offload the expensive operation to a different layer and let MySQL continue with other operations.

Just some issues to consider.

Working off your random number approach

  • Get the max id in the database.
  • Create a temp table to store your matches.
  • Loop n times doing the following:
    • Generate a random number between 1 and maxId.
    • Get the first record (ordered by ID) in the category whose ID is greater than or equal to the random number, and insert it into your temp table.
  • Your temp table now contains your random results.
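The steps above could look roughly like this. It is a sketch only: it uses Python's sqlite3 in place of MySQL, a dict in place of the temp table, and hypothetical names (`items`, `category_id`, `pick_random_rows`). The `max_attempts` cap is my own addition so the loop cannot spin forever when a category has fewer than n rows.

```python
import random
import sqlite3


def make_demo_db():
    """Build a small in-memory table: 30 rows spread over 3 categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER)")
    conn.executemany(
        "INSERT INTO items (id, category_id) VALUES (?, ?)",
        [(i, i % 3) for i in range(1, 31)],
    )
    return conn


def pick_random_rows(conn, category, n, max_attempts=100):
    """Pick up to n distinct rows of one category by probing random IDs."""
    max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    if max_id is None:  # empty table
        return []
    picked = {}  # id -> row; plays the role of the temp table
    attempts = 0
    while len(picked) < n and attempts < max_attempts:
        attempts += 1
        probe = random.randint(1, max_id)
        # First row of the category at or above the probe; uses the primary key.
        row = conn.execute(
            "SELECT id, category_id FROM items "
            "WHERE id >= ? AND category_id = ? ORDER BY id LIMIT 1",
            (probe, category),
        ).fetchone()
        if row is not None:  # None when no matching row sits at or above the probe
            picked[row[0]] = row
    return list(picked.values())
```

Each probe is an index lookup rather than a full sort, which is the point of the approach. The trade-off is that the result is not perfectly uniform: a row that follows a large gap in the IDs gets picked more often than its neighbours.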

Or you could dynamically generate SQL with a UNION to do the query in one step.

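   -- r1..r4: random integers between 1 and the max ID, generated by the application
   -- (RAND() itself returns a value in [0, 1) and is re-evaluated for every row in a WHERE clause)
   (SELECT * FROM myTable WHERE ID >= r1 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r2 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r3 AND Category = zzz ORDER BY ID LIMIT 1)
   UNION
   (SELECT * FROM myTable WHERE ID >= r4 AND Category = zzz ORDER BY ID LIMIT 1)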

Note: my SQL may not be valid, as I'm not a MySQL guy, but the theory should be sound.
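Generating that UNION dynamically might look something like the sketch below. The function name `build_union_query` is hypothetical, and the `%s` placeholders assume a MySQL driver that uses that parameter style (PyMySQL and mysqlclient do, as far as I know). The random IDs are drawn in the application and passed as parameters, and each branch is parenthesized so that its LIMIT applies to that branch only.

```python
import random


def build_union_query(category, max_id, n, rng=random):
    """Return (sql, params) for n single-row probes combined with UNION."""
    branch = (
        "(SELECT * FROM myTable WHERE ID >= %s AND Category = %s "
        "ORDER BY ID LIMIT 1)"
    )
    sql = "\nUNION\n".join([branch] * n)
    params = []
    for _ in range(n):
        # One random probe ID plus the category for each branch, in placeholder order.
        params.extend([rng.randint(1, max_id), category])
    return sql, params
```

You would then pass the pair to the driver, for example `cursor.execute(sql, params)`. Keep in mind that UNION removes duplicate rows, so when two probes land on the same record you get fewer than n rows back.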

First you need to get the number of rows in the category, something like this:

   SELECT COUNT(*) FROM tbl WHERE category = ?

Then pick a random offset. LIMIT offsets are zero-based, so it has to fall between 0 and $rowsNum - 1:

   $offset = rand(0, $rowsNum - 1);

And select the row at that offset, using the same category filter:

   SELECT * FROM tbl WHERE category = ? LIMIT $offset, 1

This way, gaps in the IDs don't matter. The only problem is that you need to run the second query several times. A UNION may help in this case.
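Put together, the count-then-offset idea could be sketched as follows. It again uses Python's sqlite3 as a stand-in for MySQL, and `random_rows_by_offset` is an illustrative name. Drawing the offsets with `random.sample` is my own choice, made so that the same row is never returned twice.

```python
import random
import sqlite3


def make_demo_db():
    """Build a small in-memory table whose IDs have gaps (1, 8, 15, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, category INTEGER)")
    conn.executemany(
        "INSERT INTO tbl (id, category) VALUES (?, ?)",
        [(i, i % 2) for i in range(1, 100, 7)],
    )
    return conn


def random_rows_by_offset(conn, category, n):
    """Pick up to n distinct rows of a category using random zero-based offsets."""
    total = conn.execute(
        "SELECT COUNT(*) FROM tbl WHERE category = ?", (category,)
    ).fetchone()[0]
    # Distinct offsets in [0, total - 1]; fewer than n if the category is small.
    offsets = random.sample(range(total), min(n, total))
    rows = []
    for offset in offsets:
        # ORDER BY id makes "the row at this offset" well defined.
        row = conn.execute(
            "SELECT id, category FROM tbl WHERE category = ? "
            "ORDER BY id LIMIT 1 OFFSET ?",
            (category, offset),
        ).fetchone()
        rows.append(row)
    return rows
```

Every row in the category is equally likely here, unlike with the random-ID probe. The cost is that the database has to step over `offset` rows for each query, so large offsets on a big category are not free.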

For MySQL you can use RAND():

   SELECT column_name FROM table_name
   ORDER BY RAND()
   LIMIT 4