MySQL select 10 random rows from 600K rows fast

粉色の甜心 2020-11-21 05:06

How can I best write a query that selects 10 rows randomly from a total of 600k?

26 answers
  •  时光取名叫无心
    2020-11-21 05:49

    I improved on @Riedsio's answer. This is the most efficient query I could find for a large table whose ids are uniformly distributed with gaps (tested by fetching 1000 random rows from a table with more than 2.6B rows).

    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
    (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
    

    Let me unpack what's going on.

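    1. @max := (SELECT MAX(id) FROM table)
      • This calculates and saves the maximum id. For very large tables, there is a slight overhead for calculating MAX(id) each time you need a row, so it is computed once and reused.
    2. SELECT FLOOR(RAND() * @max) + 1 as rand
      • Gets a random number between 1 and MAX(id), which may or may not be an existing id.
    3. SELECT id FROM table INNER JOIN (...) on id > rand LIMIT 1
      • This fills in the gaps. If the random number lands in a gap, the query just picks the next existing id. Assuming the gaps are uniformly distributed, this shouldn't be a problem; otherwise, ids that follow large gaps are picked more often. Because the comparison is a strict >, a draw equal to MAX(id) matches no row and that branch returns nothing; id >= rand avoids that.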
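To see how the gap filling behaves, here is a small Python simulation of one branch of the query. It is only a sketch: the id list is a made-up table, and `pick_next_id` is a hypothetical helper that mirrors `FLOOR(RAND() * @max) + 1` followed by `id > rand LIMIT 1`, assuming the first matching row is the lowest id above the draw.

```python
import bisect
import random
from collections import Counter


def pick_next_id(sorted_ids, rng):
    """Mimic one branch: draw r in 1..MAX(id), return the first id > r (or None)."""
    max_id = sorted_ids[-1]
    # Equivalent of FLOOR(RAND() * @max) + 1; rng.random() is in [0, 1).
    r = int(rng.random() * max_id) + 1
    # Index of the first id strictly greater than r, like "id > rand LIMIT 1".
    pos = bisect.bisect_right(sorted_ids, r)
    return sorted_ids[pos] if pos < len(sorted_ids) else None


# Hypothetical table: ids 4..9 are missing, so id 10 sits right after a gap.
ids = [1, 2, 3, 10, 11, 12]
rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(pick_next_id(ids, rng) for _ in range(12_000))

for key in sorted(counts, key=lambda k: (k is None, k)):
    print(key, counts[key])
```

In this toy table, every draw from 3 to 9 resolves to id 10, so it is picked far more often than its neighbours. Id 1 is never returned, and a draw of 12 returns nothing. On a table with small, evenly spread gaps the skew is much milder, which is the assumption the answer relies on.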

    The UNION lets you fit everything into one query, so you avoid running a separate query per row and calculate MAX(id) only once. Depending on your application, this might matter a lot or very little. Keep in mind that UNION removes duplicates, so if two branches land on the same id you get fewer than 10 rows.
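If you need a different number of rows, writing the branches by hand gets tedious. Below is a minimal sketch of generating the same query text from application code. `build_random_ids_query` and the table name `my_table` are hypothetical names for illustration; the generated SQL follows the answer's pattern unchanged and is not executed here.

```python
def build_random_ids_query(table: str, n: int) -> str:
    """Build the n-branch UNION query from the answer for the given table name."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Accept only plain identifiers, since the name is interpolated into SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError("table must be a plain identifier")
    quoted = f"`{table}`"
    # The first branch computes MAX(id) and stores it in @max.
    first = (
        f"(SELECT id FROM {quoted} INNER JOIN "
        f"(SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM {quoted})) + 1 AS rand) r "
        f"ON id > rand LIMIT 1)"
    )
    # The remaining branches reuse @max instead of recomputing it.
    rest = (
        f"(SELECT id FROM {quoted} INNER JOIN "
        f"(SELECT FLOOR(RAND() * @max) + 1 AS rand) r "
        f"ON id > rand LIMIT 1)"
    )
    return " UNION\n".join([first] + [rest] * (n - 1))


query = build_random_ids_query("my_table", 10)
print(query)
```

The table name is wrapped in backticks because a name such as `table` is a reserved word in MySQL and would not parse unquoted.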

    Note that this returns only the ids, and in random order. If you need other columns or a specific ordering, I recommend joining the ids back to the table like this:

    SELECT t.id, t.name -- etc, etc
    FROM table t
    INNER JOIN (
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max := (SELECT MAX(id) FROM table)) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1) UNION
        (SELECT id FROM table INNER JOIN (SELECT FLOOR(RAND() * @max) + 1 as rand) r on id > rand LIMIT 1)
    ) x ON x.id = t.id
    ORDER BY t.id
    
