MySQL select 10 random rows from 600K rows fast

粉色の甜心 2020-11-21 05:06

How can I best write a query that selects 10 rows randomly from a total of 600k?

26 answers
  • 2020-11-21 05:33

    A great post handling several cases, from simple, to gaps, to non-uniform with gaps.

    http://jan.kneschke.de/projects/mysql/order-by-rand/

    For the most general case, here is how you do it:

    SELECT name
      FROM random AS r1 JOIN
           (SELECT CEIL(RAND() *
                         (SELECT MAX(id)
                            FROM random)) AS id)
            AS r2
     WHERE r1.id >= r2.id
     ORDER BY r1.id ASC
     LIMIT 1
    

    This assumes that the ids are evenly distributed, although there may be gaps in the id list. See the article for more advanced examples.
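
    To see why the even-distribution assumption matters, here is a small Python sketch (not taken from the linked article) that computes how often each row would be returned by the "random pick, then first id >= pick" rule. The id list is hypothetical, and the sketch ignores the negligible chance that RAND() returns exactly 0.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction

def pick_probabilities(ids):
    """Exact chance of each id being returned by the 'first id >= pick' rule."""
    ids = sorted(ids)
    max_id = ids[-1]
    hits = Counter()
    # CEIL(RAND() * max_id) lands on each integer 1..max_id with equal chance.
    for pick in range(1, max_id + 1):
        chosen = ids[bisect_left(ids, pick)]  # smallest id >= pick
        hits[chosen] += 1
    return {i: Fraction(hits[i], max_id) for i in ids}

# Hypothetical table in which ids 4..9 were deleted.
probs = pick_probabilities([1, 2, 3, 10])
for row_id, p in probs.items():
    print(row_id, p)
```

    In this made-up case the row that follows the gap (id 10) absorbs every pick that falls inside the gap, so it is returned far more often than the other rows.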

  • 2020-11-21 05:35

    From a book:

    Choose a Random Row Using an Offset

    Still another technique that avoids problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then use this number as an offset when querying the data set:

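    // FLOOR keeps the offset in 0..COUNT-1; ROUND could return COUNT, which matches no row.
    $rand = "SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM Bugs))";
    $offset = (int) $pdo->query($rand)->fetchColumn();
    $sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
    $stmt = $pdo->prepare($sql);
    // Bind as an integer so the value is not sent as a quoted string.
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();
    $rand_bug = $stmt->fetch();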
    

    Use this solution when you can’t assume contiguous key values and you need to make sure each row has an even chance of being selected.

  • 2020-11-21 05:35

    This is super fast and is 100% random even if you have gaps.

    1. Count the number x of rows that you have available: SELECT COUNT(*) AS cnt FROM your_table
    2. Pick 10 distinct random numbers a_1, a_2, ..., a_10 between 0 and x - 1
    3. Query your rows like this: SELECT * FROM your_table LIMIT 1 OFFSET a_i for i = 1, ..., 10

    I found this hack in the book SQL Antipatterns by Bill Karwin.
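
    The three steps above can be sketched as follows. This is a minimal illustration in Python, using the standard sqlite3 module as a stand-in so that it runs without a MySQL server; the `items` table and the `random_rows_by_offset` helper are hypothetical. An `ORDER BY id` is added as an assumption so that a given offset refers to the same row in every query.

```python
import random
import sqlite3

def random_rows_by_offset(conn, table, k=10):
    """Return k distinct rows, each row having the same chance of selection."""
    # Step 1: count the rows. The table name is assumed to be trusted input.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    # Step 2: k distinct offsets in 0..count-1.
    offsets = random.sample(range(count), min(k, count))
    # Step 3: one LIMIT 1 OFFSET query per offset.
    sql = f"SELECT * FROM {table} ORDER BY id LIMIT 1 OFFSET ?"
    return [conn.execute(sql, (off,)).fetchone() for off in offsets]

# Demo data: ids with gaps (only multiples of 7 exist).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(7, 7001, 7)])
rows = random_rows_by_offset(conn, "items")
print(rows)
```

    Gaps in the ids do not matter here because offsets count rows, not key values. Keep in mind that each OFFSET query still has to skip over the preceding rows, so the cost grows with the offset.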

  • 2020-11-21 05:36

    Here is a game changer that may be helpful for many:

    I have a table with 200k rows and sequential ids. I needed to pick N random rows, so I opted to generate random values based on the biggest id in the table. I created this script to find out which operation is fastest:

    logTime();
    query("SELECT COUNT(id) FROM tbl");
    logTime();
    query("SELECT MAX(id) FROM tbl");
    logTime();
    query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
    logTime();
    

    The results are:

    • Count: 36.8418693542479 ms
    • Max: 0.241041183472 ms
    • Order: 0.216960906982 ms

    Based on these results, ORDER BY id DESC is the fastest way to get the max id.
    Here is my answer to the question:

    SELECT GROUP_CONCAT(n SEPARATOR ',') g FROM (
        -- 1 + FLOOR(...) yields ids in 1..max; plain FLOOR would yield 0..max-1
        SELECT 1 + FLOOR(RAND() * (
            SELECT id FROM tbl ORDER BY id DESC LIMIT 1
        )) n FROM tbl LIMIT 10) a
    
    ...
    SELECT * FROM tbl WHERE id IN ($result);
    

    FYI: getting 10 random rows from a 200k-row table took me 1.78 ms (including all the operations on the PHP side).
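
    The same idea, moved to the application side, might look like the sketch below. It is written in Python with sqlite3 as a stand-in so that it runs without a MySQL server; the `random_rows_by_id` helper is hypothetical. It assumes the ids are sequential, start at 1 and have no gaps, and it draws the ids without replacement so that duplicates cannot shrink the result.

```python
import random
import sqlite3

def random_rows_by_id(conn, k=10):
    """Pick k distinct ids in 1..max id and fetch them with a single IN query."""
    (max_id,) = conn.execute(
        "SELECT id FROM tbl ORDER BY id DESC LIMIT 1").fetchone()
    # Sampling without replacement avoids duplicate ids.
    ids = random.sample(range(1, max_id + 1), min(k, max_id))
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT * FROM tbl WHERE id IN ({placeholders})", ids).fetchall()

# Demo data: sequential ids 1..2000 with no gaps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 2001)])
rows = random_rows_by_id(conn)
print(rows)
```

    If the table does have gaps, some of the generated ids will match nothing and fewer than k rows come back.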

  • 2020-11-21 05:38

    Well, if you have no gaps in your keys and they are all numeric, you can calculate random numbers and select those rows. But this will probably not be the case.

    So one solution would be the following:

    SELECT t.* FROM tbl AS t JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM tbl)) AS id) AS r
     WHERE t.id >= r.id ORDER BY t.id LIMIT 1
    

    This basically ensures that you get a random number in the range of your keys and then select the next key that is greater than or equal to it. You have to do this 10 times.

    However, this is NOT really random, because your keys will most likely not be distributed evenly.

    It's really a big problem and not easy to solve while fulfilling all the requirements. MySQL's RAND() is the best you can get if you really want 10 random rows.

    There is, however, another solution that is fast but also has a trade-off when it comes to randomness, and it may suit you better. Read about it here: "How can I optimize MySQL's ORDER BY RAND() function?"

    The question is how random you need it to be.

    Can you explain a bit more so I can give you a good solution?

    For example, a company I worked with had a solution where they needed absolute randomness extremely fast. They ended up pre-populating the database with random values that were selected in descending order and then set to different random values again afterwards.
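
    One possible reading of that pre-populated approach is sketched below. The column name `rnd`, the index and the `take_random_rows` helper are all hypothetical, and Python's sqlite3 module stands in for MySQL so that the sketch runs on its own; the company's actual implementation is not known.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, name TEXT, rnd REAL)")
# An index on rnd lets ORDER BY rnd DESC LIMIT k read only k index entries.
conn.execute("CREATE INDEX idx_tbl_rnd ON tbl (rnd)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                 [(i, f"row {i}", random.random()) for i in range(1, 1001)])

def take_random_rows(conn, k=10):
    """Read the k rows with the highest rnd, then give them fresh rnd values."""
    rows = conn.execute(
        "SELECT id, name FROM tbl ORDER BY rnd DESC LIMIT ?", (k,)).fetchall()
    # Re-randomize the rows just used so the next call tends to return others.
    conn.executemany("UPDATE tbl SET rnd = ? WHERE id = ?",
                     [(random.random(), row_id) for row_id, _ in rows])
    conn.commit()
    return rows

first = take_random_rows(conn)
second = take_random_rows(conn)
print(first)
print(second)
```

    The trade-off in this sketch is that every read also writes, which may or may not be acceptable depending on the workload.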

    If you hardly ever update, you could also fill in an incrementing id so that you have no gaps and can simply calculate random keys before selecting. It depends on the use case!

  • 2020-11-21 05:43

    I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:

    SELECT MAX(id) FROM table_name;
    

    Then substitute that value into:

    SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;
    

    Here max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record ids, although I doubt gaps would affect the result (I haven't tried it, though). You could also use CEIL instead of FLOOR in the second SELECT statement, depending on your preference.

    I'm running MySQL 5.5.38 on Windows 2008, 32 GB, dual 3 GHz E5450. On a table with 17,361,264 rows, it is fairly consistent at ~0.03 sec / ~11 sec to return 1,000,000 rows (times are from MySQL Workbench 6.1).

    I also created this stored procedure to be more generic; pass in the table name and the number of rows to be returned:

    DELIMITER $$
    
    USE [schema name] $$
    
    DROP PROCEDURE IF EXISTS `random_rows` $$
    
    CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
    BEGIN
    
    SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
    PREPARE stmt FROM @t;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    
    SET @t = CONCAT(
        'SELECT * FROM ',
        tab_name,
        ' WHERE id>FLOOR(RAND()*@max) LIMIT ',
        num_rows);
    
    PREPARE stmt FROM @t;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    END
    $$
    

    then

    CALL [schema name].random_rows([table name], n);
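
    One caveat worth checking: the MySQL manual describes RAND() in a WHERE clause as being evaluated for every row, so each row may be compared against a different random value rather than a single threshold. A variant that draws the threshold once in the application is sketched below. It uses Python with sqlite3 as a stand-in so that it runs without a MySQL server, and the `rows_after_random_id` helper is hypothetical. Note that it returns a run of consecutive rows after one random starting point, not n independently chosen rows.

```python
import random
import sqlite3

def rows_after_random_id(conn, n):
    """Return up to n consecutive rows starting just after one random id."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM table_name").fetchone()
    # Draw the threshold once, so every row is compared with the same value.
    threshold = random.randrange(max_id)  # integer in 0..max_id-1
    return conn.execute(
        "SELECT * FROM table_name WHERE id > ? ORDER BY id LIMIT ?",
        (threshold, n)).fetchall()

# Demo data: sequential ids 1..1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 1001)])
rows = rows_after_random_id(conn, 5)
print(rows)
```

    When the threshold falls close to the maximum id, fewer than n rows are returned.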
    