How to quickly SELECT 3 random records from a 30k MySQL table with a where filter by a single query?

生来不讨喜 2020-12-14 02:38

Well, this is a very old question that never got a real solution. We want 3 random rows from a table with about 30k records. The table is not that big from MySQL's point of view, but

6 Answers
  • 2020-12-14 03:05

    Ugly, but quick and random. It can become very ugly very fast, especially with the tuning described below, so make sure you really want it this way.

    (SELECT Products.ID, Products.Name
    FROM Products
        INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
    WHERE Products.HasImages=1
    ORDER BY Products.ID
    LIMIT 1)
    
    UNION ALL
    
    (SELECT Products.ID, Products.Name
    FROM Products
        INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
    WHERE Products.HasImages=1
    ORDER BY Products.ID
    LIMIT 1)
    
    UNION ALL
    
    (SELECT Products.ID, Products.Name
    FROM Products
        INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
    WHERE Products.HasImages=1
    ORDER BY Products.ID
    LIMIT 1)
    

    First row appears more often than it should

    If you have big gaps between IDs in your table, rows right after such gaps have a bigger chance of being fetched by this query. In some cases, they will appear significantly more often than they should. This cannot be solved in general, but there's a fix for a common particular case: when there's a gap between 0 and the first existing ID in the table.

    Instead of the subquery (SELECT RAND()*<max_id> AS ID), use something like (SELECT <min_id> + RAND()*(<max_id> - <min_id>) AS ID)
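
    Spelled out for one branch of the UNION, the adjusted subquery might look like the sketch below. Computing the bounds only over rows with HasImages = 1 is an assumption added here, not part of the suggestion above; it keeps the random value from landing above the last matching row, in which case the branch would return nothing.

    ```sql
    -- One branch of the UNION ALL, with <min_id> and <max_id> written out.
    -- Assumption: bounds are taken over the filtered rows, not the whole table.
    SELECT Products.ID, Products.Name
    FROM Products
        INNER JOIN (
            SELECT (SELECT MIN(ID) FROM Products WHERE HasImages = 1)
                   + RAND() * ((SELECT MAX(ID) FROM Products WHERE HasImages = 1)
                               - (SELECT MIN(ID) FROM Products WHERE HasImages = 1)) AS ID
        ) AS t ON Products.ID >= t.ID
    WHERE Products.HasImages = 1
    ORDER BY Products.ID
    LIMIT 1;
    ```

    This only removes the bias caused by the gap before the first row. Gaps in the middle of the ID range still favor the rows that follow them.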

    Remove duplicates

    The query, if used as is, may return duplicate rows. You can avoid that by using UNION instead of UNION ALL. Duplicates are then merged, but the query is no longer guaranteed to return exactly 3 rows. You can work around that too, by fetching more rows than you need and limiting the outer result like this:

    (SELECT ... LIMIT 1)
    UNION (SELECT ... LIMIT 1)
    UNION (SELECT ... LIMIT 1)
    ...
    UNION (SELECT ... LIMIT 1)
    LIMIT 3
    

    There's still no guarantee that 3 rows will be fetched, though. It just makes it more likely.

  • 2020-12-14 03:09
    SELECT Products.ID, Products.Name
    FROM Products
    INNER JOIN (SELECT (RAND() * (SELECT MAX(ID) FROM Products)) AS ID) AS t ON Products.ID >= t.ID
    WHERE (Products.HasImages=1)
    ORDER BY Products.ID ASC
    LIMIT 3;
    

    Of course, the above returns "near" contiguous records: a single random ID is generated, and all three rows are read from that same starting point.

    This should give more "randomness":

    SELECT Products.ID, Products.Name
    FROM Products
    INNER JOIN (SELECT (ROUND((RAND() * (max-min))+min)) AS ID) AS t ON Products.ID >= t.ID
    WHERE (Products.HasImages=1)
    ORDER BY Products.ID ASC
    LIMIT 3;
    

    Here max and min are two values you choose. For example:

    max = select max(id)
    min = 225
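
    With those example values substituted, the second query might read as follows. This is only a sketch: 225 is the illustrative minimum from above, and MAX(ID) is taken over the whole table.

    ```sql
    -- max = (SELECT MAX(ID) FROM Products), min = 225 (example value)
    SELECT Products.ID, Products.Name
    FROM Products
    INNER JOIN (
        SELECT ROUND(RAND() * ((SELECT MAX(ID) FROM Products) - 225) + 225) AS ID
    ) AS t ON Products.ID >= t.ID
    WHERE Products.HasImages = 1
    ORDER BY Products.ID ASC
    LIMIT 3;
    ```

    Keep in mind that this returns fewer than three rows whenever the random starting ID falls within the last couple of matching rows.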
    
  • 2020-12-14 03:09

    This statement executes really fast (19 ms on a 30k records table):

    $db = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password');
    $stmt = $db->query("SELECT p.ID, p.Name, p.HasImages
                        FROM (SELECT @count := COUNT(*) + 1, @limit := 3 FROM Products WHERE HasImages = 1) vars
                        STRAIGHT_JOIN (SELECT t.*, @limit := @limit - 1 FROM Products t WHERE t.HasImages = 1 AND (@count := @count -1) AND RAND() < @limit / @count) p");
    $products = $stmt->fetchAll(PDO::FETCH_ASSOC);
    

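    The idea is to walk the filtered rows once and keep each row with probability @limit / @count, where @limit is the number of rows still needed and @count is the number of rows not yet visited. As written, the query "injects" a helper column (@limit := @limit - 1) to track this and does no sorting at all, which is why it is way faster than the "ORDER BY RAND()" command.

    There "might" be one caveat: you have to include the WHERE condition twice.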
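
    For anyone who wants to try the statement outside PHP, here is the same query as plain SQL, with comments on what each part appears to do. The comments are an interpretation. The query relies on MySQL evaluating the user-variable assignments in the order written, which the MySQL manual does not guarantee, and setting user variables inside expressions is deprecated as of MySQL 8.0.13.

    ```sql
    SELECT p.ID, p.Name, p.HasImages
    FROM (
        -- Runs once: @count starts one above the number of matching rows,
        -- @limit is the sample size.
        SELECT @count := COUNT(*) + 1, @limit := 3
        FROM Products
        WHERE HasImages = 1
    ) AS vars
    -- STRAIGHT_JOIN makes MySQL read `vars` before `p`.
    STRAIGHT_JOIN (
        SELECT t.*,
               @limit := @limit - 1      -- evaluated only for rows that are kept
        FROM Products AS t
        WHERE t.HasImages = 1
          AND (@count := @count - 1)     -- rows left, including this one (never 0)
          AND RAND() < @limit / @count   -- keep with probability needed / left
    ) AS p;
    ```

    When the number of rows still needed equals the number of rows left, the ratio reaches 1 and every remaining row is kept, so the result should contain exactly 3 rows as long as at least 3 rows match the filter.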

  • 2020-12-14 03:13

    What about creating another table containing only the items with images? That table will be much lighter, as it will contain only one-third of the items in the original table.

    ------------------------------------------
    |ID     | Item ID (on the original table)|
    ------------------------------------------
    |0      | 0                              |
    ------------------------------------------
    |1      | 123                            |
    ------------------------------------------
                .
                .
                .
    ------------------------------------------
    |10 000 | 30 000                         |
    ------------------------------------------
    

    You can then generate three random IDs in the PHP part of the code and just fetch them from the database.
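
    A minimal sketch of that lookup table in SQL. The table and column names (ProductsWithImages, ItemID), the INT UNSIGNED column types, and the three literal IDs are all hypothetical. The auto-increment ID here starts at 1 rather than 0 as in the drawing above.

    ```sql
    -- Hypothetical lookup table: one row per product that has images.
    CREATE TABLE ProductsWithImages (
        ID     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- 1..N, no gaps
        ItemID INT UNSIGNED NOT NULL                               -- Products.ID
    );

    -- Fill it in one statement while nothing else writes to the table,
    -- so the generated IDs stay consecutive.
    INSERT INTO ProductsWithImages (ItemID)
    SELECT ID FROM Products WHERE HasImages = 1 ORDER BY ID;

    -- 17, 4242 and 9001 stand in for three distinct random integers between
    -- 1 and (SELECT MAX(ID) FROM ProductsWithImages), picked by the application.
    SELECT p.ID, p.Name
    FROM ProductsWithImages AS l
        INNER JOIN Products AS p ON p.ID = l.ItemID
    WHERE l.ID IN (17, 4242, 9001);
    ```

    The lookup table has to be rebuilt or kept in sync whenever HasImages changes in Products; otherwise it drifts out of date.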

  • 2020-12-14 03:25

    On the off-chance that you're willing to accept an 'outside the box' type of answer, I'm going to repeat what I said in some of the comments.

    The best way to approach your problem is to cache your data in advance (be that in an external JSON or XML file, or in a separate database table, possibly even an in-memory table).

    This way you can schedule the performance hit on the products table for times when you know the server will be quiet, and worry less about causing a performance hit at "random" times when a visitor arrives at your site.

    I'm not going to suggest an explicit solution, because there are far too many possibilities on how to build a solution. However, the answer suggested by @ahmed is not silly. If you don't want to create a join in your query, then simply load more of the data that you require into the new table instead.

  • 2020-12-14 03:27

    I've been testing the following SQL statements on a poorly designed database with 10M records.

    SELECT COUNT(ID)
    INTO @count
    FROM Products
    WHERE HasImages = 1;
    
    PREPARE random_records FROM
    '(
        SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
    ) UNION (
        SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
    ) UNION (
        SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
    )';
    
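    -- FLOOR keeps each offset in 0 .. @count - 1; ROUND could return @count,
    -- which is one past the last row and makes that branch return nothing.
    SET @l1 = FLOOR(RAND() * @count);
    SET @l2 = FLOOR(RAND() * @count);
    SET @l3 = FLOOR(RAND() * @count);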
    
    EXECUTE random_records USING @l1
        , @l2
        , @l3;
    DEALLOCATE PREPARE random_records;
    

    It took almost 7 minutes to get the three results, but I'm sure its performance will be much better in your case. If you are looking for better performance still, I suggest the following statements instead; they took less than 30 seconds to get the job done for me (on the same database).

    SELECT COUNT(ID)
    INTO @count
    FROM Products
    WHERE HasImages = 1;
    
    PREPARE random_records FROM
    'SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1';
    
    -- FLOOR keeps each offset in 0 .. @count - 1 (ROUND could overshoot by one).
    SET @l1 = FLOOR(RAND() * @count);
    SET @l2 = FLOOR(RAND() * @count);
    SET @l3 = FLOOR(RAND() * @count);
    
    EXECUTE random_records USING @l1;
    EXECUTE random_records USING @l2;
    EXECUTE random_records USING @l3;
    
    DEALLOCATE PREPARE random_records;
    

    Bear in mind that both of these scripts require the MySQLi driver in PHP if you want to execute them in one go. The only difference is that the latter requires calling MySQLi's next_result method to retrieve all three results.

    My personal belief is that this is the fastest way to do this.
