MySQL: Alternatives to ORDER BY RAND()

情歌与酒 2020-11-22 02:46

I've read about a few alternatives to MySQL's ORDER BY RAND() function, but most of them only apply where a single random result is needed.

  • 2020-11-22 03:13

    Create a column, or join to a SELECT of random numbers (generated, for example, in PHP), and order by that column.
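    A minimal Python sketch of this idea, assuming the application has already fetched the candidate ids (the function and parameter names are hypothetical): each id gets a random key generated outside MySQL, and the list is sorted by that key. Every candidate still takes part in the sort, so this mostly moves the work out of the database rather than avoiding it.

```python
import random


def order_by_app_random(ids, limit, rng=None):
    """Sort candidate ids by a random key generated in the application.

    Stands in for "a column of random numbers generated in PHP" that the
    query then orders by. Names and parameters are hypothetical.
    """
    if rng is None:
        rng = random.Random()
    # One application-side random key per id (the "random number column").
    keyed = [(rng.random(), row_id) for row_id in ids]
    # ORDER BY the random column, then LIMIT.
    keyed.sort()
    return [row_id for _, row_id in keyed[:limit]]


picked = order_by_app_random(range(1, 101), 18, random.Random(42))
print(picked)
```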

  • 2020-11-22 03:15

    Here's an alternative, but it is still based on using RAND():

      SELECT u.id, 
             p.photo,
             ROUND(RAND() * x.m_id) 'rand_ind'
        FROM users u, 
             profiles p,
             (SELECT MAX(t.id) 'm_id'
                FROM users t) x
       WHERE p.memberid = u.id 
         AND p.photo != '' 
         AND (u.ownership=1 OR u.stamp=1) 
    ORDER BY rand_ind
       LIMIT 18
    

    This is slightly more complex, but gave a better distribution of rand_ind values:

      SELECT u.id, 
             p.photo,
             FLOOR(1 + RAND() * x.m_id) 'rand_ind'
        FROM users u, 
             profiles p,
             (SELECT MAX(t.id) - 1 'm_id'
                FROM users t) x
       WHERE p.memberid = u.id 
         AND p.photo != '' 
         AND (u.ownership=1 OR u.stamp=1) 
    ORDER BY rand_ind
       LIMIT 18
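    One possible reading of "better distribution" (an interpretation, not stated in the answer): with RAND() uniform on [0, 1), ROUND(RAND() * m) lands on the two end values 0 and m only half as often as on each integer in between, whereas FLOOR(1 + RAND() * m) is uniform over 1..m. The Python sketch below computes both distributions exactly, using hypothetical helper names. In both queries every matching row still receives a key and is sorted by it, so neither avoids sorting every matching row.

```python
from fractions import Fraction


def round_key_probs(m):
    """P(ROUND(RAND() * m) = k) for k = 0..m, with RAND() uniform on [0, 1)."""
    probs = {}
    for k in range(m + 1):
        # RAND() * m is uniform on [0, m); ROUND maps [k - 1/2, k + 1/2) to k.
        lo = max(Fraction(0), Fraction(2 * k - 1, 2))
        hi = min(Fraction(m), Fraction(2 * k + 1, 2))
        probs[k] = (hi - lo) / m
    return probs


def floor_key_probs(m):
    """P(FLOOR(1 + RAND() * m) = k) for k = 1..m, with RAND() uniform on [0, 1)."""
    # FLOOR(1 + x) = k exactly when x lies in [k - 1, k): width 1 out of m.
    return {k: Fraction(1, m) for k in range(1, m + 1)}


print(round_key_probs(4))  # 0 and 4 get 1/8 each; 1, 2, 3 get 1/4 each
print(floor_key_probs(4))  # 1, 2, 3, 4 get 1/4 each
```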
    
  • 2020-11-22 03:18

    The solution I am using is also posted in this question: How can i optimize MySQL's ORDER BY RAND() function?

    I am assuming your users table is larger than your profiles table; if not, the cardinality is 1-to-1.

    If so, I would first do a random selection on the users table before joining with the profiles table.

    First, do the selection:

    SELECT *
    FROM users
    WHERE users.ownership = 1 OR users.stamp = 1
    

    Then, from this pool, pick random rows using a calculated probability. If your table has M rows and you want to pick N random rows, the probability of selecting each row should be N/M. Hence:

    SELECT *
    FROM
    (
        SELECT *
        FROM users
        WHERE users.ownership = 1 OR users.stamp = 1
    ) as U
    WHERE 
        rand() <= $limitCount / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)
    

    Here N is $limitCount and M is the subquery that calculates the row count. However, since we are working with probabilities, fewer than $limitCount rows may be returned. Therefore we should multiply N by a factor to enlarge the random pool.

    i.e.:

    SELECT *
    FROM
    (
        SELECT *
        FROM users
        WHERE users.ownership = 1 OR users.stamp = 1
    ) as U
    WHERE 
        rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)
    

    I usually set $factor = 2. You can use a lower factor (e.g. 1.5) to shrink the random pool further.
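    Because each row is kept independently with probability p = min(1, N * factor / M), the pool size follows a binomial distribution, which makes the effect of $factor easy to quantify. The Python sketch below (hypothetical function names; random.random() stands in for MySQL's RAND(), both uniform on [0, 1)) mimics the WHERE clause and computes the chance that the pool ends up with fewer than N rows.

```python
import math
import random


def bernoulli_pool(rows, limit_count, factor=2.0, rng=None):
    """Keep each row independently with probability limit_count * factor / M.

    Mirrors `rand() <= $limitCount * $factor / (SELECT count(*) ...)`.
    """
    if rng is None:
        rng = random.Random()
    m = len(rows)
    if m == 0:
        return []
    threshold = limit_count * factor / m
    # rng.random() is uniform on [0, 1), like MySQL's RAND().
    return [row for row in rows if rng.random() <= threshold]


def prob_shortfall(m, limit_count, factor):
    """Probability that the pool holds fewer than limit_count rows.

    The pool size is Binomial(m, p) with p = min(1, limit_count * factor / m).
    """
    p = min(1.0, limit_count * factor / m)
    return sum(
        math.comb(m, k) * p**k * (1 - p) ** (m - k)
        for k in range(limit_count)
    )


pool = bernoulli_pool(list(range(1, 10_001)), 18, factor=2, rng=random.Random(1))
print(len(pool))  # expected value is 36 (= 18 * 2); the actual count varies
print(prob_shortfall(10, 5, 1))  # 386/1024, about 0.377
print(prob_shortfall(10, 5, 2))  # 0.0: p is capped at 1, so every row is kept
```

    With M = 10 and N = 5, a factor of 1 leaves the pool short about 38% of the time, while a factor of 2 keeps every row.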

    At this point, we have already reduced an M-row table to roughly 2N rows (with $factor = 2). From here we can do a JOIN, then a LIMIT.

    SELECT *
    FROM
    (
        SELECT *
        FROM
        (
            SELECT *
            FROM users
            WHERE users.ownership = 1 OR users.stamp = 1
        ) as U
        WHERE
            rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)
    ) as randUser
    JOIN profiles
    ON randUser.id = profiles.memberid AND profiles.photo != ''
    LIMIT $limitCount
    

    On a large table, this query will outperform a normal ORDER BY RAND() query.
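    The JOIN and LIMIT stage can be sketched in Python as follows, with hypothetical dict shapes standing in for the users and profiles rows. Because the photo filter runs after sampling, the result can hold fewer than $limitCount rows even when the pool was large enough, and LIMIT here simply keeps the first joined rows in pool order.

```python
def join_and_limit(sampled_users, profiles, limit_count):
    """Join sampled users to profiles with a non-empty photo, then LIMIT.

    `sampled_users` is a list of dicts with an "id" key; `profiles` is a list
    of dicts with "memberid" and "photo" keys (hypothetical shapes).
    """
    # Index profiles by member id, skipping empty photos (profiles.photo != '').
    photos = {}
    for p in profiles:
        if p["photo"] != "":
            photos.setdefault(p["memberid"], []).append(p["photo"])
    # JOIN ... ON randUser.id = profiles.memberid, in pool order.
    result = [
        {"id": u["id"], "photo": photo}
        for u in sampled_users
        for photo in photos.get(u["id"], [])
    ]
    return result[:limit_count]  # LIMIT $limitCount


sampled = [{"id": 3}, {"id": 7}, {"id": 9}]
profiles = [
    {"memberid": 3, "photo": "a.jpg"},
    {"memberid": 7, "photo": ""},
    {"memberid": 9, "photo": "c.jpg"},
]
print(join_and_limit(sampled, profiles, 18))
# prints [{'id': 3, 'photo': 'a.jpg'}, {'id': 9, 'photo': 'c.jpg'}]
```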

    Hope this helps!

  • 2020-11-22 03:20

    ORDER BY RAND() is very slow on large tables.

    I found the following workaround in a PHP script:

    SELECT MIN(id) AS min, MAX(id) AS max FROM `table`;
    

    Then pick a random id in PHP:

    $rand = rand($min, $max);
    

    Then fetch the first row whose id is at or above that value:

    'SELECT * FROM `table` WHERE id >= ' . $rand . ' ORDER BY id LIMIT 1';
    

    It seems to be quite fast.
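    A Python sketch of this approach, with hypothetical helper names. It uses id >= $rand together with ORDER BY id, so the pick is the first existing id at or above the random value and the lookup can never come back empty. It also shows a property worth knowing: when ids have gaps, an id that follows a gap is picked more often, because every random value inside the gap lands on it.

```python
import bisect
import random
from fractions import Fraction


def pick_by_random_id(sorted_ids, rng):
    """Emulate `WHERE id >= $rand ORDER BY id LIMIT 1` with $rand in [min, max]."""
    # Inclusive on both ends, like PHP's rand($min, $max).
    rand_id = rng.randint(sorted_ids[0], sorted_ids[-1])
    # First id that is >= rand_id; always exists because rand_id <= max id.
    return sorted_ids[bisect.bisect_left(sorted_ids, rand_id)]


def selection_probs(sorted_ids):
    """Exact chance of each id being picked when rand_id is uniform on [min, max]."""
    lo, hi = sorted_ids[0], sorted_ids[-1]
    total = hi - lo + 1
    probs = {}
    prev = lo - 1
    for row_id in sorted_ids:
        # Every rand_id in prev+1 .. row_id lands on this row_id.
        probs[row_id] = Fraction(row_id - prev, total)
        prev = row_id
    return probs


print(selection_probs([1, 2, 10]))  # id 10 gets 8/10 = 4/5 of the picks
print(pick_by_random_id([1, 2, 10], random.Random(0)))
```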

  • 2020-11-22 03:23

    It is not the fastest, but it is faster than the common ORDER BY RAND() approach:

    ORDER BY RAND() is not so slow when you use it on an indexed column only. You can fetch all your ids in one query like this:

    SELECT id
    FROM testTable
    ORDER BY RAND();
    

    This gives a sequence of random ids, which you can JOIN to another query with the other SELECT or WHERE conditions:

    SELECT t.*
    FROM testTable t
    JOIN
        (SELECT id
        FROM `testTable`
        ORDER BY RAND()) AS z ON z.id= t.id   
    WHERE t.isVisible = 1
    LIMIT 100; 
    

    In your case, it would be:

    SELECT u.id, p.photo
    FROM users u
    JOIN profiles p ON p.memberid = u.id
    JOIN
        (SELECT id
        FROM users
        ORDER BY RAND()) AS z ON z.id = u.id
    WHERE p.photo != ''
      AND (u.ownership=1 OR u.stamp=1)
    LIMIT 18
    

    It's a blunt method and may not be suitable for very big tables, but it is still faster than the common ORDER BY RAND(). In my test it was 20 times faster when fetching 3,000 random rows out of almost 400,000.
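    In Python terms, the idea is to shuffle only the narrow list of ids and look up the full rows afterwards; the dict below is a hypothetical stand-in for testTable. The sketch depends on the shuffled order reaching the LIMIT step. In SQL, the row order of a derived table is not guaranteed to survive an outer query that has no ORDER BY of its own, so it is worth checking that the results really vary on your MySQL version.

```python
import random


def random_visible_rows(rows, limit, rng=None):
    """Shuffle only the ids, then look up full rows, filter, and LIMIT.

    `rows` is a hypothetical in-memory stand-in for testTable: a dict mapping
    id -> row dict with an "isVisible" field.
    """
    if rng is None:
        rng = random.Random()
    ids = list(rows)  # SELECT id FROM testTable
    rng.shuffle(ids)  # ORDER BY RAND() over the narrow id list only
    result = []
    for row_id in ids:  # JOIN ... ON z.id = t.id, in shuffled order
        if len(result) >= limit:  # LIMIT 100
            break
        row = rows[row_id]
        if row["isVisible"] == 1:  # WHERE t.isVisible = 1
            result.append(row)
    return result


table = {i: {"id": i, "isVisible": i % 2} for i in range(1, 1001)}
sample = random_visible_rows(table, 100, random.Random(7))
print(len(sample))  # 100: 500 rows are visible, so the LIMIT is reached
```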

  • 2020-11-22 03:28

    UPDATE 2016

    This solution works best using an indexed column.

    Here is a simple example of an optimized query, benchmarked on 100,000 rows.

    OPTIMIZED: 300ms

    SELECT 
        g.*
    FROM
        `table` g
            JOIN
        (SELECT 
            id
        FROM
            `table`
        WHERE
            RAND() < (SELECT 
                    ((4 / COUNT(*)) * 10)
                FROM
                    `table`)
        ORDER BY RAND()
        LIMIT 4) AS z ON z.id= g.id
    

    Note about the LIMIT amount: the 4 in LIMIT 4 and the 4 in 4 / COUNT(*) need to be the same number. Changing how many rows you return doesn't affect the speed much: benchmarks at LIMIT 4 and LIMIT 1000 were the same, while LIMIT 10,000 took it up to 600 ms.

    Note about the join: randomizing just the id is faster than randomizing whole rows, because sorting whole rows means copying each entire row into memory before randomizing. The join can be to any table that is linked to the subquery; it is there to prevent table scans.

    Note about the WHERE clause: the COUNT-based condition limits the number of rows that get randomized. It keeps a percentage of the rows and sorts only those, rather than the whole table.

    Note about the subqueries: if you add joins and extra WHERE conditions, you need to put them in both the subquery and the sub-subquery, so that the count is accurate and the correct data is returned.
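    A Python sketch of the two-stage idea, where oversample is a hypothetical name for the hard-coded 10. With n = 4 on 100,000 rows the threshold is 4 / 100,000 * 10 = 0.0004, so on average about 100,000 * 0.0004 = 40 ids pass the cheap filter and only those are shuffled. If the filter happens to keep fewer than 4 ids, the query returns fewer than 4 rows; with an expected pool of 40 that is very unlikely.

```python
import random


def two_stage_sample(ids, n, oversample=10, rng=None):
    """Cheap RAND() filter first, then ORDER BY RAND() LIMIT n on the survivors.

    `oversample` is a hypothetical name for the hard-coded 10 in
    ((4 / COUNT(*)) * 10).
    """
    if rng is None:
        rng = random.Random()
    if not ids:
        return []
    threshold = n / len(ids) * oversample  # (n / COUNT(*)) * 10
    # Stage 1: WHERE RAND() < threshold keeps about n * oversample ids on average.
    pool = [i for i in ids if rng.random() < threshold]
    # Stage 2: ORDER BY RAND() LIMIT n, applied to the small pool only.
    rng.shuffle(pool)
    return pool[:n]


ids = list(range(1, 100_001))
picked = two_stage_sample(ids, 4, rng=random.Random(3))
print(picked)
```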

    UNOPTIMIZED: 1200ms

    SELECT 
        g.*
    FROM
        `table` g
    ORDER BY RAND()
    LIMIT 4
    

    PROS

    4x faster than ORDER BY RAND(). This solution works with any table that has an indexed column.

    CONS

    It gets a bit complex with complex queries: the same conditions have to be maintained in two subqueries.
