select 1 random row with complex filtering

前端 未结 4 1920
死守一世寂寞
死守一世寂寞 2021-01-24 05:11

I\'ve 2 tables:

first table users:

+-------------------------+---------+------+-----+---------+-------+
| Field                   | Type             


        
相关标签:
4条回答
  • 2021-01-24 05:54

    As RAND() function assigns a random number to every row present in result, performance will be directly proportional to number of records.

    If you want to select only one (random) record, you can apply LIMIT <random number from 0 to record count>, 1

    e.g.:

    SELECT u.id, count(u.id) as `count`
    FROM users u
    WHERE
        first_name IS NOT NULL
    AND
    NOT EXISTS (
        SELECT *
        FROM proposal
        WHERE
            proposal.to_id = u.id
    )
    LIMIT RAND(0, count-1) , 1
    

    I haven't tried executing this query, however, it MySQL complains about using count in RAND, you can calculate count separately and substitute the value in this query.

    0 讨论(0)
  • 2021-01-24 05:55

    First, I don't think the select distinct is necessary. So, try removing that:

    SELECT p.*
    FROM profiles p
    WHERE p.first_name IS NOT NULL AND
          NOT EXISTS (SELECT 1
                      FROM proposal pr
                      WHERE pr.to_id = p.id
                     )
    ORDER BY rand()
    LIMIT 0 , 1;
    

    That might help a bit. Then, a relatively easy way to reduce the time spent is to reduce the data volume. If you know you will always have thousands of rows that meet the conditions, then you can do:

    SELECT p.*
    FROM profiles
    WHERE p.first_name IS NOT NULL AND
          NOT EXISTS (SELECT 1
                      FROM proposal pr
                      WHERE pr.to_id = p.id
                     ) AND
          rand() < 0.01
    ORDER BY rand()
    LIMIT 0, 1;
    

    The trick is to find the comparison value that ensures that you get at least one row. This is tricky because you have another set of data. Here is one method that uses a subquery:

    SELECT p.*
    FROM (SELECT p.*, (@rn := @rn + 1) as rn
          FROM profiles p CROSS JOIN
               (SELECT @rn := 0) params
          WHERE p.first_name IS NOT NULL AND
                NOT EXISTS (SELECT 1
                            FROM proposal pr
                            WHERE pr.to_id = p.id
                           ) 
         ) p
    WHERE rand() < 100 / @rn
    ORDER BY rand()
    LIMIT 0, 1;
    

    This uses a variable to calculate the number of rows and then randomly selects 100 of them for processing. When choosing 100 rows randomly, there is a very, very, very high likelihood that at least one will be chosen.

    The downside to this approach is that the subquery needs to be materialized, which adds to the cost of the query. It is, however, cheaper than a sort on the full data.

    0 讨论(0)
  • 2021-01-24 05:57

    Have you tried switching not exists to left join?

    SELECT DISTINCT *
    FROM   profiles t1
    LEFT JOIN
           proposal t2
    ON     t1.id = t2.to_id
    WHERE  t1.first_name IS NOT NULL AND
           t2.to_id IS NULL
    ORDER BY RAND()
    LIMIT 0 , 1
    

    This will return you all rows of profiles, and to those that are not matched by a row in proposal it will assign NULL values, on which you can filter.

    The result should be the same, but the performance may be better.

    0 讨论(0)
  • 2021-01-24 06:05

    I've two problem solutions.

    1) With random id, from https://stackoverflow.com/a/4329447/2051938

    SELECT *
    FROM profiles AS r1
    JOIN
        (SELECT CEIL(RAND() *
                         (SELECT MAX(id)
                            FROM profiles)) AS id)
            AS r2
    WHERE
        r1.id >= r2.id
        AND
        r1.first_name IS NOT NULL
    AND
    NOT EXISTS (
        SELECT *
        FROM proposal
        WHERE
            proposal.to_id = r1.id
    )
    LIMIT 0 , 1
    

    2) With ORDER BY RAND()

    SELECT *
    FROM
        (
            SELECT *
            FROM profiles
            WHERE
                profiles.first_name IS NOT NULL
            ORDER BY RAND()
        ) AS users
    WHERE
        NOT EXISTS (
            SELECT *
            FROM proposal
            WHERE
                proposal.to_id = users.id
        )
    LIMIT 0 , 1
    

    First solution is faster but it've problem with "holes in id" and when you got id from the end (users may end earlier than there will be a match)

    Second solution is slower but without flaws!

    0 讨论(0)
提交回复
热议问题