Problems with ORDER BY RAND() and big tables

Posted by 泄露秘密 on 2021-01-29 11:15:04

Question


Hello. I asked a question this morning and then realized that the problem was not where I had been looking (see the original question).

I have this query to pick a random record from an address book.

SELECT * FROM address_book ab 
            WHERE 
            ab.source = "PB" AND 
            ab.city_id = :city_id AND 
            pb_campaign_id = :pb_campaign_id AND 
            ab.id NOT IN (SELECT address_book_id FROM calls WHERE calls.address_book_id = ab.id AND calls.status_id IN ("C","NO") OR (calls.status_id IN ("NR","OC") AND TIMESTAMPDIFF(MINUTE,calls.updated_at,NOW()) < 30))
            ORDER BY RAND()
            LIMIT 1;

But I noticed that ORDER BY RAND() takes more than 50 s and uses 25-50% CPU on large tables (100k+ rows), so I looked for solutions here but didn't find anything that worked. Note: the ids are not auto-incrementing, and there may be gaps.

Any idea?


Answer 1:


I would recommend writing this as:

SELECT *
FROM address_book ab 
WHERE ab.source = 'PB' AND 
      ab.city_id = :city_id AND 
      pb_campaign_id = :pb_campaign_id AND 
      NOT EXISTS (SELECT 1
                  FROM calls c
                  WHERE c.address_book_id = ab.id AND
                        ( c.status_id IN ('C', 'NO') OR
                         (c.status_id IN ('NR', 'OC') AND c.updated_at > now() - interval 30 minute)
                        ) 
                )

ORDER BY RAND()
LIMIT 1;

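Note that this changes the logic in the correlated subquery so that c.address_book_id = ab.id always applies. In the original, AND binds more tightly than OR, so the correlation covered only the first status check and the second branch was evaluated against calls for every address. I suspect that is the issue with performance.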
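To see the effect of the missing parentheses, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for MySQL (both give AND higher precedence than OR). The table contents and the is_recent column are assumptions made up for illustration; is_recent replaces the 30-minute timestamp check so the example stays deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: is_recent stands in for the 30-minute check.
    CREATE TABLE calls (address_book_id INTEGER, status_id TEXT, is_recent INTEGER);
    INSERT INTO calls VALUES (1, 'NR', 1), (1, 'C', 0), (2, 'OC', 1), (3, 'NO', 0);
""")

# Same shape as the question's subquery: parsed as (a AND b) OR c,
# so the second branch is not restricted to the given address.
UNGROUPED = """
    SELECT COUNT(*) FROM calls c
    WHERE c.address_book_id = :id AND c.status_id IN ('C', 'NO')
       OR (c.status_id IN ('NR', 'OC') AND c.is_recent = 1)
"""

# Same shape as the rewritten subquery: the address filter always applies.
GROUPED = """
    SELECT COUNT(*) FROM calls c
    WHERE c.address_book_id = :id
      AND (c.status_id IN ('C', 'NO')
           OR (c.status_id IN ('NR', 'OC') AND c.is_recent = 1))
"""

def matching_calls(sql, address_id):
    """Return how many calls rows the given WHERE clause matches."""
    return conn.execute(sql, {"id": address_id}).fetchone()[0]

ungrouped_count = matching_calls(UNGROUPED, 3)  # also counts calls of addresses 1 and 2
grouped_count = matching_calls(GROUPED, 3)      # counts only calls of address 3
print(ungrouped_count, grouped_count)
```

For address 3 the ungrouped form matches three rows, two of which belong to other addresses, while the grouped form matches only the one row for address 3. On a large calls table, that extra work is repeated for every candidate row of address_book.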

Then, create indexes on:

  • address_book(source, city_id, pb_campaign_id, id)
  • calls(address_book_id, status_id, updated_at)

I am guessing that this will be sufficient to improve performance. If there happen to be a zillion rows that match the conditions, then the order by rand() might be an issue.




Answer 2:


  1. I would not recommend a subquery on a huge database; it takes a long time to execute.
  2. Use proper indexing and, if required, use an inner join (never a left join).
  3. If possible, move the business logic into your PHP script, because the database may grow and such a query will take too long to execute.
  4. If you want only one row from a large table, don't use the rand() function. Pick a random offset (from 0 to the row count minus 1) and use it as LIMIT offset, count. For example, LIMIT 2,1 returns only row 3. Hope this is useful.
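The random-offset idea from point 4 can be sketched as follows. This is only an illustration in Python with an in-memory SQLite table standing in for MySQL; the pick_random_row helper and the sample rows are hypothetical, and the ids deliberately have gaps, as in the question. In a real query the same WHERE filter would have to be applied to both the COUNT and the SELECT.

```python
import random
import sqlite3

def pick_random_row(conn, rng=random):
    """Pick one row by random offset instead of ORDER BY RAND()."""
    total = conn.execute("SELECT COUNT(*) FROM address_book").fetchone()[0]
    if total == 0:
        return None
    offset = rng.randrange(total)  # uniform in 0 .. total - 1
    # MySQL equivalent: ... ORDER BY id LIMIT <offset>, 1
    return conn.execute(
        "SELECT id, city_id FROM address_book ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_book (id INTEGER PRIMARY KEY, city_id INTEGER)")
# Hypothetical sample data: the gaps between ids do not matter for this approach.
conn.executemany(
    "INSERT INTO address_book VALUES (?, ?)",
    [(10, 1), (25, 1), (40, 1), (77, 1)],
)

row = pick_random_row(conn)
print(row)
```

This avoids sorting every matching row by a random value, but the database still has to step past the skipped rows, so very large offsets are not free.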


Source: https://stackoverflow.com/questions/64266163/problems-with-order-by-rand-and-big-tables
