select 1 random row with complex filtering


死守一世寂寞 2021-01-24 05:11

I've 2 tables:

first table users:

+-------------------------+---------+------+-----+---------+-------+
| Field                   | Type


      
      
        
4 answers

梦毁少年i (original poster) 2021-01-24 05:55
              

            
            
                        
First, I don't think the SELECT DISTINCT is necessary, so try removing it:

SELECT p.*
FROM profiles p
WHERE p.first_name IS NOT NULL AND
      NOT EXISTS (SELECT 1
                  FROM proposal pr
                  WHERE pr.to_id = p.id
                 )
ORDER BY rand()
LIMIT 0, 1;


That might help a bit. Beyond that, a relatively easy way to cut the time spent is to reduce the number of rows that reach the sort. If you know that thousands of rows will always meet the conditions, you can keep only about 1% of them before sorting:

SELECT p.*
FROM profiles p
WHERE p.first_name IS NOT NULL AND
      NOT EXISTS (SELECT 1
                  FROM proposal pr
                  WHERE pr.to_id = p.id
                 ) AND
      rand() < 0.01
ORDER BY rand()
LIMIT 0, 1;
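
To get a feel for how safe a cutoff like 0.01 is, here is a rough back-of-the-envelope sketch in Python. It assumes each qualifying row passes the rand() filter independently; the function name prob_no_rows is only illustrative and does not appear in the original answer.

```python
def prob_no_rows(n_rows: int, keep_prob: float) -> float:
    """Probability that an independent per-row filter keeps none of n_rows rows."""
    # Each row is dropped with probability (1 - keep_prob); all n_rows must be dropped.
    return (1.0 - keep_prob) ** n_rows


# How often would "rand() < 0.01" leave the query with an empty result?
for n in (100, 1_000, 10_000):
    print(n, prob_no_rows(n, 0.01))
```

With only 100 qualifying rows the query would come back empty roughly 37% of the time, while with 1,000 rows the chance drops to about 0.004%, which is why the fixed cutoff only makes sense when thousands of rows are guaranteed to match.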


The trick is choosing a comparison value that still leaves at least one row. That is hard to do with a fixed constant here, because the number of qualifying rows also depends on the proposal table. Here is one method that uses a subquery to count them:

SELECT p.*
FROM (SELECT p.*, (@rn := @rn + 1) as rn
      FROM profiles p CROSS JOIN
           (SELECT @rn := 0) params
      WHERE p.first_name IS NOT NULL AND
            NOT EXISTS (SELECT 1
                        FROM proposal pr
                        WHERE pr.to_id = p.id
                       ) 
     ) p
WHERE rand() < 100 / @rn
ORDER BY rand()
LIMIT 0, 1;


This uses a variable to count the qualifying rows and then randomly keeps about 100 of them for the final sort. With an expected 100 rows kept, the chance of ending up with no rows at all is vanishingly small (at most about e^-100, and zero when 100 or fewer rows qualify, because then every row is kept).

The downside to this approach is that the subquery needs to be materialized, which adds to the cost of the query.  It is, however, cheaper than a sort on the full data.
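
As a way to see the logic of the last query outside the database, here is a minimal Python sketch of the same two-stage idea: count the qualifying rows, keep each one with probability target/count, then pick one survivor at random. The function pick_random_row and its parameters are hypothetical names for illustration, and rows is assumed to be the already-filtered list of profiles.

```python
import random


def pick_random_row(rows, target=100, rng=random):
    """Mimic the query: count rows, thin them to about `target`, pick one."""
    count = len(rows)                  # plays the role of @rn after the subquery
    if count == 0:
        return None
    keep_prob = target / count         # a value >= 1 keeps every row
    # Stage 1: the "rand() < 100 / @rn" filter, applied independently per row.
    survivors = [row for row in rows if rng.random() < keep_prob]
    if not survivors:                  # extremely unlikely, but possible
        return None
    # Stage 2: stands in for "ORDER BY rand() LIMIT 0, 1" on the small sample.
    return rng.choice(survivors)


# Example with made-up row ids:
print(pick_random_row(list(range(10_000)), rng=random.Random(0)))
```

Every row has the same chance of surviving stage 1 and stage 2 picks uniformly among survivors, so each row is still equally likely to be returned; only the expensive sort runs on far fewer rows.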
    
             
                                                        
            
            
              
                