Top 'n' results for each keyword


I have a query to get the top 'n' users who commented on a specific keyword:

SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "ec


                      
2 Answers

        
                         				            
            
           
            
                              
                
              
              
                
半阙折子戏, 2021-01-14 22:13
              
            
            
                                                                       
You can use a pattern like this (from Within-group quotas (Top N per group)):

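SELECT tmp.ID, tmp.entrydate
FROM (
  SELECT
    ID, entrydate,
    -- restart the counter when ID changes, otherwise increment it
    IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS `rank`,
    @prev := ID
  FROM test t
  JOIN (SELECT @rownum := NULL, @prev := 0) AS r
  ORDER BY t.ID, t.entrydate
) AS tmp
WHERE tmp.`rank` <= 2  -- backticks: RANK is a reserved word from MySQL 8.0
ORDER BY ID, entrydate;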
+------+------------+ 
| ID   | entrydate  | 
+------+------------+ 
|    1 | 2007-05-01 | 
|    1 | 2007-05-02 | 
|    2 | 2007-06-03 | 
|    2 | 2007-06-04 | 
|    3 | 2007-07-01 | 
|    3 | 2007-07-02 | 
+------+------------+ 
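
On MySQL 8.0 or later (an assumption; the pattern above predates it), the same per-group quota can be written with a window function instead of user variables. A minimal sketch against the same `test` table, taking the two earliest `entrydate` values per `ID`; the names `rn` and `numbered` are arbitrary:

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT ID, entrydate
FROM (
  SELECT
    ID,
    entrydate,
    -- number the rows of each ID from 1, earliest entrydate first
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY entrydate) AS rn
  FROM test
) AS numbered
WHERE rn <= 2
ORDER BY ID, entrydate;
```

This form does not depend on the order in which user-variable assignments are evaluated inside a SELECT, which MySQL does not guarantee.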

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
半阙折子戏, 2021-01-14 22:28
              
            
            
                                                                       
Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):

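create table results (
  id int primary key,
  user int,
  -- placeholder from the original: uncomment and name your users table
  -- foreign key (user) references <some_other_table>(id),
  keyword varchar(30)  -- 30 is a guess at the length
);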


Step 1: aggregate by keyword/user as in your example query, but for all keywords:

create view user_keyword as (
  select
    keyword,
    user,
    count(*) as magnitude
  from results
  group by keyword, user
);


Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):

create view keyword_user_ranked as (
  select 
    keyword,
    user,
    magnitude,
    -- rank = number of users in the same keyword with an equal or higher count
    (select count(*) 
     from user_keyword r
     where r.keyword = l.keyword and r.magnitude >= l.magnitude
    ) as `rank`
  from
    user_keyword l
);


Step 3: select only the rows whose rank is at most some number (3 here):

select * 
from keyword_user_ranked 
where `rank` <= 3;
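
If views are not wanted, the three steps can be collapsed into one statement. The following is only a sketch, assuming MySQL 8.0 or later; the names `per_user`, `ranked` and `pos` are made up for illustration. It relies on the default window frame, which extends through all rows tied with the current one, so `pos` takes the same values as the correlated subquery in Step 2:

```sql
-- Requires MySQL 8.0+ (common table expressions and window functions).
WITH per_user AS (
  -- Step 1: comments per user within each keyword
  SELECT keyword, user, COUNT(*) AS magnitude
  FROM results
  GROUP BY keyword, user
),
ranked AS (
  -- Step 2: count the rows of the same keyword whose magnitude is
  -- greater than or equal to this row's magnitude (tied rows included)
  SELECT
    keyword,
    user,
    magnitude,
    COUNT(*) OVER (PARTITION BY keyword ORDER BY magnitude DESC) AS pos
  FROM per_user
)
-- Step 3: keep the top positions per keyword
SELECT keyword, user, magnitude, pos
FROM ranked
WHERE pos <= 3
ORDER BY keyword, pos;
```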




Example:

Base data used:

mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
|  1 |    1 | mysql   |
|  2 |    1 | mysql   |
|  3 |    2 | mysql   |
|  4 |    1 | query   |
|  5 |    2 | query   |
|  6 |    2 | query   |
|  7 |    2 | query   |
|  8 |    1 | table   |
|  9 |    2 | table   |
| 10 |    1 | table   |
| 11 |    3 | table   |
| 12 |    3 | mysql   |
| 13 |    3 | query   |
| 14 |    2 | mysql   |
| 15 |    1 | mysql   |
| 16 |    1 | mysql   |
| 17 |    3 | query   |
| 18 |    4 | mysql   |
| 19 |    4 | mysql   |
| 20 |    5 | mysql   |
+----+------+---------+


Grouped by keyword and user:

mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql   |    1 |         4 |
| mysql   |    2 |         2 |
| mysql   |    4 |         2 |
| mysql   |    3 |         1 |
| mysql   |    5 |         1 |
| query   |    2 |         3 |
| query   |    3 |         2 |
| query   |    1 |         1 |
| table   |    1 |         2 |
| table   |    2 |         1 |
| table   |    3 |         1 |
+---------+------+-----------+


Users ranked within keywords:

mysql> select * from keyword_user_ranked order by keyword, `rank` asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| mysql   |    2 |         2 |    3 |
| mysql   |    4 |         2 |    3 |
| mysql   |    3 |         1 |    5 |
| mysql   |    5 |         1 |    5 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| query   |    1 |         1 |    3 |
| table   |    1 |         2 |    1 |
| table   |    3 |         1 |    3 |
| table   |    2 |         1 |    3 |
+---------+------+-----------+------+


Only top 2 from each keyword:

mysql> select * from keyword_user_ranked where `rank` <= 2 order by keyword, `rank` asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| table   |    1 |         2 |    1 |
+---------+------+-----------+------+




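Note that when there are ties (see users 2 and 4 for keyword "mysql" in the examples), all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3. As a result, a rank filter can return fewer rows than requested: the top-2 query above returns only one row for "mysql".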
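
If a different tie policy is wanted, MySQL 8.0 and later (again an assumption about the server version) offer three built-in numbering functions. A sketch that reads the `user_keyword` view from Step 1; the aliases `rnk`, `dense_rnk` and `row_num` are illustrative only, and the sequences in the comments are for keyword "mysql" in the example data (magnitudes 4, 2, 2, 1, 1):

```sql
-- Requires MySQL 8.0+ (window functions and the WINDOW clause).
SELECT
  keyword,
  user,
  magnitude,
  RANK()       OVER w AS rnk,        -- ties share the first position: 1, 2, 2, 4, 4
  DENSE_RANK() OVER w AS dense_rnk,  -- ties share, no gaps afterwards: 1, 2, 2, 3, 3
  -- ties broken by user id, so every row gets its own number: 1, 2, 3, 4, 5
  ROW_NUMBER() OVER (PARTITION BY keyword ORDER BY magnitude DESC, user) AS row_num
FROM user_keyword
WINDOW w AS (PARTITION BY keyword ORDER BY magnitude DESC)
ORDER BY keyword, row_num;
```

Filtering on `row_num` returns exactly n rows per keyword whenever at least n exist, while filtering on `rnk` keeps every user tied at the cut-off.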



Performance: a composite index on the keyword and user columns will help. I have a table queried in a similar way, with 4000 and 1300 distinct values in the two columns (in a 600000-row table). You can add the index like this:

alter table results add index keyword_user (keyword, user);


In my case, query time dropped from about 6 seconds to about 2 seconds.
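
To check whether the index is actually picked up, EXPLAIN can be run on the aggregate query from Step 1. This is a sketch only; the plan depends on the server version and the data:

```sql
-- Shows the execution plan without running the query.
EXPLAIN
SELECT keyword, user, COUNT(*) AS magnitude
FROM results
GROUP BY keyword, user;
```

If the optimizer chooses the index, the `key` column of the output names `keyword_user`.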
                                                                        
                                                        
            
            
              
                