Top 'n' results for each keyword

-上瘾入骨i 2021-01-14 21:43

I have a query to get the top 'n' users who commented on a specific keyword:

SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "ec

2 answers
  • 2021-01-14 22:13

    You can use a pattern like this (from Within-group quotas (Top N per group)):

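    SELECT tmp.ID, tmp.entrydate
    FROM (
      SELECT
        ID, entrydate,
        -- restart the counter at 1 whenever ID changes, otherwise increment it
        -- (`rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0)
        IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS `rank`,
        @prev := ID
      FROM test t
      JOIN (SELECT @rownum := NULL, @prev := 0) AS r
      ORDER BY t.ID, t.entrydate  -- also sort by date so the two earliest rows per ID are kept
    ) AS tmp
    WHERE tmp.`rank` <= 2
    ORDER BY ID, entrydate;

    Output:

    +------+------------+
    | ID   | entrydate  |
    +------+------------+
    |    1 | 2007-05-01 |
    |    1 | 2007-05-02 |
    |    2 | 2007-06-03 |
    |    2 | 2007-06-04 |
    |    3 | 2007-07-01 |
    |    3 | 2007-07-02 |
    +------+------------+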
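
To make the mechanics of that pattern easier to follow, here is a minimal plain-Python sketch of the same idea: walk the rows in sorted order, restart a counter whenever the group changes, and keep rows whose counter is at most n. The function name `top_n_per_group` and the input rows are made up for illustration (the contents of the `test` table are not shown in the answer; only its output is). Keep in mind that MySQL documents the evaluation order of user-variable assignments inside a SELECT as undefined, so the SQL version is best treated as a pattern that happens to work rather than guaranteed behaviour.

```python
# Plain-Python sketch of the counter-reset pattern (illustrative only).
def top_n_per_group(rows, n):
    """rows: iterable of (group_id, value) tuples; returns the first n per group."""
    kept = []
    prev = None   # plays the role of @prev
    rownum = 0    # plays the role of @rownum
    for group_id, value in sorted(rows):  # sorted by group, then by value
        # Restart the counter when the group changes, otherwise increment it.
        rownum = 1 if group_id != prev else rownum + 1
        prev = group_id
        if rownum <= n:  # corresponds to the outer WHERE filter on the counter
            kept.append((group_id, value))
    return kept


# Hypothetical input: three dates per ID, of which the first two should survive.
rows = [
    (1, "2007-05-01"), (1, "2007-05-02"), (1, "2007-05-03"),
    (2, "2007-06-03"), (2, "2007-06-04"), (2, "2007-06-05"),
    (3, "2007-07-01"), (3, "2007-07-02"), (3, "2007-07-03"),
]
for row in top_n_per_group(rows, 2):
    print(row)
```

With this input the sketch keeps the same six (ID, date) pairs as the output table above.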
    
  • 2021-01-14 22:28

    Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):

    create table results (
      id int primary key,
      user int,
      foreign key (user) references <some_other_table>(id),
      keyword varchar(<30>)
    );
    

    Step 1: aggregate by keyword/user as in your example query, but for all keywords:

    create view user_keyword as (
      select
        keyword,
        user,
        count(*) as magnitude
      from results
      group by keyword, user
    );
    

    Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):

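    create view keyword_user_ranked as (
      select 
        l.keyword,
        l.user,
        l.magnitude,
        -- rank = number of rows in the same keyword group with an equal or higher magnitude
        -- (`rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0)
        (select count(*) 
         from user_keyword r
         where r.keyword = l.keyword and r.magnitude >= l.magnitude
        ) as `rank`
      from
        user_keyword l
    );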
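
As a rough illustration of what the correlated subquery computes, the sketch below does the same counting in plain Python. The list `user_keyword` is a hand-copied stand-in for the "mysql" rows of the view (taken from the example data in this answer), and `rank_by_counting` is a hypothetical helper name, not part of any library.

```python
# Hand-copied stand-in for the user_keyword view: (keyword, user, magnitude).
user_keyword = [
    ("mysql", 1, 4), ("mysql", 2, 2), ("mysql", 4, 2),
    ("mysql", 3, 1), ("mysql", 5, 1),
]


def rank_by_counting(rows):
    """Rank each row by counting the rows in its keyword group with magnitude >= its own."""
    ranked = []
    for keyword, user, magnitude in rows:
        rank = sum(1 for k, _, m in rows if k == keyword and m >= magnitude)
        ranked.append((keyword, user, magnitude, rank))
    return ranked


for row in rank_by_counting(user_keyword):
    print(row)
```

Because every row is compared against every other row in its group, the work grows quadratically with the group size, which is one reason an index matters for the SQL version.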
    

    Step 3: select only the rows where the rank is at most some number:

    select * 
    from keyword_user_ranked 
    where `rank` <= 3;
    

    Example:

    Base data used:

    mysql> select * from results;
    +----+------+---------+
    | id | user | keyword |
    +----+------+---------+
    |  1 |    1 | mysql   |
    |  2 |    1 | mysql   |
    |  3 |    2 | mysql   |
    |  4 |    1 | query   |
    |  5 |    2 | query   |
    |  6 |    2 | query   |
    |  7 |    2 | query   |
    |  8 |    1 | table   |
    |  9 |    2 | table   |
    | 10 |    1 | table   |
    | 11 |    3 | table   |
    | 12 |    3 | mysql   |
    | 13 |    3 | query   |
    | 14 |    2 | mysql   |
    | 15 |    1 | mysql   |
    | 16 |    1 | mysql   |
    | 17 |    3 | query   |
    | 18 |    4 | mysql   |
    | 19 |    4 | mysql   |
    | 20 |    5 | mysql   |
    +----+------+---------+
    

    Grouped by keyword and user:

    mysql> select * from user_keyword order by keyword, magnitude desc;
    +---------+------+-----------+
    | keyword | user | magnitude |
    +---------+------+-----------+
    | mysql   |    1 |         4 |
    | mysql   |    2 |         2 |
    | mysql   |    4 |         2 |
    | mysql   |    3 |         1 |
    | mysql   |    5 |         1 |
    | query   |    2 |         3 |
    | query   |    3 |         2 |
    | query   |    1 |         1 |
    | table   |    1 |         2 |
    | table   |    2 |         1 |
    | table   |    3 |         1 |
    +---------+------+-----------+
    

    Users ranked within keywords:

    mysql> select * from keyword_user_ranked order by keyword, `rank` asc;
    +---------+------+-----------+------+
    | keyword | user | magnitude | rank |
    +---------+------+-----------+------+
    | mysql   |    1 |         4 |    1 |
    | mysql   |    2 |         2 |    3 |
    | mysql   |    4 |         2 |    3 |
    | mysql   |    3 |         1 |    5 |
    | mysql   |    5 |         1 |    5 |
    | query   |    2 |         3 |    1 |
    | query   |    3 |         2 |    2 |
    | query   |    1 |         1 |    3 |
    | table   |    1 |         2 |    1 |
    | table   |    3 |         1 |    3 |
    | table   |    2 |         1 |    3 |
    +---------+------+-----------+------+
    

    Only top 2 from each keyword:

    mysql> select * from keyword_user_ranked where `rank` <= 2 order by keyword, `rank` asc;
    +---------+------+-----------+------+
    | keyword | user | magnitude | rank |
    +---------+------+-----------+------+
    | mysql   |    1 |         4 |    1 |
    | query   |    2 |         3 |    1 |
    | query   |    3 |         2 |    2 |
    | table   |    1 |         2 |    1 |
    +---------+------+-----------+------+
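
For anyone who wants to replay the three steps without a MySQL server, the sketch below runs them against an in-memory SQLite database through Python's `sqlite3` module. This is an assumption-laden stand-in, not the original setup: SQLite's dialect differs from MySQL's, the foreign key is dropped, and the rank column is named `rnk` here purely to sidestep keyword issues. The 20 rows are copied from the base data shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, user INTEGER, keyword TEXT)")

# The 20 base rows from the example: (id, user, keyword).
base_rows = [
    (1, 1, "mysql"), (2, 1, "mysql"), (3, 2, "mysql"), (4, 1, "query"),
    (5, 2, "query"), (6, 2, "query"), (7, 2, "query"), (8, 1, "table"),
    (9, 2, "table"), (10, 1, "table"), (11, 3, "table"), (12, 3, "mysql"),
    (13, 3, "query"), (14, 2, "mysql"), (15, 1, "mysql"), (16, 1, "mysql"),
    (17, 3, "query"), (18, 4, "mysql"), (19, 4, "mysql"), (20, 5, "mysql"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", base_rows)

conn.executescript("""
-- Step 1: comments per (keyword, user)
CREATE VIEW user_keyword AS
  SELECT keyword, user, COUNT(*) AS magnitude
  FROM results
  GROUP BY keyword, user;

-- Step 2: rank by counting rows with an equal or higher magnitude
CREATE VIEW keyword_user_ranked AS
  SELECT l.keyword, l.user, l.magnitude,
         (SELECT COUNT(*)
          FROM user_keyword r
          WHERE r.keyword = l.keyword AND r.magnitude >= l.magnitude) AS rnk
  FROM user_keyword l;
""")

# Step 3: keep only rows ranked 2 or better within each keyword
top2 = conn.execute(
    "SELECT keyword, user, magnitude, rnk FROM keyword_user_ranked "
    "WHERE rnk <= 2 ORDER BY keyword, rnk"
).fetchall()
for row in top2:
    print(row)
```

The four rows it prints correspond to the "top 2" table above, including the single row for "mysql".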
    

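    Note that when there are ties (see users 2 and 4 for keyword "mysql" in the examples), all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3. This is why the top-2 query above returns only one row for "mysql": users 2 and 4 both have rank 3 and are filtered out.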
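
If that tie behaviour is not what you want, window functions offer other conventions: `RANK()` gives tied rows the first rank of the tie, and `DENSE_RANK()` does the same without leaving gaps afterwards. MySQL 8.0 and later support both; the sketch below demonstrates them with Python's `sqlite3` module instead (this assumes the bundled SQLite is 3.25 or newer, and the table is a hand-built copy of the "mysql" rows rather than the real view). It is offered as an alternative technique, not as what the answer above uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_keyword (keyword TEXT, user INTEGER, magnitude INTEGER)")
conn.executemany(
    "INSERT INTO user_keyword VALUES (?, ?, ?)",
    [("mysql", 1, 4), ("mysql", 2, 2), ("mysql", 4, 2), ("mysql", 3, 1), ("mysql", 5, 1)],
)

ranked_sql = """
SELECT user, magnitude,
       RANK()       OVER (PARTITION BY keyword ORDER BY magnitude DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY keyword ORDER BY magnitude DESC) AS dense_rnk
FROM user_keyword
"""

# Columns: user, magnitude, RANK(), DENSE_RANK()
rows = conn.execute(ranked_sql + " ORDER BY magnitude DESC, user").fetchall()
for row in rows:
    print(row)

# Filtering on RANK() <= 2 now keeps both tied users.
top2_users = [
    r[0]
    for r in conn.execute(
        "SELECT user FROM (" + ranked_sql + ") WHERE rnk <= 2 ORDER BY user"
    )
]
print(top2_users)
```

Here users 2 and 4 both get rank 2, so a `<= 2` filter keeps three rows for "mysql" instead of one. Whether a tie should widen or narrow the result is a design decision for the application.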


    Performance: adding a composite index on the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600,000-row table). You can add the index like this:

    alter table results add index keyword_user (keyword, user);
    

    In my case, query time dropped from about 6 seconds to about 2 seconds.
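
A quick way to experiment with the effect of such an index is sketched below, again using SQLite through Python's `sqlite3` module as a stand-in for MySQL. The data is synthetic and tiny, so it says nothing about the timings quoted above; it only checks that the grouped result is unchanged after adding the index and prints the query plan for inspection. The plan text varies between SQLite versions, and in MySQL you would look at `EXPLAIN` output instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, user INTEGER, keyword TEXT)")

# Synthetic rows: 7 users and 5 keywords, purely for illustration.
conn.executemany(
    "INSERT INTO results (user, keyword) VALUES (?, ?)",
    [(i % 7, f"kw{i % 5}") for i in range(1000)],
)

query = (
    "SELECT keyword, user, COUNT(*) FROM results "
    "GROUP BY keyword, user ORDER BY keyword, user"
)

before = conn.execute(query).fetchall()

# Same column order as the MySQL statement: (keyword, user).
conn.execute("CREATE INDEX keyword_user ON results (keyword, user)")

after = conn.execute(query).fetchall()
index_names = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]

# Print the plan so you can see whether the index is picked up.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```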
