SQL To Find Most Popular Category

前端 未结 6 691
面向向阳花
面向向阳花 2021-01-20 00:04

I have 3 tables in my DB (MySQL).

categories (name:string)
items (name:string, category_id:int)
votes (value:int, item_id:int, created_at:datetime)


        
相关标签:
6条回答
  • 2021-01-20 00:51

    1) I assume the categories table also has category_id field, and the items table has an item_id or you won't be able to link the tables. 2) AS keyword is often optional

    SELECT c.name, SUM(ABS(v.value))
    FROM categories c
    INNER JOIN items i ON c.category_id = i.category_id
    INNER JOIN votes v ON v.item_id = i.item_id
    WHERE v.created_at BETWEEN DATE_SUB(NOW(), INTERVAL -7  DAYS) AND NOW()
    GROUP BY c.name
    ORDER BY 2 DESC
    LIMIT 10
    
    • The AS keyword is for making a synonym of the table name. Most times this is just a shorthand, but if you perform self-joins where the table is joined to itself, you need to distinguish them. And if two tables have the same field name, you need to specify which table's field you are using, thus c.category_id comes from the "c" table, which means the category table.
    • JOINS are essential. Start reading.
    • In my solution, I used DATE_SUB which is native to mySQL. I do not know how many other databases use that function, but all of them have something similar.
    • My query gives you the top ten categories with the most popular first. Note that the LIMIT N clause is how you do it in mySQL. In SQLServer, use TOP N after the SELECT keyword. In Oracle, it is done another way.
    • I took the absolute value of votes because you said include up or down votes.
    • ORDER BY 2 DESC means sort descending by the second column in the select. You can use the actual expression here, but that is more typing.
    • GROUP BY is necessary. Every column that is not a constant or aggregated with SUM, COUNT, MAX, etc. must appear in the GROUP BY clause if there are any aggregate functions used.
    0 讨论(0)
  • 2021-01-20 00:52
    SELECT c.name, sum(v.value) as cnt
     FROM categories c
     JOIN items i ON i.category_id = c.id
     JOIN votes v ON v.item_id = i.id
     WHERE v.created_at > #{1.week.ago}
     GROUP BY c.name 
     ORDER BY cnt DESC LIMIT 5;
    

    Edit: good point andrew, I fixed the query

    0 讨论(0)
  • 2021-01-20 00:54

    try this. use group by with the name of the category. i have commented out the created at clause as you specified, you can uncomment it if you want to use it.

     SELECT c.name, SUM(ABS(v.item_id)) 
     FROM categories c,items i, votes v
     WHERE c.name = i.name
        AND i.item_id=v.item_id
        --AND v.created_at > #{1.week.ago}
     GROUP BY c.name 
     ORDER BY SUM(ABS(v.item_id)) DESC LIMIT 5;
    

    you will notice that i did not use the JOIN keyword but instead filtered the results of the query using only WHERE clauses, which might be easier to understand. if you want to learn more about JOINs, here is a tutorial.

    Here, too, is a tutorial on SQL aliases (the AS clause). in fact, there are a bunch more tutorials on this site for a bunch of different SQL topics that are not platform dependent.

    edit: fixed as per comments, added the abs function,

    0 讨论(0)
  • 2021-01-20 00:54

    This is in SQL Server...but easy to convert for a MySQL guy I would think!

    select top 5
        c.name as CategoryName, sum(v.value) as VoteSum
    from categories c inner join items i
        on c.category_id = i.category_id
        inner join votes v
            on i.item_id = v.item_id
    where created_at between dateadd(week,-1,getdate()) and getdate()
    group by c.name
    order by sum(v.value) desc
    

    select top 5 --number of categories to show in calculation

    c.name as CategoryName, sum(v.value) as VoteSum --get the categoryname and the sum of votes

    from categories c inner join items i --join the heirarchy: categories to items

    on c.category_id = i.category_id --on category_id

    inner join votes v --items to votes

    on i.item_id = v.item_id --on item_id

    where created_at between dateadd(week,-1,getdate()) and getdate() --specify the date range to include

    group by c.name --group the results by the category

    order by sum(v.value) desc --order the results by the summed value

    Here is some sql (from SQL Server) to get the tables up and running:

    CREATE TABLE [categories](
        [category_id] [int] IDENTITY(1,1) NOT NULL,
        [name] [varchar](50) NOT NULL
    )
    
    CREATE TABLE [items](
    [item_id] [int] IDENTITY(1,1) NOT NULL,
    [category_id] [int] NOT NULL,
    [name] [varchar](50)
    )
    
    CREATE TABLE [dbo].[votes](
    [vote_id] [int] IDENTITY(1,1) NOT NULL,
    [value] [int] NOT NULL,
    [item_id] [int] NOT NULL,
    [created_at] [datetime] NOT NULL
    )
    
    insert into categories (name) values (' asp.net ')
    insert into categories (name) values (' c#  ')
    insert into categories (name) values (' vb  ')
    insert into categories (name) values (' sql ')
    insert into categories (name) values (' html    ')
    insert into categories (name) values (' javascript  ')
    
    insert into items (category_id, name) values (  1   ,'  session handling    ')
    insert into items (category_id, name) values (  1   ,'  mvc vs mvp  ')
    insert into items (category_id, name) values (  1   ,'  code behind or no code behind   ')
    insert into items (category_id, name) values (  2   ,'  LINQ?   ')
    insert into items (category_id, name) values (  2   ,'  lamdas  ')
    insert into items (category_id, name) values (  2   ,'  multi-threaded code ')
    insert into items (category_id, name) values (  2   ,'  SOLID principles    ')
    insert into items (category_id, name) values (  3   ,'  vb vs C#    ')
    insert into items (category_id, name) values (  3   ,'  VB.NET over vb6 ')
    insert into items (category_id, name) values (  4   ,'  CLR procedures or stored procedures ')
    insert into items (category_id, name) values (  4   ,'  ORMs vs stored procedures and views ')
    insert into items (category_id, name) values (  6   ,'  jquery instead of standard DOM  ')
    
    insert into votes (value, item_id, created_at) values ( -1  ,   1   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 1   ,   1   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 3   ,   1   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 3   ,   1   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 2   ,   2   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 2   ,   2   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 4   ,   2   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( -3  ,   2   ,'  6/26/2009 18:59 ')
    insert into votes (value, item_id, created_at) values ( 2   ,   4   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 6   ,   4   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 3   ,   4   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 5   ,   4   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 8   ,   7   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 3   ,   6   ,'  6/26/2009 19:00 ')
    insert into votes (value, item_id, created_at) values ( 8   ,   7   ,'  6/26/2009 19:01 ')
    insert into votes (value, item_id, created_at) values ( 2   ,   5   ,'  6/26/2009 19:01 ')
    
    0 讨论(0)
  • 2021-01-20 00:57

    You could probably use group by in this case and remove the join. I always screw up when using group by but Something like

    SELECT COUNT(votes.item_ID) AS score, 
      (SELECT ItemTitle FROM items WHERE items.item_id = votes.item_id) as Title
    FROM votes
    WHERE votes.created_at > #{1.week.ago}
    Group By Title
    Order By score
    Limit 5
    

    AS

    "as" allows you to give something a name.

    Notice above the as score, this gives the result from count(votes.item_id) the column name of score since it did not have a column name before. You could also use this if you want to call something by another name in the rest of the query.

    If you took off the as score it would come back as a column with no title and no way to access it by name, only by number.

    JOIN

    A join will merge 2 tables as 1 temp table and return this table. There are inner, outer, left, right and cross joins. Each with its own advantages but all have the same problem of being slow. Look into sub queries to replace most joins.

    You will also want to avoid using Select *, list out all the things you need.

    The best way to figure these out is to just run them all and see what they return and read what they are supposed to do w3Schools Joins

    0 讨论(0)
  • 2021-01-20 00:59

    This is what I ended up using (slightly modified version of @akf's answer). The other answers were great also, but this one seemed the simplest to me as a novice. I find it interesting that it doesn't use any joins? Would not have thought this was possible. I like how clean and simple it is.

    SELECT c.*, SUM(ABS(v.vote)) AS score
    FROM categories c,items i, votes v
      WHERE c.id = i.category_id
      AND i.id = v.item_id
      AND v.created_at > '#{1.week.ago}'
    GROUP BY c.id
    ORDER BY score DESC LIMIT 5;
    

    (I forgot to mention in the question that each table has an 'id' column, which you can see being used here.)

    Summing the absolute value of the vote value works in this case since there are positives and negatives as some posters correctly pointed out, but I was thinking it would be even simpler to just count the number of vote records in the database. I tried "COUNT(v) AS score" but this didn't work. If anyone knows how please post a comment. Thanks everyone!

    0 讨论(0)
提交回复
热议问题