Using group by and having clause

前端 未结 5 1192
一生所求
一生所求 2020-12-07 11:59

Using the following schema:

Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid**, qua         


        
相关标签:
5条回答
  • 2020-12-07 12:06

    The semantics of Having

    To better understand having, you need to see it from a theoretical point of view.

    A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.

    The Having is simply equivalent to a WHERE clause after the group by has executed and before the select part of the query is computed.

    Lets say your query is:

    select a, b, count(*) 
    from Table 
    where c > 100 
    group by a, b 
    having count(*) > 10;
    

    The evaluation of this query can be seen as the following steps:

    1. Perform the WHERE, eliminating rows that do not satisfy it.
    2. Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
    3. Eliminate subsets that do not satisfy the HAVING condition
    4. Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.

    You can extend this to any complex query there Table can be any complex query that return a table (a cross product, a join, a UNION, etc).

    In fact, having is syntactic sugar and does not extend the power of SQL. Any given query:

    SELECT list 
    FROM table
    GROUP BY attrList
    HAVING condition;
    

    can be rewritten as:

    SELECT list from (
       SELECT listatt 
       FROM table 
       GROUP BY attrList) as Name
    WHERE condition;
    

    The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:

    select a, b, count 
    from (select a, b, count(*) as count
          from Table 
          where c > 100
          group by a, b) as someName
    where count > 10;
    

    The solution you need

    Your solution seems to be correct:

    SELECT s.sid, s.name
    FROM Supplier s, Supplies su, Project pr
    WHERE s.sid = su.sid AND su.jid = pr.jid
    GROUP BY s.sid, s.name
    HAVING COUNT (DISTINCT pr.jid) >= 2 
    

    You join the three tables, then using sid as a grouping attribute (sname is functionally dependent on it, so it does not have an impact on the number of groups, but you must include it, otherwise it cannot be part of the select part of the statement). Then you are removing those that do not satisfy your condition: the satisfy pr.jid is >= 2, which is that you wanted originally.

    Best solution to your problem

    I personally prefer a simpler cleaner solution:

    1. You need to only group by Supplies (sid, pid, jid**, quantity) to find the sid of those that supply at least to two projects.
    2. Then join it to the Suppliers table to get the supplier same.

     SELECT sid, sname from
        (SELECT sid from supplies 
        GROUP BY sid, pid 
        HAVING count(DISTINCT jid) >= 2
        ) AS T1
    NATURAL JOIN 
    Supliers;
    

    It will also be faster to execute, because the join is only done when needed, not all the times.

    --dmg

    0 讨论(0)
  • 2020-12-07 12:10

    Because we can not use Where clause with aggregate functions like count(),min(), sum() etc. so having clause came into existence to overcome this problem in sql. see example for having clause go through this link

    http://www.sqlfundamental.com/having-clause.php

    0 讨论(0)
  • 2020-12-07 12:11

    First of all, you should use the JOIN syntax rather than FROM table1, table2, and you should always limit the grouping to as little fields as you need.

    Altought I haven't tested, your first query seems fine to me, but could be re-written as:

    SELECT s.sid, s.name
    FROM 
        Supplier s
        INNER JOIN (
           SELECT su.sid
           FROM Supplies su
           GROUP BY su.sid
           HAVING COUNT(DISTINCT su.jid) > 1
        ) g
            ON g.sid = s.sid
    

    Or simplified as:

    SELECT sid, name
    FROM Supplier s
    WHERE (
        SELECT COUNT(DISTINCT su.jid)
        FROM Supplies su
        WHERE su.sid = s.sid
    ) > 1
    

    However, your second query seems wrong to me, because you should also GROUP BY pid.

     SELECT s.sid, s.name
        FROM 
            Supplier s
            INNER JOIN (
                SELECT su.sid
                FROM Supplies su
                GROUP BY su.sid, su.pid
                HAVING COUNT(DISTINCT su.jid) > 1
            ) g
                ON g.sid = s.sid
    

    As you may have noticed in the query above, I used the INNER JOIN syntax to perform the filtering, however it can be also written as:

    SELECT s.sid, s.name
    FROM Supplier s
    WHERE (
         SELECT COUNT(DISTINCT su.jid)
         FROM Supplies su
         WHERE su.sid = s.sid
         GROUP BY su.sid, su.pid
    ) > 1
    
    0 讨论(0)
  • 2020-12-07 12:20

    1.Get supplier numbers and names for suppliers of parts supplied to at least two different projects.

     SELECT S.SID, S.NAME
     FROM SUPPLIES SP
     JOIN SUPPLIER S
     ON SP.SID = S.SID
     WHERE PID IN
     (SELECT PID FROM SUPPPLIES GROUP BY PID, JID HAVING COUNT(*) >= 2)
    

    I am not slear about your second question

    0 讨论(0)
  • 2020-12-07 12:27

    What type of sql database are using (MSSQL, Oracle etc)? I believe what you have written is correct.

    You could also write the first query like this:

    SELECT s.sid, s.name
    FROM Supplier s
    WHERE (SELECT COUNT(DISTINCT pr.jid)
           FROM Supplies su, Projects pr
           WHERE su.sid = s.sid 
               AND pr.jid = su.jid) >= 2
    

    It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.

    0 讨论(0)
提交回复
热议问题