Get most sold product for each country from NORTHWIND database

江枫思渺然 提交于 2019-12-08 02:56:12

问题


Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.

My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com

I was able to get to this stage, here is my code in SQL Server:

--Get most sold product for each country
WITH TotalProductsSold AS 
(
    SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
        FROM [Order Details] AS od
        GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
    FROM Products AS p
    INNER JOIN TotalProductsSold
    ON TotalProductsSold.ProductID = p.ProductID
    INNER JOIN Suppliers AS s
    ON s.SupplierID = p.SupplierID
    GROUP BY s.Country
    ORDER BY MostSoldQuantity DESC

This gives me the following result:

That's all good but I wish to find out the product name for the MostSoldQuantity.

Thank you very much !

P.S I put a comment --p.ProductName where I thought it would work but it didnt and if someone could explain me why does GROUP BY not automatically allow me to derive the product name for the row that would be great


回答1:


First, start with the count of products sold, per country, not just per product. Then rank them and pick only anything at RANK = 1. Something like...

WITH
    ProductQuantityByCountry AS 
(
    SELECT
       s.CountryID,
       p.ProductID,
       SUM(od.Quantity)   AS Quantity
    FROM
        [Order Details]   AS od
    INNER JOIN
        Products          AS p
            ON  p.ProductID = od.ProductID
    INNER JOIN
        Suppliers         AS s
            ON  s.SupplierID = p.SupplierID
    GROUP BY
       s.CountryID,
       p.ProductID
),
    RankedProductQuantityByCountry
AS
(
    SELECT
        RANK() OVER (PARTITION BY CountryID ORDER BY Quantity DESC)  AS countryRank,
        *
    FROM
        ProductQuantityByCountry
)
SELECT
    *
FROM
    RankedProductQuantityByCountry
WHERE
    countryRank = 1

Note, one country may supply identical quantity of different producs, and so two products could both have rank = 1. Look into ROW_NUMER() and/or DENSE_RANK() for other but similar behaviours to RANK().

EDIT: A simple though exercise to cover why SQL doesn't let you put Product.Name in your final query is to ask a question.

What should SQL do in this case?

SELECT
    MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
    MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
    s.Country,
    p.ProductName
FROM
    blahblahblah
GROUP BY
    s.Country
ORDER BY
    MostSoldQuantity DESC

The presence of a MIN and a MAX makes things ambiguous.

You may be clear that you want to perform an operation by country and that operation to be to pick the product with the highest sales volume from that country. But it's not actually explicit, and small changes to the query could have very confusing consequences to any inferred behaviour. Instead SQL's declarative syntax provides a very clear / explicit / deterministic description of the problem to be solved.

If an expression isn't mentioned in the GROUP BY clause, you can't SELECT it, without aggregating it. This is so that there is no ambiguity as to what is meant or what the SQL engine is supposed to do.

By requiring you to stipulate get the total sales per country per product at one level of the query, you can then cleanly state and then pick the highest ranked per country at another level of the query.

This can feel like you end up with queries that are longer than "should" be necessary. But it also results in queries that are completely un-ambiguous, both for compiling the query down to an execution plan, and for other coders who will read your code in the future.



来源:https://stackoverflow.com/questions/47512155/get-most-sold-product-for-each-country-from-northwind-database

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!