Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.
My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com
I was able to get to this stage, here is my code in SQL Server:
--Get most sold product for each country
WITH TotalProductsSold AS
(
SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
FROM [Order Details] AS od
GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
FROM Products AS p
INNER JOIN TotalProductsSold
ON TotalProductsSold.ProductID = p.ProductID
INNER JOIN Suppliers AS s
ON s.SupplierID = p.SupplierID
GROUP BY s.Country
ORDER BY MostSoldQuantity DESC
This gives me the following result:
That's all good but I wish to find out the product name for the MostSoldQuantity.
Thank you very much !
P.S I put a comment --p.ProductName where I thought it would work but it didnt and if someone could explain me why does GROUP BY not automatically allow me to derive the product name for the row that would be great
First, start with the count of products sold, per country, not just per product. Then rank them and pick only anything at RANK = 1. Something like...
WITH
ProductQuantityByCountry AS
(
SELECT
s.CountryID,
p.ProductID,
SUM(od.Quantity) AS Quantity
FROM
[Order Details] AS od
INNER JOIN
Products AS p
ON p.ProductID = od.ProductID
INNER JOIN
Suppliers AS s
ON s.SupplierID = p.SupplierID
GROUP BY
s.CountryID,
p.ProductID
),
RankedProductQuantityByCountry
AS
(
SELECT
RANK() OVER (PARTITION BY CountryID ORDER BY Quantity DESC) AS countryRank,
*
FROM
ProductQuantityByCountry
)
SELECT
*
FROM
RankedProductQuantityByCountry
WHERE
countryRank = 1
Note, one country may supply identical quantity of different producs, and so two products could both have rank = 1. Look into ROW_NUMER()
and/or DENSE_RANK()
for other but similar behaviours to RANK()
.
EDIT:
A simple though exercise to cover why SQL doesn't let you put Product.Name
in your final query is to ask a question.
What should SQL do in this case?
SELECT
MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
s.Country,
p.ProductName
FROM
blahblahblah
GROUP BY
s.Country
ORDER BY
MostSoldQuantity DESC
The presence of a MIN
and a MAX
makes things ambiguous.
You may be clear that you want to perform an operation by country
and that operation to be to pick the product with the highest sales volume from that country. But it's not actually explicit, and small changes to the query could have very confusing consequences to any inferred behaviour. Instead SQL's declarative syntax provides a very clear / explicit / deterministic description of the problem to be solved.
If an expression isn't mentioned in the GROUP BY
clause, you can't SELECT
it, without aggregating it. This is so that there is no ambiguity as to what is meant or what the SQL engine is supposed to do.
By requiring you to stipulate get the total sales per country per product
at one level of the query, you can then cleanly state and then pick the highest ranked per country
at another level of the query.
This can feel like you end up with queries that are longer than "should" be necessary. But it also results in queries that are completely un-ambiguous, both for compiling the query down to an execution plan, and for other coders who will read your code in the future.
来源:https://stackoverflow.com/questions/47512155/get-most-sold-product-for-each-country-from-northwind-database