问题
this is kind of building off a previous question I asked. Suppose I have a denormalized table that looks something like this:
Apple_ID | Tree_ID | Orchard_ID | Tree_Height | ...other columns...
---------------------------------------------------------------------
1 | 1 | 1 | 12 | ...other values...
---------------------------------------------------------------------
2 | 1 | 1 | 12 | ...other values...
---------------------------------------------------------------------
3 | 1 | 1 | 12 | ...other values...
---------------------------------------------------------------------
4 | 2 | 1 | 15 | ...other values...
---------------------------------------------------------------------
5 | 2 | 1 | 15 | ...other values...
---------------------------------------------------------------------
6 | 2 | 1 | 15 | ...other values...
---------------------------------------------------------------------
7 | 2 | 1 | 15 | ...other values...
---------------------------------------------------------------------
8 | 3 | 1 | 20 | ...other values...
---------------------------------------------------------------------
9 | 3 | 1 | 20 | ...other values...
---------------------------------------------------------------------
10 | 4 | 2 | 30 | ...other values...
---------------------------------------------------------------------
11 | 5 | 2 | 10 | ...other values...
---------------------------------------------------------------------
12 | 5 | 2 | 10 | ...other values...
---------------------------------------------------------------------
13 | 5 | 2 | 10 | ...other values...
---------------------------------------------------------------------
I want to calculate the sum of Tree_Heights in each orchard, so the result I want to get back is:
Orchard_ID | sum(Tree_Height)
-------------------------------
1 | 47
-------------------------------
2 | 40
However, due to the denormalization, the sum inflates to this:
Orchard_ID | sum(Tree_Height)
-------------------------------
1 | 136
-------------------------------
2 | 60
The solution from the question I mentioned before cannot be implemented here, since we cannot sum() by a unique row column. How can I write a simple query to get the intended result?
回答1:
The easiest way to write this is with a CTE
, but if your system doesn't support that you can use a derived table. We use ROW_NUMBER()
over the Apple_ID
for each tree to give us a unique row to sum:
SELECT "Orchard_ID", SUM("Tree_Height") AS Total_Height
FROM (
SELECT "Orchard_ID", "Tree_Height",
ROW_NUMBER() OVER (PARTITION BY "Orchard_ID", "Tree_ID" ORDER BY "Apple_ID") AS rn
FROM data
) d
WHERE rn = 1
GROUP BY "Orchard_ID"
Output
Orchard_ID total_height
1 47
2 40
If you could use CTE
s this is how it would be written:
WITH CTE AS (
SELECT "Orchard_ID", "Tree_Height",
ROW_NUMBER() OVER (PARTITION BY "Orchard_ID", "Tree_ID" ORDER BY "Apple_ID") AS rn
FROM data
)
SELECT "Orchard_ID", SUM("Tree_Height") AS Total_Height
FROM CTE
WHERE rn = 1
GROUP BY "Orchard_ID"
Demo on SQLFiddle
来源:https://stackoverflow.com/questions/59508081/how-to-calculate-a-non-inflated-sum-from-a-denormalized-table