I have a table like the one below:
Region  Country  Manufacturer  Brand  Period  Spend
R1      C1       M1            B1     2016    5
R1      C1       M1            B1     2017    10
R1      C1       M1            B1     2017    20
R1      C1       M1            B2     2016    15
R1      C1       M1            B3     2017    20
R1      C2       M1            B1     2017    5
R1      C2       M2            B4     2017    25
R1      C2       M2            B5     2017    30
R2      C3       M1            B1     2017    35
R2      C3       M2            B4     2017    40
R2      C3       M2            B5     2017    45
I need to find SUM([Spend]) over different groups, as follows:
- Total Spend over all the rows in the whole table
- Total Spend for each Region
- Total Spend for each Region and Country group
- Total Spend for each Region, Country and Manufacturer group
So I wrote the query below:
SELECT
[Period]
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
But that query takes more than 15 minutes on a table of just 450K rows. Is there any way to optimize its performance? Thank you in advance for your answers and suggestions!
Your description of the problem suggests grouping sets to me:
SELECT YEAR([Period]) AS [Period], [Region], [Country], [Manufacturer],
       SUM([Spend]) AS [SumOfSpend]
FROM myTable
GROUP BY GROUPING SETS ( (YEAR([Period])),
                         (YEAR([Period]), [Region]),
                         (YEAR([Period]), [Region], [Country]),
                         (YEAR([Period]), [Region], [Country], [Manufacturer])
                       );
I don't know if this will be faster, but it certainly seems more aligned with your question.
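One thing to keep in mind: unlike the window-function query, grouping sets return one row per group rather than one row per detail row, with NULL in the columns that were rolled up. Below is a minimal sketch of how the levels could be told apart with GROUPING_ID. It assumes [Period] already holds the year, as in the sample data; the [GroupLevel] and [SumOfSpend] column names are made up for illustration.

```sql
-- Assumption: [Period] already contains the year (as in the sample data).
-- [GroupLevel] and [SumOfSpend] are hypothetical output column names.
SELECT [Period], [Region], [Country], [Manufacturer],
       -- Bitmap of the rolled-up columns:
       --   7 = total per Period, 3 = per Region, 1 = per Country, 0 = per Manufacturer
       GROUPING_ID([Region], [Country], [Manufacturer]) AS [GroupLevel],
       SUM([Spend]) AS [SumOfSpend]
FROM myTable
GROUP BY GROUPING SETS ( ([Period]),
                         ([Period], [Region]),
                         ([Period], [Region], [Country]),
                         ([Period], [Region], [Country], [Manufacturer])
                       )
ORDER BY [Period], [GroupLevel] DESC;
```

For the sample data in the question, the two [GroupLevel] = 7 rows would carry 20 for 2016 and 230 for 2017. If the per-row layout of the original query is still needed, these totals would have to be joined back to the detail rows.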
Use CROSS APPLY here to compute YEAR([Period]) once under an alias, which may speed the query up:
SELECT
periodyear
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY periodyear) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
CROSS APPLY (SELECT YEAR([Period]) AS periodyear) a
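Whichever form of the query is used, the window sums need the rows ordered by their partition columns, and here every PARTITION BY list is a prefix of the longest one. An index in that order may therefore let the optimizer avoid sorting the whole table. This is only a sketch: it assumes [Period] is a date or datetime column (so that YEAR([Period]) is deterministic and can be persisted) and that adding a column to myTable is acceptable. The [PeriodYear] column and the index name are hypothetical.

```sql
-- Assumption: [Period] is date/datetime; [PeriodYear] and the index name are hypothetical.
-- Store the year once instead of computing it for every row at query time.
ALTER TABLE myTable
    ADD [PeriodYear] AS YEAR([Period]) PERSISTED;
GO  -- batch separator (SSMS/sqlcmd), so the new column is visible to the next statement

-- Key order matches the longest PARTITION BY list; INCLUDE covers the remaining columns.
CREATE NONCLUSTERED INDEX IX_myTable_SpendHierarchy
    ON myTable ([PeriodYear], [Region], [Country], [Manufacturer])
    INCLUDE ([Brand], [Spend]);
```

The query could then partition by [PeriodYear] directly. Whether the index is actually used is worth checking in the execution plan.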
The old-school alternative to SUM() OVER():
SELECT
[Period]
, [Region]
, [Country]
, [Manufacturer]
, [Brand]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] GROUP BY [Period]) AS [SumOfSpendWorld]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region GROUP BY [Period], [Region] ) AS [SumOfSpendRegion]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country GROUP BY [Period], [Region], [Country] ) AS [SumOfSpendCountry]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country AND e.Manufacturer = t.Manufacturer GROUP BY [Period], [Region], [Country], [Manufacturer] ) AS [SumOfSpendManufacturer]
FROM myTable e
This is not the most elegant way to do it, but it gets the job done. I would highly recommend analyzing the table to see which alternative approach best fits your situation. If you feel you've hit a dead end, I would suggest using temp tables to make things faster. For instance, you could select the rows by period and bulk-copy them into a temp table, then do your magic. I've seen tables that forced me to use temp tables instead of a simple SELECT query. Others forced me to split the table in two.
So, it's not always going to be nice and clean!
I hope this gives you another insight that helps you on your journey.
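As one possible shape for the temp-table idea, the sketch below first collapses the detail rows to one row per Period/Region/Country/Manufacturer, computes the higher-level totals over that much smaller set, and then joins the totals back to the detail rows. It is a sketch under stated assumptions, not a measured fix: it assumes the grouping columns are NOT NULL, and #SpendByManufacturer is a hypothetical temp table name.

```sql
-- Assumptions: [Period], [Region], [Country], [Manufacturer] are NOT NULL
-- (the equality join below would otherwise drop rows);
-- #SpendByManufacturer is a hypothetical temp table name.

-- 1. Collapse the detail rows to one row per lowest-level group.
SELECT [Period], [Region], [Country], [Manufacturer],
       SUM([Spend]) AS [SumOfSpendManufacturer]
INTO #SpendByManufacturer
FROM myTable
GROUP BY [Period], [Region], [Country], [Manufacturer];

-- 2. Roll the small table up with window sums, then join back to the detail rows.
SELECT t.[Period], t.[Region], t.[Country], t.[Manufacturer], t.[Brand],
       a.[SumOfSpendWorld], a.[SumOfSpendRegion],
       a.[SumOfSpendCountry], a.[SumOfSpendManufacturer]
FROM myTable AS t
JOIN (
    SELECT [Period], [Region], [Country], [Manufacturer],
           SUM([SumOfSpendManufacturer]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld],
           SUM([SumOfSpendManufacturer]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion],
           SUM([SumOfSpendManufacturer]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry],
           [SumOfSpendManufacturer]
    FROM #SpendByManufacturer
) AS a
  ON  a.[Period]       = t.[Period]
  AND a.[Region]       = t.[Region]
  AND a.[Country]      = t.[Country]
  AND a.[Manufacturer] = t.[Manufacturer];

-- 3. Clean up.
DROP TABLE #SpendByManufacturer;
```

The window functions now run over one row per manufacturer group instead of all 450K detail rows, which is where any gain would come from. How much it helps depends on how many detail rows each group contains.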
Source: https://stackoverflow.com/questions/50516844/optimizing-sum-over-partition-by-for-several-hierarchical-groups