Optimizing SUM OVER PARTITION BY for several hierarchical groups

Submitted by 旧街凉风 on 2019-12-07 21:12:24

Question


I have a table like below:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      10
R1        C1         M1              B1       2017      20
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45

I need to find SUM([Spend]) over different groups, as follows:

  1. Total Spend over all the rows in the whole table
  2. Total Spend for each Region
  3. Total Spend for each Region and Country group
  4. Total Spend for each Region, Country and Manufacturer group

So I wrote this query below:

SELECT 
    [Period]
    ,[Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,SUM([Spend]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld]
    ,SUM([Spend]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion]
    ,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry]
    ,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable

But that query takes more than 15 minutes for a table of just 450K rows. I'd like to know if there is any way to improve its performance. Thank you in advance for your answers/suggestions!


Answer 1:


Your description of the problem suggests grouping sets to me:

SELECT YEAR([Period]) AS [Period], [Region], [Country], [Manufacturer], 
       SUM([Spend]) AS [SumOfSpend]
FROM myTable
GROUP BY GROUPING SETS ( (YEAR([Period])),
                         (YEAR([Period]), [Region]),
                         (YEAR([Period]), [Region], [Country]), 
                         (YEAR([Period]), [Region], [Country], [Manufacturer])
                        );

I don't know if this will be faster, but it certainly seems more aligned with your question.
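One thing to keep in mind with this approach: GROUPING SETS returns one row per group rather than one row per detail row, and the rolled-up columns come back as NULL. A minimal sketch of labeling each level with GROUPING_ID, assuming [Period] already holds the year as in the sample data (the [GroupLevel] and [SumOfSpend] aliases are made up for illustration):

```sql
-- Assumption: [Period] already stores the year, as in the sample data.
SELECT [Period], [Region], [Country], [Manufacturer],
       -- Bitmask of the rolled-up columns:
       --   7 = total per period, 3 = per region,
       --   1 = per country,      0 = per manufacturer.
       GROUPING_ID([Region], [Country], [Manufacturer]) AS [GroupLevel],
       SUM([Spend]) AS [SumOfSpend]
FROM myTable
GROUP BY GROUPING SETS ( ([Period]),
                         ([Period], [Region]),
                         ([Period], [Region], [Country]),
                         ([Period], [Region], [Country], [Manufacturer])
                       )
ORDER BY [Period], [GroupLevel] DESC;
```

Filtering on [GroupLevel] then picks out a single level of the hierarchy without having to test the grouping columns for NULL.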




Answer 2:


Use cross apply here to speed the query up:

 SELECT 
     periodyear
    ,[Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,SUM([Spend]) OVER (PARTITION BY  periodyear) AS [SumOfSpendWorld]
    ,SUM([Spend]) OVER (PARTITION BY  periodyear, [Region]) AS [SumOfSpendRegion]
    ,SUM([Spend]) OVER (PARTITION BY  periodyear, [Region], [Country]) AS [SumOfSpendCountry]
    ,SUM([Spend]) OVER (PARTITION BY  periodyear, [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
  cross apply (select YEAR([Period]) periodyear) a



Answer 3:


The old-school equivalent of SUM() OVER(), using correlated subqueries:

SELECT 
      [Period]
    , [Region]
    , [Country]
    , [Manufacturer]
    , [Brand]
    , (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] GROUP BY [Period]) AS [SumOfSpendWorld]
    , (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region GROUP BY [Period], [Region] ) AS [SumOfSpendRegion]
    , (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country GROUP BY [Period], [Region], [Country] ) AS [SumOfSpendCountry]
    , (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country AND e.Manufacturer = t.Manufacturer GROUP BY [Period], [Region], [Country], [Manufacturer] ) AS [SumOfSpendManufacturer]
FROM myTable e

This is not the most elegant way to do it, but it gets the job done. I would highly recommend analyzing the table to see which alternative approach works best for your situation. If you hit a dead end, I would suggest using temp tables to make things faster. For instance, you could select the rows for a given period and bulk-copy them directly into a temp table, then do your magic there. I've seen tables that forced me to use temp tables instead of a simple select query; others forced me to split the table in two.

So, it's not always going to be nice and clean!

I hope this gives you another angle that helps you on your journey.
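One possible reading of the temp-table suggestion above is to aggregate once at the finest level, compute the coarser totals on that much smaller set, and join the result back to the detail rows. This is only a sketch under assumptions: [Period] already holds the year, the four key columns contain no NULLs (otherwise the inner join would drop rows), and the temp table names #ManufacturerTotals and #Totals are hypothetical. Whether it beats the original query depends on the data and indexes.

```sql
-- Step 1: collapse the detail rows to one row per
-- (Period, Region, Country, Manufacturer) group.
SELECT [Period], [Region], [Country], [Manufacturer],
       SUM([Spend]) AS [Spend]
INTO #ManufacturerTotals            -- hypothetical temp table name
FROM myTable
GROUP BY [Period], [Region], [Country], [Manufacturer];

-- Step 2: compute the coarser totals over the aggregated rows,
-- which are far fewer than the original detail rows.
SELECT [Period], [Region], [Country], [Manufacturer],
       SUM([Spend]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld],
       SUM([Spend]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion],
       SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry],
       [Spend] AS [SumOfSpendManufacturer]
INTO #Totals                        -- hypothetical temp table name
FROM #ManufacturerTotals;

-- Step 3: attach the totals to every detail row.
SELECT e.[Period], e.[Region], e.[Country], e.[Manufacturer], e.[Brand],
       t.[SumOfSpendWorld],
       t.[SumOfSpendRegion],
       t.[SumOfSpendCountry],
       t.[SumOfSpendManufacturer]
FROM myTable AS e
JOIN #Totals AS t
  ON  t.[Period]       = e.[Period]
  AND t.[Region]       = e.[Region]
  AND t.[Country]      = e.[Country]
  AND t.[Manufacturer] = e.[Manufacturer];

-- Clean up the temp tables when done.
DROP TABLE #ManufacturerTotals;
DROP TABLE #Totals;
```

Summing the per-manufacturer subtotals gives the same result as summing the raw [Spend] values, so the output columns match those of the original window-function query.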



Source: https://stackoverflow.com/questions/50516844/optimizing-sum-over-partition-by-for-several-hierarchical-groups
