DISTINCT with PARTITION BY vs. GROUP BY

渐次进展 2021-02-01 10:33

I have found some SQL queries in an application I am examining like this:

SELECT DISTINCT
Company, Warehouse, Item,
SUM(quantity) OVER (PARTITION BY Company, War         


        
3 Answers
  •  温柔的废话
    2021-02-01 11:30

    Performance:

    Winner: GROUP BY

    Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries produced completely different query plans. The PARTITION BY query was significantly slower.

    The GROUP BY query plan consisted of only a table scan and an aggregation operation, while the PARTITION BY plan had two nested-loop self-joins. On the second run, the PARTITION BY query took about 2800 ms and the GROUP BY query only 500 ms.
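To make the comparison concrete, here is a minimal sketch of the two query shapes, run against an in-memory SQLite database from Python. The table name `stock`, the sample rows, and the choice of partition columns are all assumptions for illustration (the query in the question is cut off, so its real table and window definition are unknown). SQLite is also only a stand-in: its plans and timings will not match the DBMS measured above.

```python
import sqlite3

# Hypothetical table and data; the original query's table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 5),
        ("A", "W1", "bolt", 7),
        ("A", "W2", "bolt", 3),
        ("B", "W1", "nut", 4),
        ("B", "W1", "nut", 6),
    ],
)

# Window-function form: the sum is computed for every row, then DISTINCT
# collapses the duplicated rows. Partition columns are an assumption.
window_sql = """
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS total
    FROM stock
"""

# Aggregate form: one output row per group.
group_sql = """
    SELECT Company, Warehouse, Item, SUM(quantity) AS total
    FROM stock
    GROUP BY Company, Warehouse, Item
"""

window_rows = sorted(conn.execute(window_sql).fetchall())
group_rows = sorted(conn.execute(group_sql).fetchall())
print(window_rows == group_rows)  # True: both forms return the same rows here

# Inspect the plans SQLite chooses; other engines will report different plans.
for label, sql in (("window", window_sql), ("group", group_sql)):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, step)
```

The two forms agree only when the partition columns are exactly the other selected columns. Checking the plan on your own DBMS and data is still the only way to know which one is faster there.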

    Readability / Maintainability:

    Winner: GROUP BY

    Based on the opinions of the commenters here, the PARTITION BY version is less readable for most developers, so it will probably also be harder to maintain in the future.

    Flexibility:

    Winner: PARTITION BY

    PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY you can use different columns in each partition. Also, on some DBMSs you can choose from more aggregation/analytic functions in the OVER clause.
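As a rough illustration of that flexibility, the sketch below computes two sums with different partitions in a single query: one per Company/Warehouse/Item and one per Company. The `stock` table, its rows, and the column aliases are hypothetical, and SQLite (via Python's `sqlite3`) is used only because it is easy to run. A single GROUP BY could not produce both columns without a join or subquery.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (Company TEXT, Warehouse TEXT, Item TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        ("A", "W1", "bolt", 5),
        ("A", "W1", "bolt", 7),
        ("A", "W2", "bolt", 3),
        ("B", "W1", "nut", 4),
        ("B", "W1", "nut", 6),
    ],
)

# Each OVER clause has its own partition, so the two sums are taken
# at different levels of detail within the same SELECT.
sql = """
    SELECT DISTINCT Company, Warehouse, Item,
           SUM(quantity) OVER (PARTITION BY Company, Warehouse, Item) AS item_total,
           SUM(quantity) OVER (PARTITION BY Company) AS company_total
    FROM stock
    ORDER BY Company, Warehouse, Item
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
# ('A', 'W1', 'bolt', 12, 15)
# ('A', 'W2', 'bolt', 3, 15)
# ('B', 'W1', 'nut', 10, 10)
```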
