The SQL OVER() clause - when and why is it useful?

后端未结

关注

 8  1216

    USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS \'Total\'
    ,AVG(OrderQty) OVER(PAR


                      
              相关标签:


      
      
        
          8条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  情深已故        
                
              
                            
                2020-11-28 18:12
              
            
            
                                                                       
You can use GROUP BY SalesOrderID. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY.

In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual OrderQty values and their sums, counts, averages etc. over groups of same SalesOrderIDs.

Here's a practical example of why windowed aggregates are great. Suppose you need to calculate what percent of a total every value is. Without windowed aggregates you'd have to first derive a list of aggregated values and then join it back to the original rowset, i.e. like this:

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]


Now look how you can do the same with a windowed aggregate:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig


Much easier and cleaner, isn't it?
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦毁少年i        
                
              
                            
                2020-11-28 18:12
              
            
            
                                                                       
So in simple words:
Over clause can be used to select non aggregated values along with Aggregated ones.
Partition BY, ORDER BY inside, and ROWS or RANGE are part of OVER() by clause.
partition by is used to partition data and then perform these window, aggregated functions, and if we don't have partition by the then entire result set is considered as a single partition.
OVER clause can be used with Ranking Functions(Rank, Row_Number, Dense_Rank..), Aggregate Functions like (AVG, Max, Min, SUM...etc) and Analytics Functions like (First_Value, Last_Value, and few others).
Let's See basic syntax of OVER clause
OVER (   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROW or RANGE clause> ]  
      )  

PARTITION BY:
It is used to partition data and perform operations on groups with the same data.
ORDER BY:
It is used to define the logical order of data in Partitions. When we don't specify Partition, entire resultset is considered as a single partition
:
This can be used to specify what rows are supposed to be considered in a partition when performing the operation.
Let's take an example:
Here is my dataset:
Id          Name                                               Gender     Salary
----------- -------------------------------------------------- ---------- -----------
1           Mark                                               Male       5000
2           John                                               Male       4500
3           Pavan                                              Male       5000
4           Pam                                                Female     5500
5           Sara                                               Female     4000
6           Aradhya                                            Female     3500
7           Tom                                                Male       5500
8           Mary                                               Female     5000
9           Ben                                                Male       6500
10          Jodi                                               Female     7000
11          Tom                                                Male       5500
12          Ron                                                Male       5000

So let me execute different scenarios and see how data is impacted and I'll come from difficult syntax to simple one
Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

Just observe the sum_sal part. Here I am using order by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
In this case, we are not using partition so entire data will be treated as one partition and we are ordering on salary.
And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW. This means when we are calculating the sum, from starting row to the current row for each row.
But if we see rows with salary 5000 and name="Pavan", ideally it should be 17000 and for salary=5000 and name=Mark, it should be 22000. But as we are using RANGE and in this case, if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group. That is the reason why we have the same value for salary=5000. The engine went up to salary=5000 and Name=Ron and calculated sum and then assigned it to all salary=5000.
Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees


   Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        17000
1           Mark                                               Male       5000        22000
8           Mary                                               Female     5000        27000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        37500
7           Tom                                                Male       5500        43000
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW The difference is for same value items instead of grouping them together, It calculates SUM from starting row to current row and it doesn't treat items with same value differently like RANGE
Select *,SUM(salary) Over(order by salary) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

These results are the same as
Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

That is because Over(order by salary) is just a short cut of Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
So wherever we simply specify Order by without ROWS or RANGE it is taking RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as default.
Note: This is applicable only to Functions that actually accept RANGE/ROW. For example, ROW_NUMBER and few others don't accept RANGE/ROW and in that case, this doesn't come into the picture.
Till now we saw that Over clause with an order by is taking Range/ROWS and syntax looks something like this RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
And it is actually calculating up to the current row from the first row. But what If it wants to calculate values for the entire partition of data and have it for each column (that is from 1st row to last row). Here is the query for that
Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

Instead of CURRENT ROW, I am specifying UNBOUNDED FOLLOWING which instructs the engine to calculate till the last record of partition for each row.
Now coming to your point on what is OVER() with empty braces?
It is just a short cut for Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
Here we are indirectly specifying to treat all my resultset as a single partition and then perform calculations from the first record to the last record of each partition.
Select *,Sum(salary) Over() as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

I did create a video on this and if you are interested you can visit it.
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s
Thanks,
Pavan Kumar Aryasomayajulu
HTTP://xyzcoder.github.io
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北荒        
                
              
                            
                2020-11-28 18:13
              
            
            
                                                                       
The OVER clause when combined with PARTITION BY state that the preceding function call must be done analytically by evaluating the returned rows of the query.  Think of it as an inline GROUP BY statement.

OVER (PARTITION BY SalesOrderID) is stating that for SUM, AVG, etc... function, return the value OVER a subset of the returned records from the query, and PARTITION that subset BY the foreign key SalesOrderID.

So we will SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column name will be called 'Total'.

It is a MUCH more efficient means than using multiple inline views to find out the same information.  You can put this query within an inline view and filter on Total then.

SELECT ...,
FROM (your query) inlineview
WHERE Total < 200

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  再見小時候        
                
              
                            
                2020-11-28 18:15
              
            
            
                                                                       
Let me explain with an example and you would be able to see how it works.

Assuming you have the following table DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow


Run below SQL

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT


The result would be as below

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
 ----------------------------------------------  
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2


See what happened.

You are able to count without Group By on YEAR and Match with ROW.

Another Interesting WAY to get same result if as below using WITH Clause, WITH works as in-line VIEW and can simplify the query especially complex ones, which is not the case here though since I am just trying to show usage

 WITH EQ AS
  ( SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
  )
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT,
  EQ
WHERE EQ.YEAR2=DIM_EQUIPMENT.YEAR;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  耶瑟儿～        
                
              
                            
                2020-11-28 18:16
              
            
            
                                                                       

Also Called Query Petition Clause.
Similar to the Group By Clause


break up data into chunks (or partitions)
separate by partition bounds
function performs within partitions
re-initialised when crossing parting boundary



Syntax:

function (...) OVER (PARTITION BY col1 col3,...)


Functions


Familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc
New Functions as well (eg ROW_NUMBER(), RATION_TO_REOIRT(), etc.)





More info with example : http://msdn.microsoft.com/en-us/library/ms189461.aspx
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-11-28 18:25
              
            
            
                                                                       
If you only wanted to GROUP BY the SalesOrderID then you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.

The PARTITION BY clause let's you break up your aggregate functions. One obvious and useful example would be if you wanted to generate line numbers for order lines on an order:

SELECT
    O.order_id,
    O.order_date,
    ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no,
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id


(My syntax might be off slightly)

You would then get back something like:

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复