Trouble using ROW_NUMBER() OVER (PARTITION BY …)

后端未结

关注

 3  725

I\'m using SQL Server 2008 R2. I have table called EmployeeHistory with the following structure and sample data:

EmployeeID Date      DepartmentID SupervisorID
1


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  忘掉有多难        
                
              
                            
                2021-02-02 15:22
              
            
            
                                                                       
It looks like a common gaps-and-islands problem. The difference between two sequences of row numbers rn1 and rn2 give the "group" number.

Run this query CTE-by-CTE and examine intermediate results to see how it works.

Sample data

I expanded sample data from the question a little.

DECLARE @Source TABLE
(
    EmployeeID int,
    DateStarted date,
    DepartmentID int
)

INSERT INTO @Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001),

(10005,'2013-05-01',001),
(10005,'2013-11-09',001),
(10005,'2013-12-01',002),
(10005,'2014-10-01',001),
(10005,'2016-12-01',001);


Query for SQL Server 2008

There is no LEAD function in SQL Server 2008, so I had to use self-join via OUTER APPLY to get the value of the "next" row for the DateEnd.

WITH
CTE
AS
(
    SELECT
        EmployeeID
        ,DateStarted
        ,DepartmentID
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
    FROM @Source
)
,CTE_Groups
AS
(
    SELECT
        EmployeeID
        ,MIN(DateStarted) AS DateStart
        ,DepartmentID
    FROM CTE
    GROUP BY
        EmployeeID
        ,DepartmentID
        ,rn1 - rn2
)
SELECT
    CTE_Groups.EmployeeID
    ,CTE_Groups.DepartmentID
    ,CTE_Groups.DateStart
    ,A.DateEnd
FROM
    CTE_Groups
    OUTER APPLY
    (
        SELECT TOP(1) G2.DateStart AS DateEnd
        FROM CTE_Groups AS G2
        WHERE
            G2.EmployeeID = CTE_Groups.EmployeeID
            AND G2.DateStart > CTE_Groups.DateStart
        ORDER BY G2.DateStart
    ) AS A
ORDER BY
    EmployeeID
    ,DateStart
;


Query for SQL Server 2012+

Starting with SQL Server 2012 there is a LEAD function that makes this task more efficient.

WITH
CTE
AS
(
    SELECT
        EmployeeID
        ,DateStarted
        ,DepartmentID
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
    FROM @Source
)
,CTE_Groups
AS
(
    SELECT
        EmployeeID
        ,MIN(DateStarted) AS DateStart
        ,DepartmentID
    FROM CTE
    GROUP BY
        EmployeeID
        ,DepartmentID
        ,rn1 - rn2
)
SELECT
    CTE_Groups.EmployeeID
    ,CTE_Groups.DepartmentID
    ,CTE_Groups.DateStart
    ,LEAD(CTE_Groups.DateStart) OVER (PARTITION BY CTE_Groups.EmployeeID ORDER BY CTE_Groups.DateStart) AS DateEnd
FROM
    CTE_Groups
ORDER BY
    EmployeeID
    ,DateStart
;


Result

+------------+--------------+------------+------------+
| EmployeeID | DepartmentID | DateStart  |  DateEnd   |
+------------+--------------+------------+------------+
|      10001 |            1 | 2013-01-01 | 2013-12-01 |
|      10001 |            2 | 2013-12-01 | 2014-10-01 |
|      10001 |            1 | 2014-10-01 | NULL       |
|      10005 |            1 | 2013-05-01 | 2013-12-01 |
|      10005 |            2 | 2013-12-01 | 2014-10-01 |
|      10005 |            1 | 2014-10-01 | NULL       |
+------------+--------------+------------+------------+

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2021-02-02 15:28
              
            
            
                                                                       
A bit involved. Easiest would be to refer to this SQL Fiddle I created for you that produces the exact result. There are ways you can improve it for performance or other considerations, but this should hopefully at least be clearer than some alternatives.

The gist is, you get a canonical ranking of your data first, then use that to segment the data into groups, then find an end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot in doing it readably.



EDIT 2019:

The SQL Fiddle does in fact seem to be broken, for some reason, but it appears to be a problem on the SQL Fiddle site. Here's a complete version, tested just now on SQL Server 2016:

CREATE TABLE Source
(
  EmployeeID int,
  DateStarted date,
  DepartmentID int
)

INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)


SELECT *, 
  ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
  newid() as GroupKey,
  CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;

UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 GroupKey
    FROM #RankedData sub 
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID = sup.DepartmentID AND
      NOT EXISTS 
        (
          SELECT * 
          FROM #RankedData bot 
          WHERE bot.EmployeeID = sup.EmployeeID AND
            bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
            bot.DepartmentID <> sup.DepartmentID
        )
      ORDER BY DateStarted ASC
    ) beginDate (GroupKey);

UPDATE #RankedData
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 DateStarted
    FROM #RankedData sub
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID <> sup.DepartmentID AND
      sub.EntryRank > sup.EntryRank
    ORDER BY EntryRank ASC
  ) nextGroup (DateStarted);

SELECT * FROM 
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;

DROP TABLE #RankedData
DROP TABLE Source

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2021-02-02 15:31
              
            
            
                                                                       
I would do something like this:

;WITH x 
 AS (SELECT *, 
            Row_number() 
              OVER( 
                partition BY employeeid 
                ORDER BY datestart) rn 
     FROM   employeehistory) 
SELECT * 
FROM   x x1 
   LEFT OUTER JOIN x x2 
                ON x1.rn = x2.rn + 1 


Or maybe it would be x2.rn - 1. You'll have to see. In any case, you get the idea. Once you have the table joined on itself, you can filter, group, sort, etc. to get what you need.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复