TSQL Finding Order that occurred in 3 consecutive months

前端 未结 4 703
悲哀的现实
悲哀的现实 2021-01-05 00:56

Please help me to generate the following query. Say I have customer table and order table.

Customer Table

CustID CustName

1      AA     
2      BB
         


        
相关标签:
4条回答
  • 2021-01-05 01:20

    Here is my version. I really was presenting this as a mere curiosity, to show another way of thinking about the problem. It turned out to be more useful than that because it performed better than even Martin Smith's cool "grouped islands" solution. Though, once he got rid of some overly expensive aggregate windowing functions and did real aggregates instead, his query started kicking butt.

    Solution 1: Runs of 3 months or more, done by checking 1 month ahead and behind and using a semi-join against that.

    WITH Months AS (
       SELECT DISTINCT
          O.CustID,
          Grp = DateDiff(Month, '20000101', O.OrderDate)
       FROM
          CustOrder O
    ), Anchors AS (
       SELECT
          M.CustID,
          Ind = M.Grp + X.Offset
       FROM
          Months M
          CROSS JOIN (
             SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
          ) X (Offset)
       GROUP BY
          M.CustID,
          M.Grp + X.Offset
       HAVING
          Count(*) = 3
    )
    SELECT
       C.CustName,
       [Year] = Year(OrderDate),
       O.OrderDate
    FROM
       Cust C
       INNER JOIN CustOrder O ON C.CustID = O.CustID
    WHERE
       EXISTS (
          SELECT 1
          FROM
             Anchors A
          WHERE
             O.CustID = A.CustID
             AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
             AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
       )
    ORDER BY
       C.CustName,
       OrderDate;
    

    Solution 2: Exact 3-month patterns. If it is a 4-month or greater run, the values are excluded. This is done by checking 2 months ahead and two months behind (essentially looking for the pattern N, Y, Y, Y, N).

    WITH Months AS (
       SELECT DISTINCT
          O.CustID,
          Grp = DateDiff(Month, '20000101', O.OrderDate)
       FROM
          CustOrder O
    ), Anchors AS (
       SELECT
          M.CustID,
          Ind = M.Grp + X.Offset
       FROM
          Months M
          CROSS JOIN (
             SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
          ) X (Offset)
       GROUP BY
          M.CustID,
          M.Grp + X.Offset
       HAVING
          Count(*) = 3
          AND Min(X.Offset) = -1
          AND Max(X.Offset) = 1
    )
    SELECT
       C.CustName,
       [Year] = Year(OrderDate),
       O.OrderDate
    FROM
       Cust C
       INNER JOIN CustOrder O ON C.CustID = O.CustID
       INNER JOIN Anchors A
          ON O.CustID = A.CustID
          AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
          AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
    ORDER BY
       C.CustName,
       OrderDate;
    

    Here's my table-loading script if anyone else wants to play:

    IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder
    IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust
    GO
    SET NOCOUNT ON
    CREATE TABLE Cust (
      CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
      CustName varchar(100) UNIQUE
    )
    
    CREATE TABLE CustOrder (
       OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED,
       CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID),
       OrderDate smalldatetime NOT NULL
    )
    
    DECLARE @i int
    SET @i = 1000
    WHILE @i > 0 BEGIN
       WITH N AS (
          SELECT
             Nm =
                Char(Abs(Checksum(NewID())) % 26 + 65)
                + Char(Abs(Checksum(NewID())) % 26 + 97)
                + Char(Abs(Checksum(NewID())) % 26 + 97)
                + Char(Abs(Checksum(NewID())) % 26 + 97)
                + Char(Abs(Checksum(NewID())) % 26 + 97)
                + Char(Abs(Checksum(NewID())) % 26 + 97)
       )
       INSERT Cust
       SELECT N.Nm
       FROM N
       WHERE NOT EXISTS (
          SELECT 1
          FROM Cust C
          WHERE
             N.Nm = C.CustName
       )
    
       SET @i = @i - @@RowCount
    END
    WHILE @i < 50000 BEGIN
       INSERT CustOrder
       SELECT TOP (50000 - @i)
          Abs(Checksum(NewID())) % 1000 + 1,
          DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101')
       FROM master.dbo.spt_values
       SET @i = @i + @@RowCount
    END
    

    Performance

    Here are some performance testing results for the 3-month-or-more queries:

    Query     CPU   Reads Duration
    Martin 1  2297 299412   2348 
    Martin 2   625    285    809
    Denis     3641    401   3855
    Erik      1855  94727   2077
    

    This is only one run of each, but the numbers are fairly representative. It turns out that your query wasn't so badly-performing, Denis, after all. Martin's query beats the others hands down, but at first was using some overly-expensive windowing functions strategies that he fixed.

    Of course, as I noted, Denis's query isn't pulling the right rows when a customer has two orders on the same day, so his query is out of contention unless he fixed is.

    Also, different indexes could possibly shake things up. I don't know.

    0 讨论(0)
  • 2021-01-05 01:23

    Here is my take.

    select 100 as OrderID,convert(datetime,'01-JAN-2000') OrderDate,    1  as CustID  into #tmp union
        select 101,convert(datetime,'05-FEB-2000'),        1 union
        select 102,convert(datetime,'10-MAR-2000'),        1 union
        select 103,convert(datetime,'01-NOV-2000'),        2 union   
        select 104,convert(datetime,'05-APR-2001'),        2 union
        select 105,convert(datetime,'07-MAR-2002'),        2 union
        select 106,convert(datetime,'01-JUL-2003'),        1 union
        select 107,convert(datetime,'01-SEP-2004'),        4 union
        select 108,convert(datetime,'01-APR-2005'),        4 union
        select 109,convert(datetime,'01-MAY-2006'),        3 union
        select 110,convert(datetime,'05-MAY-2007'),        1 union 
        select 111,convert(datetime,'07-JUN-2007'),        1 union
        select 112,convert(datetime,'06-JUL-2007'),        1 
    
    
        ;with cte as
        (
            select
                *   
                ,convert(int,convert(char(6),orderdate,112)) - dense_rank() over(partition by custid order by orderdate) as g
            from #tmp
        ),
        cte2 as 
        (
        select 
            CustID
            ,g  
        from cte a
        group by CustID, g
        having count(g)>=3
        )
        select
            a.CustID
            ,Yr=Year(OrderDate)
            ,OrderDate
        from cte2 a join cte b
            on a.CustID=b.CustID and a.g=b.g
    
    0 讨论(0)
  • 2021-01-05 01:30

    Here you go:

    select distinct
     CustName
    ,year(OrderDate) [Year]
    ,OrderDate
    from 
    (
    select 
     o2.OrderDate [prev]
    ,o1.OrderDate [curr]
    ,o3.OrderDate [next]
    ,c.CustName
    from [order] o1 
    join [order] o2 on o1.CustId = o2.CustId and datediff(mm, o2.OrderDate, o1.OrderDate) = 1
    join [order] o3 on o1.CustId = o3.CustId and o2.OrderId <> o3.OrderId and datediff(mm, o3.OrderDate, o1.OrderDate) = -1
    join Customer c on c.CustId = o1.CustId
    ) t
    unpivot
    (
        OrderDate for [DateName] in ([prev], [curr], [next])
    )
    unpvt
    order by CustName, OrderDate
    
    0 讨论(0)
  • 2021-01-05 01:33

    Edit: Got rid or the MAX() OVER (PARTITION BY ...) as that seemed to kill performance.

    ;WITH cte AS ( 
    SELECT    CustID  ,
              OrderDate,
              DATEPART(YEAR, OrderDate)*12 + DATEPART(MONTH, OrderDate) AS YM
     FROM     Orders
     ),
     cte1 AS ( 
    SELECT    CustID  ,
              OrderDate,
              YM,
              YM - DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS G
     FROM     cte
     ),
     cte2 As
     (
     SELECT CustID  ,
              MIN(OrderDate) AS Mn,
              MAX(OrderDate) AS Mx
     FROM cte1
    GROUP BY CustID, G
    HAVING MAX(YM)-MIN(YM) >=2 
     )
    SELECT     c.CustName, o.OrderDate, YEAR(o.OrderDate) AS YEAR
    FROM         Customers AS c INNER JOIN
                          Orders AS o ON c.CustID = o.CustID
    INNER JOIN  cte2 c2 ON c2.CustID = o.CustID and o.OrderDate between Mn and Mx
    order by c.CustName, o.OrderDate
    
    0 讨论(0)
提交回复
热议问题