Please help me write the following query. Say I have a customer table and an order table.
Customer Table
CustID  CustName
1       AA
2       BB
Here is my version. I really was presenting this as a mere curiosity, to show another way of thinking about the problem. It turned out to be more useful than that because it performed better than even Martin Smith's cool "grouped islands" solution. Though, once he got rid of some overly expensive aggregate windowing functions and did real aggregates instead, his query started kicking butt.
Solution 1: Runs of 3 months or more, done by checking 1 month ahead and behind and using a semi-join against that.
WITH Months AS (
SELECT DISTINCT
O.CustID,
Grp = DateDiff(Month, '20000101', O.OrderDate)
FROM
CustOrder O
), Anchors AS (
SELECT
M.CustID,
Ind = M.Grp + X.Offset
FROM
Months M
CROSS JOIN (
SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
) X (Offset)
GROUP BY
M.CustID,
M.Grp + X.Offset
HAVING
Count(*) = 3
)
SELECT
C.CustName,
[Year] = Year(OrderDate),
O.OrderDate
FROM
Cust C
INNER JOIN CustOrder O ON C.CustID = O.CustID
WHERE
EXISTS (
SELECT 1
FROM
Anchors A
WHERE
O.CustID = A.CustID
AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
)
ORDER BY
C.CustName,
OrderDate;
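To see why Count(*) = 3 picks out the middle month of a run, here is a minimal sketch that runs the same cross join against a handful of made-up month numbers (the sample values and the Hits column are my own, not from the real tables). Each month "votes" for itself and its two neighbours, so an Ind collects 3 votes only when Ind - 1, Ind and Ind + 1 all have orders.

```sql
-- Hypothetical sample: month numbers (Grp) for a single customer.
-- Months 4, 5, 6 and 7 are consecutive; month 10 stands alone.
WITH Months (CustID, Grp) AS (
   SELECT 1, 4 UNION ALL
   SELECT 1, 5 UNION ALL
   SELECT 1, 6 UNION ALL
   SELECT 1, 7 UNION ALL
   SELECT 1, 10
)
SELECT
   M.CustID,
   Ind = M.Grp + X.Offset,  -- candidate middle month
   Hits = Count(*)          -- how many of Ind - 1, Ind, Ind + 1 have orders
FROM
   Months M
   CROSS JOIN (
      SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
   ) X (Offset)
GROUP BY
   M.CustID,
   M.Grp + X.Offset
ORDER BY
   Ind;
```

With this sample, only Ind = 5 and Ind = 6 reach Hits = 3, and the date ranges around those two anchors together cover months 4 through 7.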
Solution 2: Exact 3-month patterns. If it is a 4-month or greater run, the values are excluded. This is done by checking 2 months ahead and two months behind (essentially looking for the pattern N, Y, Y, Y, N).
WITH Months AS (
SELECT DISTINCT
O.CustID,
Grp = DateDiff(Month, '20000101', O.OrderDate)
FROM
CustOrder O
), Anchors AS (
SELECT
M.CustID,
Ind = M.Grp + X.Offset
FROM
Months M
CROSS JOIN (
SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
) X (Offset)
GROUP BY
M.CustID,
M.Grp + X.Offset
HAVING
Count(*) = 3
AND Min(X.Offset) = -1
AND Max(X.Offset) = 1
)
SELECT
C.CustName,
[Year] = Year(OrderDate),
O.OrderDate
FROM
Cust C
INNER JOIN CustOrder O ON C.CustID = O.CustID
INNER JOIN Anchors A
ON O.CustID = A.CustID
AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
ORDER BY
C.CustName,
OrderDate;
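As a quick check of the N, Y, Y, Y, N idea, the following sketch applies the same Anchors logic to two made-up customers (sample values are assumptions of mine): one with a 4-month run and one with an exact 3-month run.

```sql
-- Hypothetical sample: customer 1 has a 4-month run (4..7),
-- customer 2 has an exact 3-month run (12..14).
WITH Months (CustID, Grp) AS (
   SELECT 1, 4 UNION ALL
   SELECT 1, 5 UNION ALL
   SELECT 1, 6 UNION ALL
   SELECT 1, 7 UNION ALL
   SELECT 2, 12 UNION ALL
   SELECT 2, 13 UNION ALL
   SELECT 2, 14
)
SELECT
   M.CustID,
   Ind = M.Grp + X.Offset
FROM
   Months M
   CROSS JOIN (
      SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2
   ) X (Offset)
GROUP BY
   M.CustID,
   M.Grp + X.Offset
HAVING
   Count(*) = 3               -- exactly three months within Ind - 2 .. Ind + 2
   AND Min(X.Offset) = -1     -- and they are the three nearest ones,
   AND Max(X.Offset) = 1;     -- so Ind - 2 and Ind + 2 must be empty
```

Only customer 2 with Ind = 13 should come back. Every candidate for customer 1 either counts 4 months or has a month sitting at offset -2 or 2, so the 4-month run is rejected.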
Here's my table-loading script if anyone else wants to play:
IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder
IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust
GO
SET NOCOUNT ON
CREATE TABLE Cust (
CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED,
CustName varchar(100) UNIQUE
)
CREATE TABLE CustOrder (
OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED,
CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID),
OrderDate smalldatetime NOT NULL
)
DECLARE @i int
SET @i = 1000
WHILE @i > 0 BEGIN
WITH N AS (
SELECT
Nm =
Char(Abs(Checksum(NewID())) % 26 + 65)
+ Char(Abs(Checksum(NewID())) % 26 + 97)
+ Char(Abs(Checksum(NewID())) % 26 + 97)
+ Char(Abs(Checksum(NewID())) % 26 + 97)
+ Char(Abs(Checksum(NewID())) % 26 + 97)
+ Char(Abs(Checksum(NewID())) % 26 + 97)
)
INSERT Cust
SELECT N.Nm
FROM N
WHERE NOT EXISTS (
SELECT 1
FROM Cust C
WHERE
N.Nm = C.CustName
)
SET @i = @i - @@RowCount
END
WHILE @i < 50000 BEGIN
INSERT CustOrder
SELECT TOP (50000 - @i)
Abs(Checksum(NewID())) % 1000 + 1,
DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101')
FROM master.dbo.spt_values
SET @i = @i + @@RowCount
END
Performance
Here are some performance testing results for the 3-month-or-more queries:
Query      CPU    Reads   Duration
Martin 1   2297   299412  2348
Martin 2    625      285   809
Denis      3641      401  3855
Erik       1855    94727  2077
This is only one run of each, but the numbers are fairly representative. It turns out that your query wasn't performing so badly after all, Denis. Martin's query beats the others hands down, though at first it used some overly expensive windowing-function strategies, which he has since fixed.
Of course, as I noted, Denis's query isn't pulling the right rows when a customer has two orders on the same day, so it is out of contention unless he fixes it.
Also, different indexes could possibly shake things up. I don't know.
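If anyone wants to try the index angle, something like the following might be a starting point. This is only a sketch: the index name and key choice are my guesses and were not part of the runs above, and SET STATISTICS is just one way to collect CPU, reads and elapsed time, not necessarily how the numbers in the table were gathered.

```sql
-- Assumed index, not used in the tests above: orders keyed by customer, then date.
CREATE NONCLUSTERED INDEX IX_CustOrder_CustID_OrderDate
   ON CustOrder (CustID, OrderDate);
GO

-- Report logical reads and CPU/elapsed time for each statement
-- (shown in the Messages tab in Management Studio).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the query being tested here, then switch the reporting off again.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the figures before and after creating the index, over several runs each, would show whether the ranking actually changes.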
Here is my take.
select 100 as OrderID,convert(datetime,'01-JAN-2000') OrderDate, 1 as CustID into #tmp union
select 101,convert(datetime,'05-FEB-2000'), 1 union
select 102,convert(datetime,'10-MAR-2000'), 1 union
select 103,convert(datetime,'01-NOV-2000'), 2 union
select 104,convert(datetime,'05-APR-2001'), 2 union
select 105,convert(datetime,'07-MAR-2002'), 2 union
select 106,convert(datetime,'01-JUL-2003'), 1 union
select 107,convert(datetime,'01-SEP-2004'), 4 union
select 108,convert(datetime,'01-APR-2005'), 4 union
select 109,convert(datetime,'01-MAY-2006'), 3 union
select 110,convert(datetime,'05-MAY-2007'), 1 union
select 111,convert(datetime,'07-JUN-2007'), 1 union
select 112,convert(datetime,'06-JUL-2007'), 1
;with cte as
(
select
*
,convert(int,convert(char(6),orderdate,112)) - dense_rank() over(partition by custid order by orderdate) as g
from #tmp
),
cte2 as
(
select
CustID
,g
from cte a
group by CustID, g
having count(g)>=3
)
select
a.CustID
,Yr=Year(OrderDate)
,OrderDate
from cte2 a join cte b
on a.CustID=b.CustID and a.g=b.g
Here you go:
select distinct
CustName
,year(OrderDate) [Year]
,OrderDate
from
(
select
o2.OrderDate [prev]
,o1.OrderDate [curr]
,o3.OrderDate [next]
,c.CustName
from [order] o1
join [order] o2 on o1.CustId = o2.CustId and datediff(mm, o2.OrderDate, o1.OrderDate) = 1
join [order] o3 on o1.CustId = o3.CustId and o2.OrderId <> o3.OrderId and datediff(mm, o3.OrderDate, o1.OrderDate) = -1
join Customer c on c.CustId = o1.CustId
) t
unpivot
(
OrderDate for [DateName] in ([prev], [curr], [next])
)
unpvt
order by CustName, OrderDate
Edit: Got rid of the MAX() OVER (PARTITION BY ...), as that seemed to kill performance.
;WITH cte AS (
SELECT CustID ,
OrderDate,
DATEPART(YEAR, OrderDate)*12 + DATEPART(MONTH, OrderDate) AS YM
FROM Orders
),
cte1 AS (
SELECT CustID ,
OrderDate,
YM,
YM - DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS G
FROM cte
),
cte2 As
(
SELECT CustID ,
MIN(OrderDate) AS Mn,
MAX(OrderDate) AS Mx
FROM cte1
GROUP BY CustID, G
HAVING MAX(YM)-MIN(YM) >=2
)
SELECT c.CustName, o.OrderDate, YEAR(o.OrderDate) AS YEAR
FROM Customers AS c INNER JOIN
Orders AS o ON c.CustID = o.CustID
INNER JOIN cte2 c2 ON c2.CustID = o.CustID and o.OrderDate between Mn and Mx
order by c.CustName, o.OrderDate
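To illustrate how the G column groups consecutive months, here is a small self-contained sketch of the first two CTEs on made-up data (the SampleOrders rows and the Rnk column are assumptions for illustration only). Because YM counts months continuously and DENSE_RANK gives the same rank to orders in the same month, every row in an unbroken run of months ends up with the same G.

```sql
-- Hypothetical sample: a run from Nov 2000 to Jan 2001 (with two orders
-- in December), followed by an isolated order in Jun 2001.
WITH SampleOrders (CustID, OrderDate) AS (
    SELECT 1, CONVERT(datetime, '20001105') UNION ALL
    SELECT 1, CONVERT(datetime, '20001203') UNION ALL
    SELECT 1, CONVERT(datetime, '20001220') UNION ALL
    SELECT 1, CONVERT(datetime, '20010108') UNION ALL
    SELECT 1, CONVERT(datetime, '20010615')
),
cte AS (
    SELECT CustID,
           OrderDate,
           DATEPART(YEAR, OrderDate)*12 + DATEPART(MONTH, OrderDate) AS YM
    FROM SampleOrders
)
SELECT CustID,
       OrderDate,
       YM,
       DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS Rnk,
       YM - DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS G
FROM cte
ORDER BY CustID, OrderDate;
```

The YM values are 24011, 24012, 24012, 24013 and 24018, with ranks 1, 2, 2, 3 and 4, so the first four rows share G = 24010 and the June order gets G = 24014. The run crosses the year boundary and contains a duplicate month, and still lands in one group with MAX(YM) - MIN(YM) = 2.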