I have the following query, and because of a lot of SUM
function calls, my query is running too slow. I have a lot of records in my database and I would like to get
I would use a lookup table "Dates" table to join my data to with an index on DatesId. I use the dates as a filter when I want to browse historical data. The join is fast and so it the filtering as the DatesId is clustered primary index (primary key). Add the date column (as included column) for your data table as well.
The dates table has the following columns:
DatesId, Date, Year, Quarter, YearQuarter, MonthNum, MonthNameShort, YearWeek, WeekNum, DayOfYear, DayOfMonth, DayNumOfWeek, DayName
Example data: 20310409 2031-04-09 2031 2 2031-Q2 4 April Apr 2031_15 15 99 9 3 Wednesday
You can PM me if you want a csv of this so that you can import it to the database, but I'm sure you can easily find something like this online and make your own.
I add an identity column as well so that you can get an integer for each date. This makes it a bit easier to work with, but not a requirement.
SELECT * FROM dbo.dates where dateIndex BETWEEN (getDateIndexDate(getDate())-30 AND getDateIndexDate(getDate())+0) --30 days ago
This allows me to easily jump back to a certain period. It's quite easy to create your own views on this. You can of course use the ROW_NUMBER() function to do this for years, weeks, etc. as well.
Once I have the daterange I want, I join to the data. Works very fast!
Just use computed colums
Example
ALTER TABLE tb1 ADD [Current - Last 30 Days Col1] AS (CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END) PERSISTED;
Specify Computed Columns in a Table
For optimizing such calculations you man consider pre-calculating some of the values. The idea of pre-calculations is to reduce the number of rows that need to be read or proceed.
One way of achieving this is using an indexed view and leave the engine to do the calculations by itself. As this type of views have some limitations, you man end up creating a simple table and perform the calculations instead. Basically, it depends on the business needs.
So, in the example below I am creating a table with RowID
and RowDatetime
columns and inserting 1 million rows. I am using an indexed view to count the entities per days, so instead of querying 1 million rows per year I will query 365 rows per year to count these metrics.
DROP TABLE IF EXISTS [dbo].[DataSource];
GO
CREATE TABLE [dbo].[DataSource]
(
[RowID] BIGINT IDENTITY(1,1) PRIMARY KEY
,[RowDateTime] DATETIME2
);
GO
DROP VIEW IF EXISTS [dbo].[vw_DataSource];
GO
CREATE VIEW [dbo].[vw_DataSource] WITH SCHEMABINDING
AS
SELECT YEAR([RowDateTime]) AS [Year]
,MONTH([RowDateTime]) AS [Month]
,DAY([RowDateTime]) AS [Day]
,COUNT_BIG(*) AS [Count]
FROM [dbo].[DataSource]
GROUP BY YEAR([RowDateTime])
,MONTH([RowDateTime])
,DAY([RowDateTime]);
GO
CREATE UNIQUE CLUSTERED INDEX [IX_vw_DataSource] ON [dbo].[vw_DataSource]
(
[Year] ASC,
[Month] ASC,
[Day] ASC
);
GO
DECLARE @min bigint, @max bigint
SELECT @Min=1 ,@Max=1000000
INSERT INTO [dbo].[DataSource] ([RowDateTime])
SELECT TOP (@Max-@Min+1) DATEFROMPARTS(2019, 1.0 + floor(12 * RAND(convert(varbinary, newid()))), 1.0 + floor(28 * RAND(convert(varbinary, newid()))) )
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
GO
SELECT *
FROM [dbo].[vw_DataSource]
SELECT SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(MONTH,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 30 Days Col1]
,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(QUARTER,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 90 Days Col1]
,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(YEAR,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 365 Days Col1]
FROM [dbo].[vw_DataSource];
The success of such solution depends very much on how the data is distributed and how many rows you have. For example, if you have one entry per day for each day of the year, the view and the table will have same match of rows, so the I/O operations will not be reduced.
Also, the above is just an example of materializing the data and reading it. In your case you may need to add more columns the view definition.
The best approach is to insert into a table variable/hash table (if the row count is small use a table variable or use a hash table if the row count is pretty much big). Then update the aggregation and then finally select from the table variable or hash table. Looking into the query plan is necessary.
DECLARE @MYTABLE TABLE (ID INT, [Title] VARCHAR(500), [Class] VARCHAR(500),
[Current - Last 30 Days Col1] INT, [Current - Last 30 Days Col2] INT,
[Current - Last 90 Days Col1] INT,[Current - Last 90 Days Col2] INT,
[Current - Last 365 Days Col1] INT, [Current - Last 365 Days Col2] INT,
[Last year - Last 30 Days Col1] INT, [Last year - Last 30 Days Col2] INT,
[Last year - Last 90 Days Col1] INT, [Last year - Last 90 Days Col2] INT,
[Last year - Last 365 Days Col1] INT, [Last year - Last 365 Days Col2] INT)
INSERT INTO @MYTABLE(ID, [Title],[Class],
[Current - Last 30 Days Col1], [Current - Last 30 Days Col2],
[Current - Last 90 Days Col1], [Current - Last 90 Days Col2],
[Current - Last 365 Days Col1], [Current - Last 365 Days Col2],
[Last year - Last 30 Days Col1], [Last year - Last 30 Days Col2],
[Last year - Last 90 Days Col1], [Last year - Last 90 Days Col2],
[Last year - Last 365 Days Col1], [Last year - Last 365 Days Col2]
)
SELECT b.id ,d.[Title] ,e.Class ,0,0,0,0,0,0,0,0,0,0,0,0
FROM tb1 a
INNER JOIN tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN tb3 c on b.fid = c.col5
INNER JOIN tb4 d on c.id = d.col6
INNER JOIN tb5 e on c.col7 = e.id
GROUP BY b.id, d.Title, e.Class
UPDATE T
SET [Current - Last 30 Days Col1]=K.[Current - Last 30 Days Col1] ,
[Current - Last 30 Days Col2] =K.[Current - Last 30 Days Col2],
[Current - Last 90 Days Col1] = K.[Current - Last 90 Days Col1],
[Current - Last 90 Days Col2] =K.[Current - Last 90 Days Col2] ,
[Current - Last 365 Days Col1] =K.[Current - Last 365 Days Col1],
[Current - Last 365 Days Col2] =K.[Current - Last 365 Days Col2],
[Last year - Last 30 Days Col1] =K.[Last year - Last 30 Days Col1],
[Last year - Last 30 Days Col2] =K.[Last year - Last 30 Days Col2],
[Last year - Last 90 Days Col1] =K.[Last year - Last 90 Days Col1],
[Last year - Last 90 Days Col2] =K.[Last year - Last 90 Days Col2],
[Last year - Last 365 Days Col1] =K.[Last year - Last 365 Days Col1],
[Last year - Last 365 Days Col2]=K.[Last year - Last 365 Days Col2]
FROM @MYTABLE T JOIN
(
SELECT
b.id as [ID]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 30 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 30 Days Col2]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 90 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 90 Days Col2]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 365 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 365 Days Col2]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 30 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 30 Days Col2]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 90 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 90 Days Col2]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 365 Days Col1]
,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 365 Days Col2]
FROM tb1 a
INNER JOIN tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN tb3 c on b.fid = c.col5
INNER JOIN tb4 d on c.id = d.col6
INNER JOIN tb5 e on c.col7 = e.id
GROUP BY b.id
) AS K ON T.ID=K.ID
SELECT *
FROM @MYTABLE
Since you are always grouping values based on a whole number of months, I would first group by month in a subquery in the from clause. This is similar to using a temporary table. Not certain if this would actually speed up your query.
SELECT f.id, f.[Title], f.Class,
SUM(CASE WHEN f.MonthDiff = 1 THEN col1 ELSE 0 END) as [Current - Last 30 Days Col1],
-- etc
FROM (
SELECT
b.id,
d.[Title],
e.Class,
DateDiff(Month, a.DateCol, GETDATE()) as MonthDiff,
Sum(a.col1) as col1,
Sum(a.col2) as col2
FROM tb1 a
INNER JOIN tb2 b on a.id = b.fid and a.col3 = b.col4
INNER JOIN tb3 c on b.fid = c.col5
INNER JOIN tb4 d on c.id = d.col6
INNER JOIN tb5 e on c.col7 = e.id
WHERE a.DateCol between DATEADD(YEAR,-2,GETDATE() and GETDATE()
GROUP BY b.id, d.Title, e.Class, DateDiff(Month, a.DateCol, GETDATE())
) f
group by f.id, f.[Title], f.Class
As it has been mentioned already, the execution plan will be really helpful in this case. Based on what you've shown it seems you have extracted 12 columns of 15 total columns from tb1 (a)
,
so you can try to run your query without any join and just against the tb1
to see whether your query is working as expected. Since I can see nothing wrong with your SUM function calls, my best guess is you have an issue with your joins, I would suggest to do the following. You can start by excluding the last join for instance,
INNER JOIN tb5 e on c.col7 = e.id
and any related usage of it like e.Class as [Class]
and e.Class
in your group by statement. We are not going to exclude it completely, this is just a
test to make sure whether the problem is with that or not, if your query runs better and as expected you can try to use a temp table as a workaround instead of the last join, something
like this:
SELECT *
INTO #Temp
FROM
(
select * from tb5
) As tempTable;
SELECT
b.id as [ID]
,d.[Title] as [Title]
,e.Class as [Class]
-- SUM Functions
FROM
tb1 a
INNER JOIN
tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN
tb3 c on b.fid = c.col5
INNER JOIN
tb4 d on c.id = d.col6
INNER JOIN
#Temp e on c.col7 = e.id
GROUP BY
b.id, d.Title, e.Class
Actually, Temporary tables are tables that exist temporarily on the SQL Server. The temporary tables are useful for storing the immediate result sets that are accessed multiple times. You can read more about it here https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/ And here https://codingsight.com/introduction-to-temporary-tables-in-sql-server/
Also I would strongly recommend, if you are using the Stored Procedure, set the NOCOUNT
to ON
, it can also provide a significant performance boost, because network traffic is greatly reduced:
SET NOCOUNT ON
SELECT *
INTO #Temp
-- The rest of code
Based on this:
SET NOCOUNT ON is a set statement which prevents the message which shows the number of rows affected by T-SQL query statements. This is used within stored procedures and triggers to avoid showing the affected rows message. Using SET NOCOUNT ON within a stored procedure can improve the performance of the stored procedure by a significant margin.