The query is executing very slowly, is there any way to improve it any further?

后端未结

关注

 8  1131

I have the following query, and because of a lot of SUM function calls, my query is running too slow. I have a lot of records in my database and I would like to get

相关标签:

8条回答

情书的邮戳

2021-02-15 17:47
I would use a lookup table "Dates" table to join my data to with an index on DatesId. I use the dates as a filter when I want to browse historical data. The join is fast and so it the filtering as the DatesId is clustered primary index (primary key). Add the date column (as included column) for your data table as well.

The dates table has the following columns:

DatesId, Date, Year, Quarter, YearQuarter, MonthNum, MonthNameShort, YearWeek, WeekNum, DayOfYear, DayOfMonth, DayNumOfWeek, DayName

Example data: 20310409 2031-04-09 2031 2 2031-Q2 4 April Apr 2031_15 15 99 9 3 Wednesday

You can PM me if you want a csv of this so that you can import it to the database, but I'm sure you can easily find something like this online and make your own.

I add an identity column as well so that you can get an integer for each date. This makes it a bit easier to work with, but not a requirement.
```
SELECT * FROM dbo.dates where dateIndex BETWEEN (getDateIndexDate(getDate())-30 AND getDateIndexDate(getDate())+0) --30 days ago
```
This allows me to easily jump back to a certain period. It's quite easy to create your own views on this. You can of course use the ROW_NUMBER() function to do this for years, weeks, etc. as well.

Once I have the daterange I want, I join to the data. Works very fast!
0 讨论(0)
发布评论:

提交评论
- 加载中...
我寻月下人不归

2021-02-15 17:51
Just use computed colums

Example
```
ALTER TABLE tb1 ADD [Current - Last 30 Days Col1] AS (CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END) PERSISTED;
```
Specify Computed Columns in a Table
0 讨论(0)
发布评论:

提交评论
- 加载中...

悲&欢浪女

2021-02-15 17:54

For optimizing such calculations you man consider pre-calculating some of the values. The idea of pre-calculations is to reduce the number of rows that need to be read or proceed.

One way of achieving this is using an indexed view and leave the engine to do the calculations by itself. As this type of views have some limitations, you man end up creating a simple table and perform the calculations instead. Basically, it depends on the business needs.

So, in the example below I am creating a table with RowID and RowDatetime columns and inserting 1 million rows. I am using an indexed view to count the entities per days, so instead of querying 1 million rows per year I will query 365 rows per year to count these metrics.

DROP TABLE IF EXISTS [dbo].[DataSource];
GO

CREATE TABLE [dbo].[DataSource]
(
    [RowID] BIGINT IDENTITY(1,1) PRIMARY KEY
   ,[RowDateTime] DATETIME2
);

GO

DROP VIEW IF EXISTS [dbo].[vw_DataSource];
GO

CREATE VIEW [dbo].[vw_DataSource] WITH SCHEMABINDING
AS
SELECT YEAR([RowDateTime]) AS [Year]
      ,MONTH([RowDateTime]) AS [Month]
      ,DAY([RowDateTime]) AS [Day]
      ,COUNT_BIG(*) AS [Count]
FROM [dbo].[DataSource]
GROUP BY YEAR([RowDateTime])
        ,MONTH([RowDateTime])
        ,DAY([RowDateTime]);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_vw_DataSource] ON [dbo].[vw_DataSource]
(
    [Year] ASC,
    [Month] ASC,
    [Day] ASC
);

GO

DECLARE @min bigint, @max bigint
SELECT @Min=1 ,@Max=1000000

INSERT INTO [dbo].[DataSource] ([RowDateTime])
SELECT TOP (@Max-@Min+1) DATEFROMPARTS(2019,  1.0 + floor(12 * RAND(convert(varbinary, newid()))), 1.0 + floor(28 * RAND(convert(varbinary, newid())))          )       
FROM master..spt_values t1 
CROSS JOIN master..spt_values t2

GO


SELECT *
FROM [dbo].[vw_DataSource]


SELECT SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(MONTH,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 30 Days Col1]
      ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(QUARTER,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 90 Days Col1]
      ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(YEAR,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 365 Days Col1]
FROM [dbo].[vw_DataSource];

The success of such solution depends very much on how the data is distributed and how many rows you have. For example, if you have one entry per day for each day of the year, the view and the table will have same match of rows, so the I/O operations will not be reduced.

Also, the above is just an example of materializing the data and reading it. In your case you may need to add more columns the view definition.

0 讨论(0)

难免孤独

2021-02-15 18:00

The best approach is to insert into a table variable/hash table (if the row count is small use a table variable or use a hash table if the row count is pretty much big). Then update the aggregation and then finally select from the table variable or hash table. Looking into the query plan is necessary.

DECLARE @MYTABLE TABLE (ID INT, [Title] VARCHAR(500), [Class] VARCHAR(500),
[Current - Last 30 Days Col1] INT, [Current - Last 30 Days Col2] INT,
[Current - Last 90 Days Col1] INT,[Current - Last 90 Days Col2] INT,
[Current - Last 365 Days Col1] INT, [Current - Last 365 Days Col2] INT,
[Last year - Last 30 Days Col1] INT, [Last year - Last 30 Days Col2] INT,
[Last year - Last 90 Days Col1] INT, [Last year - Last 90 Days Col2] INT,
[Last year - Last 365 Days Col1] INT, [Last year - Last 365 Days Col2] INT)



INSERT INTO @MYTABLE(ID, [Title],[Class], 
[Current - Last 30 Days Col1], [Current - Last 30 Days Col2],
[Current - Last 90 Days Col1], [Current - Last 90 Days Col2],
[Current - Last 365 Days Col1], [Current - Last 365 Days Col2],
[Last year - Last 30 Days Col1], [Last year - Last 30 Days Col2],
[Last year - Last 90 Days Col1], [Last year - Last 90 Days Col2],
[Last year - Last 365 Days Col1], [Last year - Last 365 Days Col2]
  )
SELECT    b.id  ,d.[Title] ,e.Class ,0,0,0,0,0,0,0,0,0,0,0,0        
FROM     tb1 a
INNER JOIN   tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN   tb3 c on b.fid = c.col5
INNER JOIN   tb4 d on c.id = d.col6
INNER JOIN  tb5 e on c.col7 = e.id
GROUP BY b.id, d.Title, e.Class

UPDATE T 
SET [Current - Last 30 Days Col1]=K.[Current - Last 30 Days Col1] , 
[Current - Last 30 Days Col2]    =K.[Current - Last 30 Days Col2],
[Current - Last 90 Days Col1]    = K.[Current - Last 90 Days Col1], 
[Current - Last 90 Days Col2]    =K.[Current - Last 90 Days Col2] ,
[Current - Last 365 Days Col1]   =K.[Current - Last 365 Days Col1], 
[Current - Last 365 Days Col2]   =K.[Current - Last 365 Days Col2],
[Last year - Last 30 Days Col1]  =K.[Last year - Last 30 Days Col1],
 [Last year - Last 30 Days Col2] =K.[Last year - Last 30 Days Col2],
[Last year - Last 90 Days Col1]  =K.[Last year - Last 90 Days Col1], 
[Last year - Last 90 Days Col2]  =K.[Last year - Last 90 Days Col2],
[Last year - Last 365 Days Col1] =K.[Last year - Last 365 Days Col1],
 [Last year - Last 365 Days Col2]=K.[Last year - Last 365 Days Col2]
    FROM @MYTABLE T JOIN 
     (
SELECT 
    b.id as [ID]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 30 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 30 Days Col2]

    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 90 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 90 Days Col2]

    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 365 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 365 Days Col2]

    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 30 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 30 Days Col2]

    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 90 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 90 Days Col2]

    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 365 Days Col1]
    ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 365 Days Col2]
    FROM     tb1 a
INNER JOIN   tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN   tb3 c on b.fid = c.col5
INNER JOIN   tb4 d on c.id = d.col6
INNER JOIN  tb5 e on c.col7 = e.id
GROUP BY    b.id
) AS K ON T.ID=K.ID


SELECT *
FROM @MYTABLE

0 讨论(0)

日久生厌

2021-02-15 18:00

Since you are always grouping values based on a whole number of months, I would first group by month in a subquery in the from clause. This is similar to using a temporary table. Not certain if this would actually speed up your query.

SELECT f.id, f.[Title], f.Class,
    SUM(CASE WHEN f.MonthDiff = 1 THEN col1 ELSE 0 END) as [Current - Last 30 Days Col1],
    -- etc
FROM (
    SELECT 
        b.id,
        d.[Title],
        e.Class,
        DateDiff(Month, a.DateCol, GETDATE()) as MonthDiff,
        Sum(a.col1) as col1,
        Sum(a.col2) as col2
    FROM  tb1 a
    INNER JOIN tb2 b on a.id = b.fid and a.col3 = b.col4
    INNER JOIN tb3 c on b.fid = c.col5
    INNER JOIN tb4 d on c.id = d.col6
    INNER JOIN tb5 e on c.col7 = e.id
    WHERE a.DateCol between DATEADD(YEAR,-2,GETDATE() and GETDATE()
    GROUP BY b.id, d.Title, e.Class, DateDiff(Month,  a.DateCol, GETDATE())
) f
group by f.id, f.[Title], f.Class

0 讨论(0)

遇见更好的自我

2021-02-15 18:03
As it has been mentioned already, the execution plan will be really helpful in this case. Based on what you've shown it seems you have extracted 12 columns of 15 total columns from tb1 (a), so you can try to run your query without any join and just against the tb1 to see whether your query is working as expected. Since I can see nothing wrong with your SUM function calls, my best guess is you have an issue with your joins, I would suggest to do the following. You can start by excluding the last join for instance, INNER JOIN tb5 e on c.col7 = e.id and any related usage of it like e.Class as [Class] and e.Class in your group by statement. We are not going to exclude it completely, this is just a test to make sure whether the problem is with that or not, if your query runs better and as expected you can try to use a temp table as a workaround instead of the last join, something like this:
```
SELECT *
INTO #Temp
FROM
  (
     select * from tb5
  ) As tempTable;

SELECT 
    b.id as [ID]
    ,d.[Title] as [Title]
    ,e.Class as [Class]

    -- SUM Functions

FROM 
    tb1 a
INNER JOIN 
    tb2 b on a.id=b.fid and a.col3 = b.col4
INNER JOIN 
    tb3 c on b.fid = c.col5
INNER JOIN       
    tb4 d on c.id = d.col6
INNER JOIN 
    #Temp e on c.col7 = e.id
GROUP BY
    b.id, d.Title, e.Class
```
Actually, Temporary tables are tables that exist temporarily on the SQL Server. The temporary tables are useful for storing the immediate result sets that are accessed multiple times. You can read more about it here https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/ And here https://codingsight.com/introduction-to-temporary-tables-in-sql-server/

Also I would strongly recommend, if you are using the Stored Procedure, set the NOCOUNT to ON, it can also provide a significant performance boost, because network traffic is greatly reduced:
```
SET NOCOUNT ON
SELECT *
INTO #Temp
-- The rest of code
```
Based on this:

SET NOCOUNT ON is a set statement which prevents the message which shows the number of rows affected by T-SQL query statements. This is used within stored procedures and triggers to avoid showing the affected rows message. Using SET NOCOUNT ON within a stored procedure can improve the performance of the stored procedure by a significant margin.
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页