The query is executing very slowly, is there any way to improve it any further?

长情又很酷 2021-02-15 17:26

I have the following query, and because of a lot of SUM function calls, my query is running too slow. I have a lot of records in my database and I would like to get

8 Answers
  • 2021-02-15 17:47

    I would use a "Dates" lookup table and join my data to it on DatesId. I use the dates as a filter when I want to browse historical data. The join is fast, and so is the filtering, because DatesId is the clustered primary key. Add the date column (as an included column) for your data table as well.

    The dates table has the following columns:

    DatesId, Date, Year, Quarter, YearQuarter, MonthNum, MonthName, MonthNameShort, YearWeek, WeekNum, DayOfYear, DayOfMonth, DayNumOfWeek, DayName

    Example data: 20310409 2031-04-09 2031 2 2031-Q2 4 April Apr 2031_15 15 99 9 3 Wednesday

    You can PM me if you want a csv of this so that you can import it to the database, but I'm sure you can easily find something like this online and make your own.

    I add an identity column as well so that you can get an integer for each date. This makes it a bit easier to work with, but it is not a requirement.

    -- getDateIndexDate is the author's own scalar function (dbo schema assumed); it returns the dateIndex for a date
    SELECT * FROM dbo.dates WHERE dateIndex BETWEEN dbo.getDateIndexDate(GETDATE()) - 30 AND dbo.getDateIndexDate(GETDATE()); -- last 30 days
    

    This allows me to easily jump back to a certain period. It's quite easy to create your own views on this. You can of course use the ROW_NUMBER() function to do this for years, weeks, etc. as well.

    Once I have the daterange I want, I join to the data. Works very fast!
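
    As a rough illustration of the approach above, here is a minimal T-SQL sketch. It is not the answerer's actual schema: the reduced column set, the yyyymmdd encoding of DatesId, the fill range, and the DatesId column on tb1 are all assumptions made for the example.

    ```sql
    SET NOCOUNT ON;

    -- Assumption: a cut-down Dates table with only a few of the columns listed above.
    CREATE TABLE dbo.Dates
    (
        DatesId    INT      NOT NULL PRIMARY KEY CLUSTERED, -- yyyymmdd, e.g. 20310409
        [Date]     DATE     NOT NULL UNIQUE,
        [Year]     SMALLINT NOT NULL,
        MonthNum   TINYINT  NOT NULL,
        DayOfMonth TINYINT  NOT NULL
    );

    -- Fill one row per day; the range is arbitrary.
    DECLARE @d DATE = '20190101';
    WHILE @d <= '20311231'
    BEGIN
        INSERT INTO dbo.Dates (DatesId, [Date], [Year], MonthNum, DayOfMonth)
        VALUES (CONVERT(INT, CONVERT(CHAR(8), @d, 112)), @d, YEAR(@d), MONTH(@d), DAY(@d));
        SET @d = DATEADD(DAY, 1, @d);
    END;

    -- Assumption: tb1 carries a DatesId column that matches dbo.Dates.
    -- Pick the date range on the small Dates table first, then join to the data.
    SELECT SUM(t.col1) AS [Current - Last 30 Days Col1]
    FROM dbo.Dates AS d
    INNER JOIN dbo.tb1 AS t ON t.DatesId = d.DatesId
    WHERE d.[Date] >  DATEADD(DAY, -30, CONVERT(DATE, GETDATE()))
      AND d.[Date] <= CONVERT(DATE, GETDATE());
    ```

    An index on tb1 (DatesId) that includes col1 would let the join read only the rows in the selected range.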

  • 2021-02-15 17:51

    Just use computed columns.

    Example

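    -- Column names are unqualified (a table alias is not allowed here), and the column
    -- is not PERSISTED because GETDATE() is nondeterministic.
    ALTER TABLE tb1 ADD [Current - Last 30 Days Col1] AS (CASE WHEN DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN col1 ELSE 0 END);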
    

    See "Specify Computed Columns in a Table" in the SQL Server documentation.

  • 2021-02-15 17:54

    To optimize such calculations you may consider pre-calculating some of the values. The idea of pre-calculation is to reduce the number of rows that need to be read or processed.

    One way of achieving this is to use an indexed view and let the engine do the calculations by itself. As this type of view has some limitations, you may end up creating a simple table and performing the calculations yourself instead. Basically, it depends on the business needs.

    So, in the example below I am creating a table with RowID and RowDateTime columns and inserting 1 million rows. I am using an indexed view to count the entities per day, so instead of querying 1 million rows per year I will query 365 rows per year to compute these metrics.

    DROP TABLE IF EXISTS [dbo].[DataSource];
    GO
    
    CREATE TABLE [dbo].[DataSource]
    (
        [RowID] BIGINT IDENTITY(1,1) PRIMARY KEY
       ,[RowDateTime] DATETIME2
    );
    
    GO
    
    DROP VIEW IF EXISTS [dbo].[vw_DataSource];
    GO
    
    CREATE VIEW [dbo].[vw_DataSource] WITH SCHEMABINDING
    AS
    SELECT YEAR([RowDateTime]) AS [Year]
          ,MONTH([RowDateTime]) AS [Month]
          ,DAY([RowDateTime]) AS [Day]
          ,COUNT_BIG(*) AS [Count]
    FROM [dbo].[DataSource]
    GROUP BY YEAR([RowDateTime])
            ,MONTH([RowDateTime])
            ,DAY([RowDateTime]);
    GO
    
    CREATE UNIQUE CLUSTERED INDEX [IX_vw_DataSource] ON [dbo].[vw_DataSource]
    (
        [Year] ASC,
        [Month] ASC,
        [Day] ASC
    );
    
    GO
    
    DECLARE @min bigint, @max bigint
    SELECT @Min=1 ,@Max=1000000
    
    INSERT INTO [dbo].[DataSource] ([RowDateTime])
    SELECT TOP (@Max-@Min+1) DATEFROMPARTS(2019,  1.0 + floor(12 * RAND(convert(varbinary, newid()))), 1.0 + floor(28 * RAND(convert(varbinary, newid())))          )       
    FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2
    
    GO
    
    
    SELECT *
    FROM [dbo].[vw_DataSource]
    
    
    SELECT SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(MONTH,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 30 Days Col1]
          ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(QUARTER,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 90 Days Col1]
          ,SUM(CASE WHEN DATEFROMPARTS([Year], [Month], [Day]) >= DATEADD(YEAR,-1,GETDATE()) THEN [Count] ELSE 0 END) as [Current - Last 365 Days Col1]
    FROM [dbo].[vw_DataSource];
    

    The success of such a solution depends very much on how the data is distributed and how many rows you have. For example, if you have one entry per day for each day of the year, the view and the table will have the same number of rows, so the I/O operations will not be reduced.

    Also, the above is just an example of materializing the data and reading it. In your case you may need to add more columns to the view definition.
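
    To make that last point concrete, here is a hedged sketch of what such a view might look like for the question's table. It assumes the table is dbo.tb1 with a datetime DateCol and numeric col1 and col2; the view and index names are invented. It groups by day only, so the join keys (id, col3) would have to be added to the GROUP BY before the view could replace tb1 in the original query.

    ```sql
    -- Assumption: dbo.tb1(DateCol, col1, col2) as in the question; names below are illustrative.
    CREATE VIEW dbo.vw_tb1_Daily WITH SCHEMABINDING
    AS
    SELECT CONVERT(DATE, a.DateCol) AS [Day]
          ,SUM(ISNULL(a.col1, 0))   AS col1        -- SUM in an indexed view must not be over a nullable expression
          ,SUM(ISNULL(a.col2, 0))   AS col2
          ,COUNT_BIG(*)             AS RowsPerDay  -- required when an indexed view uses GROUP BY
    FROM dbo.tb1 AS a
    GROUP BY CONVERT(DATE, a.DateCol);
    GO

    -- The unique clustered index is what materializes the view.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_tb1_Daily ON dbo.vw_tb1_Daily ([Day]);
    GO

    -- NOEXPAND makes the query read the materialized rows instead of expanding the view.
    SELECT SUM(CASE WHEN [Day] >= DATEADD(MONTH, -1, GETDATE()) THEN col1 ELSE 0 END) AS [Current - Last 30 Days Col1]
          ,SUM(CASE WHEN [Day] >= DATEADD(MONTH, -1, GETDATE()) THEN col2 ELSE 0 END) AS [Current - Last 30 Days Col2]
          ,SUM(CASE WHEN [Day] >= DATEADD(YEAR, -1, GETDATE())  THEN col1 ELSE 0 END) AS [Current - Last 365 Days Col1]
          ,SUM(CASE WHEN [Day] >= DATEADD(YEAR, -1, GETDATE())  THEN col2 ELSE 0 END) AS [Current - Last 365 Days Col2]
    FROM dbo.vw_tb1_Daily WITH (NOEXPAND)
    WHERE [Day] >= DATEADD(YEAR, -1, GETDATE());
    ```

    On editions other than Enterprise the optimizer does not pick the view's index automatically, which is why the hint is included here.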

  • 2021-02-15 18:00

    The best approach is to insert into a table variable or a temp table (a #table, sometimes called a hash table): use a table variable if the row count is small and a temp table if it is large. Then update the aggregates, and finally select from the table variable or temp table. You will also need to look at the query plan.

    DECLARE @MYTABLE TABLE (ID INT, [Title] VARCHAR(500), [Class] VARCHAR(500),
    [Current - Last 30 Days Col1] INT, [Current - Last 30 Days Col2] INT,
    [Current - Last 90 Days Col1] INT,[Current - Last 90 Days Col2] INT,
    [Current - Last 365 Days Col1] INT, [Current - Last 365 Days Col2] INT,
    [Last year - Last 30 Days Col1] INT, [Last year - Last 30 Days Col2] INT,
    [Last year - Last 90 Days Col1] INT, [Last year - Last 90 Days Col2] INT,
    [Last year - Last 365 Days Col1] INT, [Last year - Last 365 Days Col2] INT)
    
    
    
    INSERT INTO @MYTABLE(ID, [Title],[Class], 
    [Current - Last 30 Days Col1], [Current - Last 30 Days Col2],
    [Current - Last 90 Days Col1], [Current - Last 90 Days Col2],
    [Current - Last 365 Days Col1], [Current - Last 365 Days Col2],
    [Last year - Last 30 Days Col1], [Last year - Last 30 Days Col2],
    [Last year - Last 90 Days Col1], [Last year - Last 90 Days Col2],
    [Last year - Last 365 Days Col1], [Last year - Last 365 Days Col2]
      )
    SELECT    b.id  ,d.[Title] ,e.Class ,0,0,0,0,0,0,0,0,0,0,0,0        
    FROM     tb1 a
    INNER JOIN   tb2 b on a.id=b.fid and a.col3 = b.col4
    INNER JOIN   tb3 c on b.fid = c.col5
    INNER JOIN   tb4 d on c.id = d.col6
    INNER JOIN  tb5 e on c.col7 = e.id
    GROUP BY b.id, d.Title, e.Class
    
    UPDATE T 
    SET [Current - Last 30 Days Col1]=K.[Current - Last 30 Days Col1] , 
    [Current - Last 30 Days Col2]    =K.[Current - Last 30 Days Col2],
    [Current - Last 90 Days Col1]    = K.[Current - Last 90 Days Col1], 
    [Current - Last 90 Days Col2]    =K.[Current - Last 90 Days Col2] ,
    [Current - Last 365 Days Col1]   =K.[Current - Last 365 Days Col1], 
    [Current - Last 365 Days Col2]   =K.[Current - Last 365 Days Col2],
    [Last year - Last 30 Days Col1]  =K.[Last year - Last 30 Days Col1],
     [Last year - Last 30 Days Col2] =K.[Last year - Last 30 Days Col2],
    [Last year - Last 90 Days Col1]  =K.[Last year - Last 90 Days Col1], 
    [Last year - Last 90 Days Col2]  =K.[Last year - Last 90 Days Col2],
    [Last year - Last 365 Days Col1] =K.[Last year - Last 365 Days Col1],
     [Last year - Last 365 Days Col2]=K.[Last year - Last 365 Days Col2]
        FROM @MYTABLE T JOIN 
         (
    SELECT 
        b.id as [ID]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 30 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 30 Days Col2]
    
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 90 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 90 Days Col2]
    
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Current - Last 365 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Current - Last 365 Days Col2]
    
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 30 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(MONTH,-13,GETDATE()) and a.DateCol <= DATEADD(MONTH,-12,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 30 Days Col2]
    
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 90 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(QUARTER,-5,GETDATE()) and a.DateCol <= DATEADD(QUARTER,-4,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 90 Days Col2]
    
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) as [Last year - Last 365 Days Col1]
        ,ISNULL(Sum(CASE WHEN a.DateCol >= DATEADD(YEAR,-2,GETDATE()) and a.DateCol <= DATEADD(YEAR,-1,GETDATE()) THEN a.col2 ELSE 0 END),0) as [Last year - Last 365 Days Col2]
        FROM     tb1 a
    INNER JOIN   tb2 b on a.id=b.fid and a.col3 = b.col4
    INNER JOIN   tb3 c on b.fid = c.col5
    INNER JOIN   tb4 d on c.id = d.col6
    INNER JOIN  tb5 e on c.col7 = e.id
    GROUP BY    b.id
    ) AS K ON T.ID=K.ID
    
    
    SELECT *
    FROM @MYTABLE
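
    For the large-row-count case mentioned at the start of this answer, the same pattern with a temp table might look roughly like the sketch below. The name #Totals, its reduced column list, and the index are assumptions for illustration; tb1 through tb5 and their join conditions are taken from the question.

    ```sql
    -- Illustrative only: one aggregate column instead of the twelve above.
    CREATE TABLE #Totals
    (
        ID      INT NOT NULL,
        [Title] VARCHAR(500),
        [Class] VARCHAR(500),
        [Current - Last 30 Days Col1] INT NOT NULL DEFAULT 0
    );

    INSERT INTO #Totals (ID, [Title], [Class])
    SELECT b.id, d.[Title], e.Class
    FROM tb1 a
    INNER JOIN tb2 b ON a.id = b.fid AND a.col3 = b.col4
    INNER JOIN tb3 c ON b.fid = c.col5
    INNER JOIN tb4 d ON c.id = d.col6
    INNER JOIN tb5 e ON c.col7 = e.id
    GROUP BY b.id, d.[Title], e.Class;

    -- A temp table has column statistics and can be indexed after it is loaded.
    CREATE CLUSTERED INDEX IX_Totals_ID ON #Totals (ID);

    UPDATE T
    SET [Current - Last 30 Days Col1] = K.Total
    FROM #Totals T
    JOIN (
        SELECT b.id AS ID
              ,ISNULL(SUM(CASE WHEN a.DateCol >= DATEADD(MONTH,-1,GETDATE()) THEN a.col1 ELSE 0 END),0) AS Total
        FROM tb1 a
        INNER JOIN tb2 b ON a.id = b.fid AND a.col3 = b.col4
        INNER JOIN tb3 c ON b.fid = c.col5
        INNER JOIN tb4 d ON c.id = d.col6
        INNER JOIN tb5 e ON c.col7 = e.id
        GROUP BY b.id
    ) AS K ON T.ID = K.ID;

    SELECT * FROM #Totals;

    DROP TABLE #Totals;
    ```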
    
  • 2021-02-15 18:00

    Since you are always grouping values based on a whole number of months, I would first group by month in a subquery in the from clause. This is similar to using a temporary table. Not certain if this would actually speed up your query.

    SELECT f.id, f.[Title], f.Class,
        SUM(CASE WHEN f.MonthDiff = 1 THEN col1 ELSE 0 END) as [Current - Last 30 Days Col1],
        -- etc
    FROM (
        SELECT 
            b.id,
            d.[Title],
            e.Class,
            DateDiff(Month, a.DateCol, GETDATE()) as MonthDiff,
            Sum(a.col1) as col1,
            Sum(a.col2) as col2
        FROM  tb1 a
        INNER JOIN tb2 b on a.id = b.fid and a.col3 = b.col4
        INNER JOIN tb3 c on b.fid = c.col5
        INNER JOIN tb4 d on c.id = d.col6
        INNER JOIN tb5 e on c.col7 = e.id
        WHERE a.DateCol between DATEADD(YEAR,-2,GETDATE()) and GETDATE()
        GROUP BY b.id, d.Title, e.Class, DateDiff(Month,  a.DateCol, GETDATE())
    ) f
    group by f.id, f.[Title], f.Class
    
  • 2021-02-15 18:03

    As has already been mentioned, the execution plan would be really helpful here. Based on what you've shown, 12 of the 15 columns you select come from tb1 (a), so first try running the query against tb1 alone, without any joins, to see whether it performs as expected.

    Since I can see nothing wrong with your SUM calls, my best guess is that the problem lies in the joins, so I would suggest the following. Start by excluding the last join, INNER JOIN tb5 e on c.col7 = e.id, together with everything that depends on it, such as e.Class as [Class] in the select list and e.Class in the GROUP BY clause. We are not going to remove it permanently; this is just a test to find out whether that join is the problem. If the query then runs better and as expected, you can use a temp table as a workaround instead of the last join, something like this:

    SELECT *
    INTO #Temp
    FROM
      (
         select * from tb5
      ) As tempTable;
    
    SELECT 
        b.id as [ID]
        ,d.[Title] as [Title]
        ,e.Class as [Class]
    
        -- SUM Functions
    
    FROM 
        tb1 a
    INNER JOIN 
        tb2 b on a.id=b.fid and a.col3 = b.col4
    INNER JOIN 
        tb3 c on b.fid = c.col5
    INNER JOIN       
        tb4 d on c.id = d.col6
    INNER JOIN 
        #Temp e on c.col7 = e.id
    GROUP BY
        b.id, d.Title, e.Class
    

    Temporary tables are tables that exist temporarily on the SQL Server. They are useful for storing intermediate result sets that are accessed multiple times. You can read more about them here: https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/ and here: https://codingsight.com/introduction-to-temporary-tables-in-sql-server/

    Also, if you are using a stored procedure, I would strongly recommend setting NOCOUNT to ON. It can provide a significant performance boost because network traffic is greatly reduced:

    SET NOCOUNT ON
    SELECT *
    INTO #Temp
    -- The rest of code
    

    Based on this:

    SET NOCOUNT ON is a set statement which prevents the message which shows the number of rows affected by T-SQL query statements. This is used within stored procedures and triggers to avoid showing the affected rows message. Using SET NOCOUNT ON within a stored procedure can improve the performance of the stored procedure by a significant margin.
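
    As a small sketch of where the statement goes, the procedure below puts SET NOCOUNT ON first so that it covers every statement that follows. The procedure name and the #Recent temp table are hypothetical; tb1, DateCol and col1 come from the question.

    ```sql
    -- Hypothetical procedure; only the placement of SET NOCOUNT ON is the point here.
    CREATE PROCEDURE dbo.usp_GetLast30DaysTotal
    AS
    BEGIN
        SET NOCOUNT ON;  -- suppress the "(n rows affected)" message for each statement below

        -- Stage the recent rows once.
        SELECT id, col1
        INTO #Recent
        FROM dbo.tb1
        WHERE DateCol >= DATEADD(MONTH, -1, GETDATE());

        -- Only this result set is sent back to the client.
        SELECT SUM(col1) AS [Current - Last 30 Days Col1]
        FROM #Recent;
    END;
    GO

    EXEC dbo.usp_GetLast30DaysTotal;
    ```

    The saving is likely to matter most for procedures that run many statements or loops, since each one would otherwise send its own row-count message.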
