Set based plan runs slower than scalar valued function with many conditions

前端 未结 1 572
一向
一向 2020-12-19 19:50

This question is a bit more of a hypothetical than an actual code problem. But I have provided a dumbed down version of the code to illustrate the question. Please no comme

相关标签:
1条回答
  • 2020-12-19 20:47

    The keyword term here is INLINE TABLE VALUED FUNCTIONS. You have two types of T-SQL tabled valued functions: multi-statement and inline. If your T-SQL function starts with a BEGIN statement then it's going to be crap - scalar or otherwise. You can't get a temp table into an inline table valued function so I'm assuming you went from scalar to mutli-statement table valued function which will probably be worse.

    Your inline table valued function (iTVF) should look something like this:

    CREATE FUNCTION [dbo].[Compute_value]
    (
      @alpha FLOAT,
      @bravo FLOAT,
      @charle FLOAT,
      @delta FLOAT
    )
    RETURNS TABLE WITH SCHEMABINDING AS RETURN
    SELECT newValue = 
      CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
           WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
           ELSE @alpha * POWER((100 / @delta), 
                 (-2 * POWER(@charle * @bravo, DATEDIFF(<unit of measurement>,GETDATE(),'1/1/2000')/365)))
      END
    GO;
    

    Note that, in the code you posted, your DATEDIFF statement is missing the datepart parameter. If should look something like:

    @x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')   
    

    Going a little further - it's important to understand why iTVF's are better than T-SQL scalar valued user-defined functions. It's not because table valued functions are faster than scalar valued functions, it's because Microsoft's implementation of T-SQL inline functions are faster than their implementation of T-SQL functions that are not inline. Note the following three functions that do the same thing:

    -- Scalar version
    CREATE FUNCTION dbo.Compute_value_scalar
    (
      @alpha FLOAT,
      @bravo FLOAT,
      @charle FLOAT,
      @delta FLOAT
    )
    RETURNS FLOAT
    AS
    BEGIN
        IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
        RETURN 0
    
        IF @bravo IS NULL OR @bravo <= 0
            RETURN 100
    
        IF (@charle + @delta) / @bravo <= 0
            RETURN 100
        DECLARE @x int = DATEDIFF(dd, GETDATE(),'1/1/2000')     
        RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
    END
    GO
    
    -- multi-statement table valued function 
    CREATE FUNCTION dbo.Compute_value_mtvf
    (
      @alpha FLOAT,
      @bravo FLOAT,
      @charle FLOAT,
      @delta FLOAT
    )
    RETURNS  @sometable TABLE (newValue float) AS 
        BEGIN
        INSERT @sometable VALUES
    (
      CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
           WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
           ELSE @alpha * POWER((100 / @delta), 
                 (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
      END
    )
    RETURN;
    END
    GO
    
    -- INLINE table valued function
    CREATE FUNCTION dbo.Compute_value_itvf
    (
      @alpha FLOAT,
      @bravo FLOAT,
      @charle FLOAT,
      @delta FLOAT
    )
    RETURNS TABLE WITH SCHEMABINDING AS RETURN
    SELECT newValue = 
      CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
           WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
           ELSE @alpha * POWER((100 / @delta), 
                 (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
      END
    GO
    

    Now for some sample data and performance test:

    SET NOCOUNT ON;
    CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);
    INSERT #someTable
    SELECT TOP (100000)
      abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
      abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
    FROM sys.all_columns a, sys.all_columns b;
    
    PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);
    GO
    DECLARE @st datetime = getdate(), @z float;
    
    SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
    FROM #someTable t;
    
    PRINT DATEDIFF(ms, @st, getdate());
    GO
    
    PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);
    GO
    DECLARE @st datetime = getdate(), @z float;
    
    SELECT @z = f.newValue
    FROM #someTable t
    CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;
    
    PRINT DATEDIFF(ms, @st, getdate());
    GO
    
    PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);
    GO
    DECLARE @st datetime = getdate(), @z float;
    
    SELECT @z = f.newValue
    FROM #someTable t
    CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;
    
    PRINT DATEDIFF(ms, @st, getdate());
    GO
    

    Results:

    scalar
    ------------------------------------------------------------
    2786
    
    mTVF
    ------------------------------------------------------------
    41536
    
    iTVF
    ------------------------------------------------------------
    153
    

    The scalar udf ran for 2.7 seconds, 41 seconds for the mtvf and 0.153 seconds for the iTVF. To understand why let's look at the estimated execution plans:

    You don't see this when you look at the actual execution plan but, with the scalar udf and mtvf, the optimizer calls some poorly executed subroutine for each row; the iTVF does not. Quoting Paul White's career changing article about APPLY Paul writes:

    You might find it useful to think of an iTVF as a view that accepts parameters. Just as for views, SQL Server expands the definition of an iTVF directly into the query plan of an enclosing query, before optimization is performed.

    The effect is that SQL Server is able to apply its full range of optimizations, considering the query as a whole. It is just as if you had written the expanded query out by hand....

    In other words, iTVF's enable to optimizer to optimize the query in ways that just aren't possible when all that other code needs to be executed. One of many other examples of why iTVFs are superior is they are the only one of the three aforementioned function types that allow parallelism. Let's run each function one more time, this time with the Actual Execution plan turned on and with traceflag 8649 (which forces a parallel execution plan):

    -- don't need so many rows for this test
    TRUNCATE TABLE #sometable;
    INSERT #someTable 
    SELECT TOP (10)
      abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
      abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
    FROM sys.all_columns a;
    
    DECLARE @x float;
    
    SELECT TOP (10) @x = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
    FROM #someTable t
    ORDER BY dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
    OPTION (QUERYTRACEON 8649);
    
    SELECT TOP (10)  @x = f.newValue
    FROM #someTable t
    CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f
    ORDER BY f.newValue
    OPTION (QUERYTRACEON 8649);
    
    SELECT @x = f.newValue
    FROM #someTable t
    CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f
    ORDER BY f.newValue
    OPTION (QUERYTRACEON 8649);
    

    Execution plans:

    Those arrows you see for the iTVF's execution plan is parallelism - all your CPU's (or as many as your SQL instance's MAXDOP settings allow) working together. T-SQL scalar and mtvf UDFs can't do that. When Microsoft introduces inline scalar UDFs then I'd suggest those for what you're doing but, until then: if performance is what you're looking for then inline is the only way to go and, for that, iTVFs are the only game in town.

    Note that I continuously emphasized T-SQL when talking about functions... CLR Scalar and Table valued functions can be just fine but that's a different topic.

    0 讨论(0)
提交回复
热议问题