Set based plan runs slower than scalar valued function with many conditions

前端未结

关注

 1  572

This question is a bit more of a hypothetical than an actual code problem. But I have provided a dumbed down version of the code to illustrate the question. Please no comme

相关标签:

1条回答

我寻月下人不归

2020-12-19 20:47

The keyword term here is INLINE TABLE VALUED FUNCTIONS. You have two types of T-SQL tabled valued functions: multi-statement and inline. If your T-SQL function starts with a BEGIN statement then it's going to be crap - scalar or otherwise. You can't get a temp table into an inline table valued function so I'm assuming you went from scalar to mutli-statement table valued function which will probably be worse.

Your inline table valued function (iTVF) should look something like this:

CREATE FUNCTION [dbo].[Compute_value]
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(<unit of measurement>,GETDATE(),'1/1/2000')/365)))
  END
GO;

Note that, in the code you posted, your DATEDIFF statement is missing the datepart parameter. If should look something like:

@x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')

Going a little further - it's important to understand why iTVF's are better than T-SQL scalar valued user-defined functions. It's not because table valued functions are faster than scalar valued functions, it's because Microsoft's implementation of T-SQL inline functions are faster than their implementation of T-SQL functions that are not inline. Note the following three functions that do the same thing:

-- Scalar version
CREATE FUNCTION dbo.Compute_value_scalar
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
    IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
    RETURN 0

    IF @bravo IS NULL OR @bravo <= 0
        RETURN 100

    IF (@charle + @delta) / @bravo <= 0
        RETURN 100
    DECLARE @x int = DATEDIFF(dd, GETDATE(),'1/1/2000')     
    RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END
GO

-- multi-statement table valued function 
CREATE FUNCTION dbo.Compute_value_mtvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS  @sometable TABLE (newValue float) AS 
    BEGIN
    INSERT @sometable VALUES
(
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
)
RETURN;
END
GO

-- INLINE table valued function
CREATE FUNCTION dbo.Compute_value_itvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
GO

Now for some sample data and performance test:

SET NOCOUNT ON;
CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);
INSERT #someTable
SELECT TOP (100000)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a, sys.all_columns b;

PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

Results:

scalar
------------------------------------------------------------
2786

mTVF
------------------------------------------------------------
41536

iTVF
------------------------------------------------------------
153

The scalar udf ran for 2.7 seconds, 41 seconds for the mtvf and 0.153 seconds for the iTVF. To understand why let's look at the estimated execution plans:

You don't see this when you look at the actual execution plan but, with the scalar udf and mtvf, the optimizer calls some poorly executed subroutine for each row; the iTVF does not. Quoting Paul White's career changing article about APPLY Paul writes:

You might find it useful to think of an iTVF as a view that accepts parameters. Just as for views, SQL Server expands the definition of an iTVF directly into the query plan of an enclosing query, before optimization is performed.

The effect is that SQL Server is able to apply its full range of optimizations, considering the query as a whole. It is just as if you had written the expanded query out by hand....

In other words, iTVF's enable to optimizer to optimize the query in ways that just aren't possible when all that other code needs to be executed. One of many other examples of why iTVFs are superior is they are the only one of the three aforementioned function types that allow parallelism. Let's run each function one more time, this time with the Actual Execution plan turned on and with traceflag 8649 (which forces a parallel execution plan):

-- don't need so many rows for this test
TRUNCATE TABLE #sometable;
INSERT #someTable 
SELECT TOP (10)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a;

DECLARE @x float;

SELECT TOP (10) @x = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t
ORDER BY dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
OPTION (QUERYTRACEON 8649);

SELECT TOP (10)  @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

SELECT @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

Execution plans:

Those arrows you see for the iTVF's execution plan is parallelism - all your CPU's (or as many as your SQL instance's MAXDOP settings allow) working together. T-SQL scalar and mtvf UDFs can't do that. When Microsoft introduces inline scalar UDFs then I'd suggest those for what you're doing but, until then: if performance is what you're looking for then inline is the only way to go and, for that, iTVFs are the only game in town.

Note that I continuously emphasized T-SQL when talking about functions... CLR Scalar and Table valued functions can be just fine but that's a different topic.

0 讨论(0)