performance difference between User Defined Function and Stored Procedures

后端未结

关注

 6  963

If a statement return rows doing a simple select over the data base, is there performance difference between implement it using Function and Procedures? I know it is prefera

Not all UDFs are bad for performance.

There is a popular misconception that UDFs have adverse effect on performance. As a blanket statement, this is simply not true. In fact, inline table-valued UDFs are actually macros – the optimizer is very well capable rewriting queries involving them as well as optimizing them. However, scalar UDFs are usually very slow. I will provide a short example.

Prerequisites

Here is the script to create and populate the tables:

CREATE TABLE States(Code CHAR(2), [Name] VARCHAR(40), CONSTRAINT PK_States PRIMARY KEY(Code))
GO
INSERT States(Code, [Name]) VALUES('IL', 'Illinois')
INSERT States(Code, [Name]) VALUES('WI', 'Wisconsin')
INSERT States(Code, [Name]) VALUES('IA', 'Iowa')
INSERT States(Code, [Name]) VALUES('IN', 'Indiana')
INSERT States(Code, [Name]) VALUES('MI', 'Michigan')
GO
CREATE TABLE Observations(ID INT NOT NULL, StateCode CHAR(2), CONSTRAINT PK_Observations PRIMARY KEY(ID))
GO
SET NOCOUNT ON
DECLARE @i INT
SET @i=0
WHILE @i<100000 BEGIN
  SET @i = @i + 1
  INSERT Observations(ID, StateCode)
  SELECT @i, CASE WHEN @i % 5 = 0 THEN 'IL'
    WHEN @i % 5 = 1 THEN 'IA'
    WHEN @i % 5 = 2 THEN 'WI'
    WHEN @i % 5 = 3 THEN 'IA'
    WHEN @i % 5 = 4 THEN 'MI'
    END
END
GO

When a query involving a UDF is rewritten as an outer join.

Consider the following query:

SELECT o.ID, s.[name] AS StateName
  INTO dbo.ObservationsWithStateNames_Join
  FROM dbo.Observations o LEFT OUTER JOIN dbo.States s ON o.StateCode = s.Code

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Observations'. Scan count 1, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'States'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 187 ms,  elapsed time = 188 ms.
*/

And compare it to a query involving an inline table valued UDF:

CREATE FUNCTION dbo.GetStateName_Inline(@StateCode CHAR(2))
RETURNS TABLE
AS
RETURN(SELECT [Name] FROM dbo.States WHERE Code = @StateCode);
GO
SELECT ID, (SELECT [name] FROM dbo.GetStateName_Inline(StateCode)) AS StateName
  INTO dbo.ObservationsWithStateNames_Inline
  FROM dbo.Observations

Both its execution plan and its execution costs are the same – the optimizer has rewritten it as an outer join. Don’t underestimate the power of the optimizer!

A query involving a scalar UDF is much slower.

Here is a scalar UDF:

CREATE FUNCTION dbo.GetStateName(@StateCode CHAR(2))
RETURNS VARCHAR(40)
AS
BEGIN
  DECLARE @ret VARCHAR(40)
  SET @ret = (SELECT [Name] FROM dbo.States WHERE Code = @StateCode)
  RETURN @ret
END
GO

Clearly the query using this UDF provides the same results but it has a different execution plan and it is dramatically slower:

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 3 ms.
Table 'Worktable'. Scan count 1, logical reads 202930, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Observations'. Scan count 1, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 11890 ms,  elapsed time = 38585 ms.
*/

As you have seen, the optimizer can rewrite and optimize queries involving inline table valued UDFs. On the other hand, queries involving scalar UDFs are not rewritten by the optimizer – the execution of the last query includes one function call per row, which is very slow.

Not all UDFs are bad for performance.

0 讨论(0)

感情败类

2021-01-08 00:47

I think you should be less concerned about speed than about how you want to use this function. A UDF can appear elsewhere in a select statement and/or even be used as the "table" to join etc. You can't "select" from a stored procedure or join on one either.

However, UDFs are called for EVERY ROW so I'd be careful where you use it. This got me in real perf trouble once. So much that I'll never forget.

0 讨论(0)
发布评论:

提交评论
- 加载中...
有刺的猬

2021-01-08 00:48

There is no difference in speed between a query run inside a function and one run inside a procedure.

Stored procedures have problems aggregating results, they cannot be composed with other stored procedures. The onyl solution is really cumbersome as it involves catching the procedure output into a table with INSERT ... EXEC ... and then using the resulted table.

Functions have the advantage of being highly composable as a table value function can be placed anywhere a table expression is expected (FROM, JOIN, APPLY, IN etc). But functions have some very severe limitations in terms of what is permitted in a function and what is not, exactly because they can appear anywhere in a query.

So is really apple to oranges. The decision is not driven be performance, but by requirements. As a general rule anything that returns a dataset should be a view or a table valued function. Anything that manipulates data must be a procedure.

0 讨论(0)
发布评论:

提交评论
- 加载中...
逝去的感伤

2021-01-08 00:52

Simple SELECT statements will be most affected by any indexes on the tables you are querying.

The Optimiser sits at the core of your chosen database engine and is responsible for making the important decisions about how the query passed in is run.

When writing queries it is worth spending time learning about indexes, optimisers, primary keys and so on. Picking a few database engines; SQL Server is different to mySQL and Oracle is different to both of them. There are many more and each is different in some way.

Stored Procedures can be fast, very fast, as they are pre-compiled. The optimiser does not have to work out the execution plan each time. A Stored Procedure will return results in a table form.

Functions can be Scalar (returning a single result) or return Tabular data.

It is quite possible to write inefficient Functions and Stored Procedures. The important thing to ask yourself is whether you need that functionality and how you will maintain it.

If you don't already own a book by Joe Celko then now might be the time to invest.

0 讨论(0)
发布评论:

提交评论
- 加载中...
时光说笑

2021-01-08 00:52

The first time I tried to use an Inline Table Valued Function (TVF), it actually took 66 to 76% (1.147 to 1.2 vs. 0.683 secs.) longer (vs. a Stored Procedure (SP))!?! That was the average of 100 iterations with 89 Rows per iteration. My SP was just doing the standard set nocount on followed by a complex (but still single) select Statement (with 5 inner join's and 2 outer join's (with one of the inner join's having an on Expression with an embedded select (which itself had a where Expression (with an embedded select + inner join))) and a group by and order by with 5 Columns and a count). The Caller is an insert into a Temp Table (with an identity Column but no Keys or Indexes) - Statement. The Inline TVF took 66% longer even without the order by the SP was doing. When I added it back in (to the select calling the Inline TVF since you can't have order by in an Inline TVF), it took even longer (76%)!?!

0 讨论(0)
发布评论:

提交评论
- 加载中...
庸人自扰

2021-01-08 01:00

As soon as SQL sees a BEGIN or END, the system cannot simplify the contents out.

So really the difference just comes down to thd fact that the results of a function can be used in an outer query, for joins, ignoring some columns and so on.

Your best bet is actually to use either a view or an inline table-valued function, so that SQL can simplify it down and only do the part that you're interested in. Look for my post on "Dangers of BEGIN and END" at my blog for more info.

0 讨论(0)
发布评论:

提交评论
- 加载中...