What can I do to improve performance of my pure User Defined Function in SQL Server?

问题

I have made a simple, but relatively computationally complex, UDF that queries a rarely changing table. In typical usage this function is called many many times from a WHERE clauses over a very small domain of parameters.

What can I do to make my usage of the UDF faster? My thoughts are that there should be some way to tell SQL Server that my function returns the same result with the same parameters and thus should be memoized. There doesn't seem a way to do it within the UDF because they are required to be pure and thus can't write to a temp table.

For completeness my UDF is below, though I am seeking a general answer on how to make calling UDFs on small domains faster, and not how to optimize this particular UDF.

CREATE function [dbo].[WorkDay] (
    @inputDate datetime, 
    @offset int) 
returns datetime as begin

declare 
     @result datetime 

set @result = @inputDate

while @offset != 0
begin
    set @result = dateadd( day, sign(@offset), @result )

    while ( DATEPART(weekday, @result ) not between 2 and 6 )
      or @result in (select date from myDB.dbo.holidays
      where calendar = 'US' and date = @result)
    begin
        set @result = dateadd( day, sign(@offset), @result )
    end
    set @offset = @offset - sign(@offset)
end
return @result

END

回答1:

My first thought here is -- what's the performance problem? Sure you have a loop (once per row to apply where) within a loop that it runs a query. But are you getting poor execution plans? Are your result sets huge? But lets turn to the generic. How does once solve this problem? SQL doesn't really do memoization(as the illustrious @Martin_Smith points out). So what's a boy to do?

Option 1 - New Design

Create an entirely new design. In this specific case @Aaron_Bertrand points out that a calendar table may meet your needs. Quite right. This doesn't really help you with non calendar situations, but as is often the case in SQL you need to think a bit different.

Option 2 - Call the UDF Less

Narrow the set of items that call this function. This reminds me a lot of how to do successful paging/row counting. Generate a small result set that has the distinct values required and then call your UDF so it is only called a few times. This may or may not be an option, but can work in many scenarios.

Option 3 - Dynamic UDF

I'll probably get booed out of the room for this suggestion, but here goes. What makes this UDF slow is the select statement inside the loop. If your Holiday table really changes infrequently you could put a trigger on the table. The trigger would write out and updated UDF. The new UDF could brute force all the holiday decisions. Would it bit a bit like cannibalism with SQL writing SQL? Sure. But it would get rid of the sub-query and speed the UDF up. Let the heckling begin.

Option 4 - Memoize It!

While SQL can't directly memoize, we do have SQL CLR. Convert the UDF to a SQL CLR udf. In CLR you get to use static variables. You could easily grab the Holidays table at some regular interval and store them in a hashtable. Then just rewrite your loop in the CLR. You could even go further and memoize the entire answer if that's appropriate logic.

Update:

Option 1 - I was really trying to focus on the general here, not the example function you used above. However, the current design of your UDF allows for multiple calls to the Holiday table if you happen to hit a few in a row. Using some sort of calendar-style-table that contains a list of 'bad days' and the corresponding 'next business day' will allow you to remove the potential for multiple hits & queries.

Option 3 - While the domain is unknown ahead of time you could very well modify your holiday table. For a given holiday day it would contain the next corresponding work day. From this data you could spit out a UDF with a long case statement (when '5/5/2012' then '5/14/2012' or something similar) at the bottom. This strategy may not work for every type of problem, but could work well for some types of problems.

Option 4 - There are implications to every technology. CLR needs to be deployed, the SQL Server configuration modified and SQL CLR is limited to the 3.5 framework. Personally, I've found these adjustments easy enough, but your situation may be different (say a recalcitrant DBA, or restrictions on modifications to production servers).

Using static variables requires the assemblies be granted FULL TRUST. You'll have to make sure you get your locking correct.

There is some evidence that at very high transaction levels CLR doesn't perform as well as direct SQL. In your scenario, however, this observation might not be applicable because there isn't a direct SQL correlary for what your trying to do (memoize).

回答2:

You could write to a real table keyed off of your params and select for that first and if that comes up null, then calculate and insert into the table doing your own caching.

It might make more sense to pre-fill a table with all possible values for the date range you are interested in and then just join to that. you are then only doing the calculation once for each combination of params and letting SQL handle the join.

来源：https://stackoverflow.com/questions/9402931/what-can-i-do-to-improve-performance-of-my-pure-user-defined-function-in-sql-ser

标签

sql-server

sql-server-2008

sqlclr

user-defined-functions

memoization