Big performance difference (1hr to 1 minute ) found in SQL. Can you explain why?

ぃ、小莉子 提交于 2019-12-12 02:35:53

问题


The following queries are taking 70 minutes and 1 minute respectively on a standard machine for 1 million records. What could be the possible reasons?

Query [01:10:00]

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
    CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        ELSE sys.fn_cdc_increment_lsn(0x00) END
    , sys.fn_cdc_get_max_lsn()
    , 'all with mask') 
WHERE __$operation <> 1

Modified Query [00:01:10]

DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)
SELECT @MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT @MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')     
        THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')        
        ELSE sys.fn_cdc_increment_lsn(0x00) END

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
        @MinLSN, @MaxLSN, 'all with mask') WHERE __$operation <> 1

[Modified]

I tried to recreate the scenario with a similar function to see if the parameters are evaluated for each row.

CREATE FUNCTION Fn_Test(@a decimal)RETURNS TABLE
AS
RETURN
(
    SELECT @a Parameter, Getdate() Dt, PartitionTest.*
    FROM PartitionTest
);

SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))

But I am getting the same value for the column 'Parameter' for a a million records processed in 38 seconds.


回答1:


Even deterministic scalar functions are evaluated at least once per row. If the same deterministic scalar function occurs multiple times on the same "row" with the same parameters, I believe only then will it be evaluated once - e.g. in a CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END or something like that.

I think your RAND problem is because you continue to reseed:

Repetitive calls of RAND() with the same seed value return the same results.

For one connection, if RAND() is called with a specified seed value, all subsequent calls of RAND() produce results based on the seeded RAND() call. For example, the following query will always return the same sequence of numbers.

I have taken to caching scalar function results as you have indicated - even going so far as to precalculate tables of scalar function results and joining to them. Something has to be done eventually to make scalar functions perform. Right not, the best option is the CLR - apparently these far outperform SQL UDFs. Unfortunately, I cannot use them in my current environment.




回答2:


In your first query, your fn_cdc_increment_lsn and fn_cdc_get_min_lsn get executed for every row. In second example, just once.



来源:https://stackoverflow.com/questions/1819513/big-performance-difference-1hr-to-1-minute-found-in-sql-can-you-explain-why

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!