Why is a UDF so much slower than a subquery?

臣服心动 2020-11-28 12:03

I have a case where I need to translate (look up) several values from the same table. The first way I wrote it was with subqueries:

SELECT
    (SELECT id FRO         


        
4 Answers
  • 2020-11-28 12:37

    To get the same result (NULL if the user is deleted or not active), use LEFT JOINs, one per lookup column:

    SELECT
        u1.id AS creator,
        u2.id AS updater,
        u3.id AS owner,
        a.[name]
    FROM asset a
        -- one join per lookup column; [user] is bracketed because USER is a reserved word
        LEFT JOIN [user] u1 ON (u1.user_pk = a.created_by AND u1.active = 1)
        LEFT JOIN [user] u2 ON (u2.user_pk = a.updated_by AND u2.active = 1)
        LEFT JOIN [user] u3 ON (u3.user_pk = a.owned_by AND u3.active = 1)
    
  • 2020-11-28 12:52

    Am I missing something? Why can't this work? You are only selecting the id which you already have in the table:

    select created_by as creator, updated_by as updater, 
    owned_by as owner, [name]
    from asset
    

    By the way, when designing a schema you really should avoid keywords, such as name, as column names.

  • 2020-11-28 12:55

    As other posters have suggested, using joins will definitely give you the best overall performance.

    However, since you've stated that you don't want the headache of maintaining 50-ish similar joins or subqueries, try using an inline table-valued function as follows:

    CREATE FUNCTION dbo.get_user_inline (@user_pk INT)
    RETURNS TABLE AS
    RETURN
    (
        SELECT TOP 1 id
        FROM ice.dbo.[user]
        WHERE user_pk = @user_pk
            -- AND active = 1
    )
    

    Your original query would then become something like:

    SELECT
        (SELECT TOP 1 id FROM dbo.get_user_inline(created_by)) AS creator,
        (SELECT TOP 1 id FROM dbo.get_user_inline(updated_by)) AS updater,
        (SELECT TOP 1 id FROM dbo.get_user_inline(owned_by)) AS owner,
        [name]
    FROM asset
    

    An inline table-valued function should have better performance than either a scalar function or a multistatement table-valued function.

    The performance should be roughly equivalent to your original query, but any future changes can be made in the UDF, making it much more maintainable.
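
    As a sketch of another way to call the same function (not part of the original answer, so treat the aliases uc, uu and uo as illustrative), OUTER APPLY keeps the call in the FROM clause and still returns NULL when no user matches, like the subquery form:

    SELECT
        uc.id AS creator,
        uu.id AS updater,
        uo.id AS owner,
        a.[name]
    FROM asset a
        -- OUTER APPLY keeps the asset row even when the function returns no row
        OUTER APPLY dbo.get_user_inline(a.created_by) AS uc
        OUTER APPLY dbo.get_user_inline(a.updated_by) AS uu
        OUTER APPLY dbo.get_user_inline(a.owned_by) AS uo

    CROSS APPLY would instead drop asset rows with no matching user, so the choice between the two depends on whether missing users should show up as NULL.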

  • 2020-11-28 12:58

    The UDF is a black box to the query optimiser, so it's executed for every row. You are effectively running a row-by-row cursor: for each row in asset, you look up an id three times in another table. This happens when you use scalar or multi-statement UDFs. (Inline UDFs are simply macros that expand into the outer query.)
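
    For illustration only, since the asker's function is not shown here, a scalar UDF of the kind described might look like the sketch below. The name dbo.get_user_scalar and the INT return type are assumptions; the table and columns are borrowed from the other answers.

    CREATE FUNCTION dbo.get_user_scalar (@user_pk INT)
    RETURNS INT  -- assumption: id is an INT
    AS
    BEGIN
        -- The BEGIN...END body is what the optimiser cannot see into
        RETURN (SELECT TOP 1 id FROM dbo.[user] WHERE user_pk = @user_pk)
    END

    Called as dbo.get_user_scalar(created_by) in the SELECT list, a function of this shape is invoked separately for each row of asset, once per column that uses it.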

    One of many articles on the problem is "Scalar functions, inlining, and performance: An entertaining title for a boring post".

    The sub-queries can be optimised to correlate and avoid the row-by-row operations.

    What you really want is this:

    SELECT
       uc.id AS creator,
       uu.id AS updater,
       uo.id AS owner,
       a.[name]
    FROM
        asset a
        JOIN
        [user] uc ON uc.user_pk = a.created_by
        JOIN
        [user] uu ON uu.user_pk = a.updated_by
        JOIN
        [user] uo ON uo.user_pk = a.owned_by
    

    Update Feb 2019

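    SQL Server 2019 starts to fix this problem: scalar UDF inlining lets the optimiser expand eligible scalar UDFs into the calling query instead of invoking them row by row.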
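
    A hedged sketch of how one might check this on SQL Server 2019 or later, assuming the database runs at compatibility level 150 or higher. The is_inlineable column of sys.sql_modules reports whether a scalar UDF qualifies for inlining:

    SELECT OBJECT_NAME(m.object_id) AS function_name, m.is_inlineable
    FROM sys.sql_modules m
        JOIN sys.objects o ON o.object_id = m.object_id
    WHERE o.type = 'FN';  -- FN = SQL scalar function

    -- Inlining can be switched on or off per database, for example if it causes a regression
    ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

    A function that reports is_inlineable = 0 still runs row by row, so the join or inline table-valued function rewrites remain relevant there.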
