Entity Framework query performance differs extremely from raw SQL execution

生来不讨喜 2021-01-31 04:51

I have a question about Entity Framework query execution performance.

Schema:

I have a table structure like this:

CREATE TABLE [         


        
4 answers
  •  走了就别回头了
    2021-01-31 05:12

    I know I'm a bit late here, but since I helped build the query in question, I feel obliged to take some action.

    The general problem I see with LINQ to Entities queries is that the typical way we build them introduces unnecessary parameters, which may affect the cached database query plan (the so-called SQL Server parameter sniffing problem).

    Let's take a look at your query's GroupBy expression:

    d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval)
    

    Since minuteInterval is a variable (i.e. not a constant), it introduces a parameter. The same goes for DateTime.MinValue (note that primitive types expose such values as constants, but for DateTime and similar types they are static readonly fields, which makes a big difference in how they are treated inside expressions).

    But regardless of how it is represented in the CLR, DateTime.MinValue is logically a constant. Whether minuteInterval is one depends on your usage.
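    To see the distinction drawn above, here is a minimal sketch using plain System.Linq.Expressions (no EF; the class CaptureDemo and its methods are hypothetical names) that reports which node kind each value becomes. A captured variable such as minuteInterval and the static readonly field DateTime.MinValue both appear as MemberAccess nodes, which, per the answer, EF turns into query parameters, while a const such as int.MaxValue is folded by the C# compiler into a Constant node.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo class: reports which expression node kind each value becomes.
public static class CaptureDemo
{
    // A captured variable is read through a field of a compiler-generated closure object.
    public static ExpressionType CapturedLocal(int minuteInterval)
    {
        Expression<Func<int, int>> e = x => x / minuteInterval;
        return ((BinaryExpression)e.Body).Right.NodeType;   // MemberAccess
    }

    // DateTime.MinValue is a static readonly field, so it stays a field access.
    public static ExpressionType StaticReadonlyField()
    {
        Expression<Func<DateTime>> e = () => DateTime.MinValue;
        return e.Body.NodeType;                             // MemberAccess
    }

    // int.MaxValue is a C# const, so the compiler embeds its value directly.
    public static ExpressionType ConstField()
    {
        Expression<Func<int>> e = () => int.MaxValue;
        return e.Body.NodeType;                             // Constant
    }
}
```

    Replacing such MemberAccess nodes with Expression.Constant(...) is the idea behind the approach that follows.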

    My approach to solving the issue is to eliminate all parameters related to that expression. Since we cannot do that with a compiler-generated expression, we need to build it manually using System.Linq.Expressions. The latter is not intuitive, but fortunately we can use a hybrid approach: let the compiler build a template expression, then replace its parameters.

    First, we need a helper method which allows us to replace expression parameters:

    using System.Linq.Expressions;

    public static class ExpressionUtils
    {
        // Returns a copy of `expression` in which every occurrence of `source` is replaced by `target`.
        public static Expression ReplaceParemeter(this Expression expression, ParameterExpression source, Expression target)
        {
            return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
        }
    
        class ParameterReplacer : ExpressionVisitor
        {
            public ParameterExpression Source;
            public Expression Target;

            // Swap the matching parameter node; leave all other parameters untouched.
            protected override Expression VisitParameter(ParameterExpression node)
            {
                return node == Source ? Target : base.VisitParameter(node);
            }
        }
    }
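    The helper can be exercised without EF. This minimal sketch (hypothetical names ReplaceDemo and BucketBy; plain int arithmetic stands in for DbFunctions) carries its own copy of the replacer so it compiles alone, and turns the template (x, k) => x / k * k into x => x / 15 * 15 by substituting a ConstantExpression for k. Because C# integer division truncates and x / k * k parses as (x / k) * k, it rounds a non-negative x down to a multiple of k, which is the bucketing the original GroupBy expression applies to the minute count.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo: substitutes a constant for one parameter of a compiler-written template.
public static class ReplaceDemo
{
    // Same shape as the answer's ParameterReplacer, repeated so this block compiles alone.
    sealed class ParameterReplacer : ExpressionVisitor
    {
        public ParameterExpression Source;
        public Expression Target;
        protected override Expression VisitParameter(ParameterExpression node) =>
            node == Source ? Target : base.VisitParameter(node);
    }

    // Turns (x, k) => x / k * k into x => x / interval * interval, with interval baked in as a constant.
    public static Expression<Func<int, int>> BucketBy(int interval)
    {
        Expression<Func<int, int, int>> template = (x, k) => x / k * k;
        Expression body = new ParameterReplacer
        {
            Source = template.Parameters[1],
            Target = Expression.Constant(interval)
        }.Visit(template.Body);
        // Only x remains as a lambda parameter; k has been replaced everywhere.
        return Expression.Lambda<Func<int, int>>(body, template.Parameters[0]);
    }
}
```

    For instance, ReplaceDemo.BucketBy(15).Compile()(37) yields 30, and printing the expression shows the literal 15 where k used to be.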
    

    Now we have everything we need. Let's encapsulate the logic inside a custom method:

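    using System;
    using System.Data.Entity;          // DbFunctions (EF6)
    using System.Linq;
    using System.Linq.Expressions;

    public static class QueryableUtils
    {
        public static IQueryable<IGrouping<DateTime, T>> GroupBy<T>(this IQueryable<T> source, Expression<Func<T, DateTime>> dateSelector, int minuteInterval)
        {
            // Compiler-generated template; its parameters are placeholders that get replaced below.
            Expression<Func<DateTime, DateTime, int, DateTime>> expr = (date, baseDate, interval) =>
                DbFunctions.AddMinutes(baseDate, DbFunctions.DiffMinutes(baseDate, date) / interval).Value;
            // Splice in the caller's date selector and turn baseDate and interval into
            // ConstantExpressions, so EF emits SQL literals instead of @p__linq__N parameters.
            var selector = Expression.Lambda<Func<T, DateTime>>(
                expr.Body
                .ReplaceParemeter(expr.Parameters[0], dateSelector.Body)
                .ReplaceParemeter(expr.Parameters[1], Expression.Constant(DateTime.MinValue))
                .ReplaceParemeter(expr.Parameters[2], Expression.Constant(minuteInterval))
                , dateSelector.Parameters[0]
            );
            return source.GroupBy(selector);
        }
    }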
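    The whole hybrid pattern can also be tried in memory. This sketch is a hypothetical LINQ to Objects analogue of QueryableUtils.GroupBy: DbFunctions only works inside EF queries, so plain DateTime tick arithmetic stands in for AddMinutes/DiffMinutes, and the names IntervalGrouping, BuildKeySelector and GroupByInterval are illustrative only. Like the question's original expression, the template divides and then multiplies by the interval, so each key is the start of its bucket.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory analogue of QueryableUtils.GroupBy (no EF, no DbFunctions).
public static class IntervalGrouping
{
    // Same idea as the answer's ParameterReplacer, repeated so this block stands alone.
    sealed class ParameterReplacer : ExpressionVisitor
    {
        public ParameterExpression Source;
        public Expression Target;
        protected override Expression VisitParameter(ParameterExpression node) =>
            node == Source ? Target : base.VisitParameter(node);
    }

    static Expression Replace(this Expression expression, ParameterExpression source, Expression target) =>
        new ParameterReplacer { Source = source, Target = target }.Visit(expression);

    public static Expression<Func<T, DateTime>> BuildKeySelector<T>(
        Expression<Func<T, DateTime>> dateSelector, int minuteInterval)
    {
        // Compiler-written template: whole minutes since baseDate, floored to a
        // multiple of interval, then added back to baseDate (integer ticks, no rounding).
        Expression<Func<DateTime, DateTime, int, DateTime>> template = (date, baseDate, interval) =>
            baseDate.AddTicks((date - baseDate).Ticks / TimeSpan.TicksPerMinute / interval * interval * TimeSpan.TicksPerMinute);

        // Splice in the caller's selector body; baseDate and interval become constants.
        return Expression.Lambda<Func<T, DateTime>>(
            template.Body
                .Replace(template.Parameters[0], dateSelector.Body)
                .Replace(template.Parameters[1], Expression.Constant(DateTime.MinValue))
                .Replace(template.Parameters[2], Expression.Constant(minuteInterval)),
            dateSelector.Parameters[0]);
    }

    public static IQueryable<IGrouping<DateTime, T>> GroupByInterval<T>(
        this IQueryable<T> source, Expression<Func<T, DateTime>> dateSelector, int minuteInterval) =>
        source.GroupBy(BuildKeySelector(dateSelector, minuteInterval));
}
```

    With minuteInterval = 15, timestamps at 10:03 and 10:14 on the same day land in the 10:00 bucket and 10:16 lands in the 10:15 bucket; the selector returned by BuildKeySelector contains DateTime.MinValue and 15 as literal values rather than references to baseDate or interval.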
    

    Finally, replace

    .GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
    

    with

    .GroupBy(d => d.TimeStamp, minuteInterval * minuteInterval)
    

    and the generated SQL query looks like this (for minuteInterval = 15):

    SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [C2], 
        [GroupBy1].[A1] AS [C3], 
        [GroupBy1].[A2] AS [C4], 
        [GroupBy1].[A3] AS [C5], 
        [GroupBy1].[A4] AS [C6]
        FROM ( SELECT 
            [Project1].[K1] AS [K1], 
            MIN([Project1].[A1]) AS [A1], 
            MAX([Project1].[A2]) AS [A2], 
            AVG([Project1].[A3]) AS [A3], 
            STDEVP([Project1].[A4]) AS [A4]
            FROM ( SELECT 
                DATEADD (minute, (DATEDIFF (minute, convert(datetime2, '0001-01-01 00:00:00.0000000', 121), [Project1].[TimeStamp])) / 225, convert(datetime2, '0001-01-01 00:00:00.0000000', 121)) AS [K1], 
                [Project1].[C1] AS [A1], 
                [Project1].[C1] AS [A2], 
                [Project1].[C1] AS [A3], 
                [Project1].[C1] AS [A4]
                FROM ( SELECT 
                    [Extent1].[TimeStamp] AS [TimeStamp], 
                    [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                    FROM    [dbo].[StringDatas] AS [Extent1]
                    INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                    INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                    INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                    WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
                )  AS [Project1]
            )  AS [Project1]
            GROUP BY [K1]
        )  AS [GroupBy1]
    

    As you can see, we successfully eliminated some of the query parameters. Will that help? As with any database query tuning, it may or may not. You need to try it and see.
