Entity Framework query performance differs extremely from raw SQL execution

生来不讨喜 2021-01-31 04:51

I have a question about Entity Framework query execution performance.

Schema:

I have a table structure like this:

CREATE TABLE [         


        
4 Answers
  • 2021-01-31 05:04

    In this answer I'm focusing on the original observation: the query generated by EF is slow, but when the same query is run in SSMS it is fast.

    One possible explanation of this behaviour is Parameter sniffing.

    SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value – and same plan – is used.

    So, EF generates a query that has a few parameters. The first time you run this query, the server creates an execution plan for it using the parameter values that were in effect for that first run. That plan is usually pretty good. But later on you run the same EF query with other parameter values. It is possible that the previously generated plan is not optimal for the new values, and the query becomes slow. The server keeps using the previous plan because it is still the same query; only the parameter values differ.

    If at this moment you take the query text and run it directly in SSMS, the server will create a new execution plan, because technically it is not the same query as the one issued by the EF application. Even a one-character difference is enough, and so is any change in the session settings: the server then treats the query as a new one. As a result, the server has two plans in its cache for what looks like the same query. The first, "slow" plan is slow for the new parameter values because it was originally built for different ones. The second, "fast" plan is built for the current parameter values, so it is fast.

    The article Slow in the Application, Fast in SSMS by Erland Sommarskog explains this and other related areas in much more detail.
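    To compare like with like, the query has to be run in SSMS under the same conditions the application uses. The sketch below assumes the application connects through SqlClient with its default of ARITHABORT OFF (SSMS defaults to ON) and that EF sends the statement through sp_executesql. The statement text, parameter types and date values are simplified placeholders, not the actual query; paste the exact text captured from a trace instead.

    ```sql
    -- SSMS runs with ARITHABORT ON by default; SqlClient connections default to OFF.
    -- A different SET option produces a separate plan cache entry, so match the application first.
    SET ARITHABORT OFF;

    -- Placeholder statement: replace it with the exact text captured from the application.
    -- dbo.StringDatas and TimeStamp come from the generated SQL; the dates are made up.
    EXEC sp_executesql
        N'SELECT COUNT(*)
          FROM dbo.StringDatas AS sd
          WHERE sd.[TimeStamp] >= @p__linq__1 AND sd.[TimeStamp] < @p__linq__2',
        N'@p__linq__1 datetime2(7), @p__linq__2 datetime2(7)',
        @p__linq__1 = '2016-01-01T00:00:00',
        @p__linq__2 = '2016-01-02T00:00:00';
    ```

    If the query is now as slow in SSMS as it is in the application, the difference most likely comes from the cached plan rather than from EF itself.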

    There are several ways to discard cached plans and force the server to regenerate them. Changing the table or changing the table indexes should do it - it should discard all plans that are related to this table, both "slow" and "fast". Then you run the query in EF application with new values of parameters and get a new "fast" plan. You run the query in SSMS and get a second "fast" plan with new values of parameters. The server still generates two plans, but both plans are fast now.
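    A less invasive option than changing the table is to look at the plan cache directly. The following is a minimal sketch that assumes VIEW SERVER STATE permission for the DMVs and ALTER SERVER STATE for DBCC FREEPROCCACHE. The table name StringDatas from the generated SQL is used as a text filter; both the filter and the choice of which plan to evict are illustrative.

    ```sql
    -- List cached plans whose text mentions the table.
    -- objtype = 'Prepared' is typical for sp_executesql calls (EF), 'Adhoc' for text run in SSMS.
    SELECT cp.plan_handle, cp.objtype, cp.usecounts, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE N'%StringDatas%'
      AND st.text NOT LIKE N'%dm_exec_cached_plans%'   -- hide this diagnostic query itself
    ORDER BY cp.usecounts DESC;

    -- Evict a single plan (here simply the most-used prepared match; pick the right one in practice).
    DECLARE @plan_handle varbinary(64);

    SELECT TOP (1) @plan_handle = cp.plan_handle
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE cp.objtype = N'Prepared'
      AND st.text LIKE N'%StringDatas%'
    ORDER BY cp.usecounts DESC;

    IF @plan_handle IS NOT NULL
        DBCC FREEPROCCACHE (@plan_handle);
    ```

    Running DBCC FREEPROCCACHE without an argument clears the whole plan cache, which is usually too heavy-handed for a production server.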

    Another variant is adding OPTION(RECOMPILE) to the query. With this option the server would not store the generated plan in its cache. So, every time the query runs the server would use actual parameter values to generate the plan that (it thinks) would be optimal for the given parameter values. The downside is an added overhead of the plan generation.

    Mind you, the server still could choose a "bad" plan with this option due to outdated statistics, for example. But, at least, parameter sniffing would not be a problem.
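    As a sketch of what the hint looks like in plain T-SQL, here is a simplified stand-in for the EF query. The date values are made up; dbo.StringDatas, TimeStamp and DCCurrent are taken from the generated SQL shown elsewhere in this thread.

    ```sql
    DECLARE @from datetime2(7) = '2016-01-01T00:00:00';
    DECLARE @to   datetime2(7) = '2016-01-02T00:00:00';

    -- The plan is compiled for the current values on every execution and is not reused.
    SELECT MIN(sd.DCCurrent) AS MinCurrent, MAX(sd.DCCurrent) AS MaxCurrent
    FROM dbo.StringDatas AS sd
    WHERE sd.[TimeStamp] >= @from
      AND sd.[TimeStamp] <  @to
    OPTION (RECOMPILE);

    -- If the plan is still poor, refreshing the statistics on the table is worth trying.
    UPDATE STATISTICS dbo.StringDatas;
    ```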


    If you are wondering how to add the OPTION (RECOMPILE) hint to a query generated by EF, have a look at this answer:

    https://stackoverflow.com/a/26762756/4116017

  • 2021-01-31 05:12

    I know I'm a bit late here, but since I've participated in the building of the query in question, I feel obliged to take some action.

    The general problem I see with LINQ to Entities queries is that the typical way we build them introduces unnecessary parameters, which may affect the cached database query plan (the so-called SQL Server parameter sniffing problem).

    Let's take a look at the group-by expression in your query:

    d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval)
    

    Since minuteInterval is a variable (i.e. not a constant), it introduces a parameter. The same goes for DateTime.MinValue (note that the primitive types expose similar values as constants, but for DateTime, decimal etc. they are static readonly fields, which makes a big difference in how they are treated inside expressions).

    But regardless of how it is represented in the CLR type system, DateTime.MinValue is logically a constant. As for minuteInterval, it depends on your usage.

    My attempt to solve the issue would be to eliminate all the parameters related to that expression. Since we cannot do that with a compiler-generated expression, we need to build it manually using System.Linq.Expressions. The latter is not intuitive, but fortunately we can use a hybrid approach.

    First, we need a helper method which allows us to replace expression parameters:

    public static class ExpressionUtils
    {
        public static Expression ReplaceParemeter(this Expression expression, ParameterExpression source, Expression target)
        {
            return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
        }
    
        class ParameterReplacer : ExpressionVisitor
        {
            public ParameterExpression Source;
            public Expression Target;
            protected override Expression VisitParameter(ParameterExpression node)
            {
                return node == Source ? Target : base.VisitParameter(node);
            }
        }
    }
    

    Now we have everything we need. Let's encapsulate the logic inside a custom method:

    public static class QueryableUtils
    {
        public static IQueryable<IGrouping<DateTime, T>> GroupBy<T>(this IQueryable<T> source, Expression<Func<T, DateTime>> dateSelector, int minuteInterval)
        {
            Expression<Func<DateTime, DateTime, int, DateTime>> expr = (date, baseDate, interval) =>
                DbFunctions.AddMinutes(baseDate, DbFunctions.DiffMinutes(baseDate, date) / interval).Value;
            var selector = Expression.Lambda<Func<T, DateTime>>(
                expr.Body
                .ReplaceParemeter(expr.Parameters[0], dateSelector.Body)
                .ReplaceParemeter(expr.Parameters[1], Expression.Constant(DateTime.MinValue))
                .ReplaceParemeter(expr.Parameters[2], Expression.Constant(minuteInterval))
                , dateSelector.Parameters[0]
            );
            return source.GroupBy(selector);
        }
    }
    

    Finally, replace

    .GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
    

    with

    .GroupBy(d => d.TimeStamp, minuteInterval * minuteInterval)
    

    and the generated SQL query would be like this (for minuteInterval = 15):

    SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [C2], 
        [GroupBy1].[A1] AS [C3], 
        [GroupBy1].[A2] AS [C4], 
        [GroupBy1].[A3] AS [C5], 
        [GroupBy1].[A4] AS [C6]
        FROM ( SELECT 
            [Project1].[K1] AS [K1], 
            MIN([Project1].[A1]) AS [A1], 
            MAX([Project1].[A2]) AS [A2], 
            AVG([Project1].[A3]) AS [A3], 
            STDEVP([Project1].[A4]) AS [A4]
            FROM ( SELECT 
                DATEADD (minute, (DATEDIFF (minute, convert(datetime2, '0001-01-01 00:00:00.0000000', 121), [Project1].[TimeStamp])) / 225, convert(datetime2, '0001-01-01 00:00:00.0000000', 121)) AS [K1], 
                [Project1].[C1] AS [A1], 
                [Project1].[C1] AS [A2], 
                [Project1].[C1] AS [A3], 
                [Project1].[C1] AS [A4]
                FROM ( SELECT 
                    [Extent1].[TimeStamp] AS [TimeStamp], 
                    [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                    FROM    [dbo].[StringDatas] AS [Extent1]
                    INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                    INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                    INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                    WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
                )  AS [Project1]
            )  AS [Project1]
            GROUP BY [K1]
        )  AS [GroupBy1]
    

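    As you can see, we successfully eliminated some of the query parameters. Note that the helper divides by the interval only once, so passing minuteInterval * minuteInterval produces / 225 here, which is not the same bucketing as the original / minuteInterval * minuteInterval (evaluated left to right, that rounds down to a multiple of the interval). To keep the original grouping, append * interval inside the helper expression and pass minuteInterval unchanged. Will that help? Well, as with any database query tuning, it might or might not. You need to try and see.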

  • 2021-01-31 05:15

    The DB engine determines the plan for each query based on how it is called. In the case of your EF LINQ query, the plan is prepared in such a way that each input parameter is treated as an unknown (since the engine has no idea what's coming in). In your raw query, all of your parameters are part of the query text, so it runs under a different plan than the parameterized one. One of the affected pieces that I see immediately is

    ...(@p__linq__0 IS NULL)..

    This is FALSE since @p__linq__0 = 20827 and is NOT NULL, so the first half of your WHERE clause is FALSE to begin with and does not need to be looked at any more. In the case of LINQ queries, the DB has no idea what's coming in, so it evaluates everything anyway.
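    To make the difference concrete, here is a sketch of the two shapes. The exact predicate EF generated is not shown in this thread, so the OR ... IS NULL form below is an assumed, typical shape of EF's null-safe comparison. dbo.DataLoggers, ID and ProjectID come from the generated SQL, the int type is an assumption, and 20827 is the value mentioned above.

    ```sql
    -- Parameterized shape: the value is not known when the plan is built,
    -- so both branches of the OR have to be kept in the plan.
    DECLARE @p__linq__0 int = 20827;

    SELECT dl.ID
    FROM dbo.DataLoggers AS dl
    WHERE (dl.ProjectID = @p__linq__0)
       OR (dl.ProjectID IS NULL AND @p__linq__0 IS NULL);

    -- Literal shape: the optimizer can see that (20827 IS NULL) is always false
    -- and reduce the predicate to a plain equality.
    SELECT dl.ID
    FROM dbo.DataLoggers AS dl
    WHERE (dl.ProjectID = 20827)
       OR (dl.ProjectID IS NULL AND 20827 IS NULL);
    ```

    On EF6, setting context.Configuration.UseDatabaseNullSemantics = true is one way to stop EF from emitting these extra null checks, at the cost of database-style NULL comparison semantics in your LINQ queries.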

    You'll need to see if you can use indices or other techniques to make this run faster.

  • 2021-01-31 05:23

    When EF runs the query, it wraps it and runs it with sp_executesql, which means the execution plan will be cached in the stored procedure execution plan cache. Due to differences (parameter sniffing etc.) in how the execution plans are built for the raw SQL statement versus the sp_executesql version, the two can differ.

    When running the EF (sp_executesql-wrapped) version, SQL Server is most likely using a more generic execution plan that covers a wider range of timestamps than the values you are actually passing in.

    That said, to reduce the chance of SQL server trying something "funny" with hash joins etc, the first things I would do are:

    1) Index the columns used in the where clause, and in joins

    -- table names as they appear in the EF-generated SQL (dbo.DataLoggers etc.)
    create index ix_DataLogger_ProjectID on dbo.DataLoggers (ProjectID);
    create index ix_DCDistributionBox_DataLoggerID on dbo.DCDistributionBoxes (DataLoggerID);
    create index ix_DCString_DCDistributionBoxID on dbo.DCStrings (DCDistributionBoxID);
    

    2) Use explicit joins in the LINQ query to eliminate the OR ProjectID IS NULL part
