I have a question about Entity Framework query execution performance.
Schema:
I have a table structure like this:
CREATE TABLE [
I know I'm a bit late here, but since I've participated in the building of the query in question, I feel obliged to take some action.
The general problem I see with Linq to Entities queries is that the typical way we build them introduces unnecessary parameters, which may affect the cached database query plan (so called Sql Server parameter sniffing problem).
Let take a look at your query group by expression
d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval)
Since minuteInterval
is a variable (i.e. non constant), it introduces a parameter. Same for DateTime.MinValue
(note that the primitive types expose similar things as constants, but for DateTime
, decimal
etc. they are static readonly fields which makes a big diference how they are treated inside the expressions).
But regardless of how it's represented in the CLR system, DateTime.MinValue
logically is a constant. What about minuteInterval
, it depends on your usage.
My attempt to solve the issue would be to eliminate all the parameters related to that expression. Since we cannot do that with compiler generated expression, we need to build it manually using System.Linq.Expressions
. The later is not intuitive, but fortunately we can use a hybrid approach.
First, we need a helper method which allows us to replace expression parameters:
public static class ExpressionUtils
{
public static Expression ReplaceParemeter(this Expression expression, ParameterExpression source, Expression target)
{
return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
}
class ParameterReplacer : ExpressionVisitor
{
public ParameterExpression Source;
public Expression Target;
protected override Expression VisitParameter(ParameterExpression node)
{
return node == Source ? Target : base.VisitParameter(node);
}
}
}
Now we have everything needed. Let encapsulate the logic inside a custom method:
public static class QueryableUtils
{
public static IQueryable> GroupBy(this IQueryable source, Expression> dateSelector, int minuteInterval)
{
Expression> expr = (date, baseDate, interval) =>
DbFunctions.AddMinutes(baseDate, DbFunctions.DiffMinutes(baseDate, date) / interval).Value;
var selector = Expression.Lambda>(
expr.Body
.ReplaceParemeter(expr.Parameters[0], dateSelector.Body)
.ReplaceParemeter(expr.Parameters[1], Expression.Constant(DateTime.MinValue))
.ReplaceParemeter(expr.Parameters[2], Expression.Constant(minuteInterval))
, dateSelector.Parameters[0]
);
return source.GroupBy(selector);
}
}
Finally, replace
.GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
with
.GroupBy(d => d.TimeStamp, minuteInterval * minuteInterval)
and the generated SQL query would be like this (for minuteInterval = 15
):
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, (DATEDIFF (minute, convert(datetime2, '0001-01-01 00:00:00.0000000', 121), [Project1].[TimeStamp])) / 225, convert(datetime2, '0001-01-01 00:00:00.0000000', 121)) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringDatas] AS [Extent1]
INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]
As you may see, we successfully eliminated some of the query parameters. Will that help? Well, as with any database query tuning, it might or might not. You need to try and see.