LINQ generating SQL with duplicate nested selects

前端 未结 2 2042
情深已故
情深已故 2021-02-07 09:36

I\'m very new to the .NET Entity Framework, and I think it\'s awesome, but somehow I\'m getting this strange issue (sorry for the spanish but my program is in that language, an

2条回答
  •  情深已故
    2021-02-07 10:21

    To be clear, LINQ to Entities does not generate the SQL. Instead, it generates an ADO.NET canonical command tree, and the ADO.NET provider for your database, presumably SQL Server in this case, generates the SQL.

    So why does it generate this derived table (I think "derived table" is the more correct term for the SQL feature in use here)? Because the code which generates the SQL has to generate SQL for a wide variety of LINQ queries, most of which are not nearly as trivial as the one you show. These queries will often be selecting data for multiple types (many of which might be anonymous, rather than named types), and in order to keep the SQL generation relatively sane, they are grouped into extents for each type.

    Another question: Why should you care? It's easy to demonstrate that the use of the derived table in this statement is "free" from a performance point of view.

    I selected a table at random from a populated database, and run the following query:

    SELECT [AddressId]
          ,[Address1]
          ,[Address2]
          ,[City]
          ,[State]
          ,[ZIP]
          ,[ZIPExtension]
      FROM [VertexRM].[dbo].[Address]
    

    Let's look at the cost:

    
      
      
        
    

    Now let's compare that to the query with the derived table:

    SELECT 
           [Extent1].[AddressId]
          ,[Extent1].[Address1]
          ,[Extent1].[Address2]
          ,[Extent1].[City]
          ,[Extent1].[State]
          ,[Extent1].[ZIP]
          ,[Extent1].[ZIPExtension]
      FROM (SELECT [AddressId]
              ,[Address1]
              ,[Address2]
              ,[City]
              ,[State]
              ,[ZIP]
              ,[ZIPExtension]
      FROM[VertexRM].[dbo].[Address]) AS [Extent1]
    

    And the cost:

    
      
      
        
    

    In both cases, SQL Server simply scans the clustered index. Not surprisingly, the cost is almost precisely the same.

    Let's take a look at a slightly more complicated query. I fired up LINQPad, and entered the following query against the same table, plus one related table:

    from a in Addresses
    select new
    {
        Id = a.Id,
        Address1 = a.Address1,
        Address2 = a.Address2,
        City = a.City,
        State = a.State,
        ZIP = a.ZIP,
        ZIPExtension = a.ZIPExtension,
        PersonCount = a.EntityAddresses.Count()
    }
    

    This generates the following SQL:

    SELECT 
    1 AS [C1], 
    [Project1].[AddressId] AS [AddressId], 
    [Project1].[Address1] AS [Address1], 
    [Project1].[Address2] AS [Address2], 
    [Project1].[City] AS [City], 
    [Project1].[State] AS [State], 
    [Project1].[ZIP] AS [ZIP], 
    [Project1].[ZIPExtension] AS [ZIPExtension], 
    [Project1].[C1] AS [C2]
    FROM ( SELECT 
        [Extent1].[AddressId] AS [AddressId], 
        [Extent1].[Address1] AS [Address1], 
        [Extent1].[Address2] AS [Address2], 
        [Extent1].[City] AS [City], 
        [Extent1].[State] AS [State], 
        [Extent1].[ZIP] AS [ZIP], 
        [Extent1].[ZIPExtension] AS [ZIPExtension], 
        (SELECT 
            COUNT(cast(1 as bit)) AS [A1]
            FROM [dbo].[EntityAddress] AS [Extent2]
            WHERE [Extent1].[AddressId] = [Extent2].[AddressId]) AS [C1]
        FROM [dbo].[Address] AS [Extent1]
    )  AS [Project1]
    

    Analyzing this, we can see that Project1 is the projection onto the anonymous type. Extent1 is the Address table/entity. And Extent2 is the table for the association. Now there is no derived table for Address, but there is one for the projection.

    I don't know if you have ever written a SQL generation system, but it isn't easy. I believe that the general problem of proving that a LINQ to Entities query and a SQL query are equivalent is NP-hard, although certain specific cases are obviously much easier. SQL is intentionally Turing-incomplete, because its designers wanted all SQL queries to execute in bounded time. LINQ, not so.

    In short, this is a very difficult problem to solve, and the combination of the Entity Framework and its providers do occasionally sacrifice some readability in favor of consistency over a wide range of queries. But it shouldn't be a performance issue.

提交回复
热议问题