Why is query with phone = N'1234' slower than phone = '1234'?

前端 未结 3 1617
醉话见心
醉话见心 2021-02-01 01:09

I have a field which is a varchar(20)

When this query is executed, it is fast (Uses index seek):

SELECT * FROM [dbo].[phone] WHERE phone = \'5554474477\'         


        
3条回答
  •  终归单人心
    2021-02-01 01:39

    Other answers already explain what happens; we've seen NVARCHAR has higher type precedence than VARCHAR. I want to explain why the database must cast every row for the column as an NVARCHAR, rather than casting the single supplied value as VARCHAR, even though the second option is clearly much faster, both intuitively and empirically.

    Casting from NVARCHAR to VARCHAR is a narrowing conversion. That is, NVARCHAR has potentially more information than a similar VARCHAR value. It's not possible to represent every NVARCHAR input with a VARCHAR output, so casting from the former to the latter potentially loses some information. But the opposite cast is a widening conversion. Casting from a VARCHAR value to an NVARCHAR value never loses information; it's safe.

    The principle is for Sql Server to always choose the safe conversion when presented with two mismatched types. It's the same old "correctness trumps performance" mantra. Or, to paraphrase Benjamin Franklin, "He who would trade essential correctness for a little performance deserve neither correctness nor performance." The type precedence rules, then, are designed to ensure the safe conversions are chosen.

    Now you and I both know your narrowing conversion is also safe for this particular data, but the Sql Server query optimizer doesn't care about that. For better or worse, it sees the data type information first when building the execution plan and follows the type precedence rules.

    Here's the real kicker: now we're making this cast, we have to do it for every row in the table. This is true even for rows which would not otherwise match the comparison filter. Morever, the cast values from the columns are no longer the same as the values stored in an index, such that any index on the column is now worthless for this query.

    I think you're very lucky to be getting an index scan for this query, rather than a full table scan, and it's likely because there is a covering index that meets the needs of the query (the optimizer can choose to cast all the records in the index as easily as all the records in the table).

    You can fix things for this query by explicitly resolving the type mismatch in a more favorable way. The best way to accomplish this is, of course, supplying a plain VARCHAR in the first place and avoid any need for casting/conversion at all:

    SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'
    

    But I suspect what we're seeing is a value provided by an application, where you don't necessarily control that part of the literal. If so, you can still do this:

    SELECT * FROM [dbo].[phone] WHERE phone = cast(N'5554474477' as varchar(20))
    

    Either example favorably resolves the type mismatch from the original code. Even with the latter situation, you may have more control over the literal than you know. For example, if this query was created from a .Net program the problem is possibly related to the AddWithValue() function. I've written about this issue in the past and how to handle it correctly.

    These fixes also help demonstrate why things are this way.

    It may be possible at some point in the future the Sql Server developers enhance the query optimizer to look at situations where type precedence rules cause a per-row conversion resulting in a table or index scan, but the opposite conversion involves constant data and could be just an index seek, and in that case first look at the data to see if it would also be safe. However, I find it unlikely they will ever do this. In my opinion, the corrections to queries within the existing system are too easy relative to the additional performance cost completing the evaluation for individual queries and the complexity in understanding what the optimizer is doing ("Why didn't the server follow the documented precedence rules here?") to justify it.

提交回复
热议问题