Weird SQL Server 2005 Collation difference between varchar() and nvarchar()

▼魔方 西西 提交于 2019-12-01 18:02:13

问题


Can someone please explain this:

SELECT 
  CASE WHEN CAST('iX' AS nvarchar(20)) 
      > CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('iX' AS varchar(20)) 
      > CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END

Results: 0 1

SELECT 
  CASE WHEN CAST('i' AS nvarchar(20)) 
      > CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('i' AS varchar(20)) 
      > CAST('-' AS varchar(20)) THEN 1 ELSE 0 END

Results: 1 1

On the first query, the nvarchar() result is not what I'm expecting, and yet removing the X make the nvarchar() sort happen as expected.

(My original queries used the '' and N'' literal syntax to distinguish varchar() and nvarchar() rather than CAST() and got the same result.)

Collation setting for the database is SQL_Latin1_General_CP1_CI_AS.


回答1:


When comparing unicode text, hyphens are treated specially. Unicode comparison uses "dictionary order", which ignores hyphens. This is not the case with non-unicode text comparison.

Comparing -X and iX, is like comparing X and iX, so -X, the left side, is greater. When comparing "-" and "i", is like comparing "" and "i", so "i", the right side is greater.

From MSDN,

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

SELCT body From MSDN_Articles WHERE url IN ("http://support.microsoft.com/kb/322112")




回答2:


A nice question!

Digging around, I found that the issue is related to hyphens and apostrophes. Your example exhibits the same 'odd' behaviour with '''X' as with '-X'.

I can't take credit for finding the answer, because it's here: SQL Sorting and hyphens



来源:https://stackoverflow.com/questions/3056967/weird-sql-server-2005-collation-difference-between-varchar-and-nvarchar

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!