What is the difference between NVarchar, Bin collation, Bin2 collation?

滥情空心 2021-01-03 07:51

All three options are case- and accent-sensitive and support Unicode. According to the documentation:

  1. NVarchar sorts and compares data based on the "dictionari…

2 Answers
  •  北海茫月
    2021-01-03 08:19

    nvarchar is a data type, and the "BIN" or "BIN2" collations are just that - collation sequences. They are two different things.

    You use an nvarchar column to store unicode character data:

    nchar and nvarchar (Transact-SQL)

    String data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

    https://msdn.microsoft.com/en-GB/library/ms186939(v=sql.105).aspx

    An nvarchar column will have an associated collation sequence that defines how the characters sort and compare. This can also be set for the whole database.

    COLLATE (Transact-SQL)

    Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.

    https://msdn.microsoft.com/en-us/library/ms184391(v=sql.105).aspx

    So, when working with character data in SQL Server, you always use both a character data type (nvarchar, varchar, nchar or char) and an appropriate collation chosen according to your needs for case sensitivity, accent sensitivity, etc.

    For example, in my work I normally use the "Latin1_General_CI_AI" collation. This is suitable for Latin character sets and provides case-insensitive and accent-insensitive matching for queries. That means the following strings are all considered equal:

    • Höller, höller, Holler, holler

    This is ideal for systems where words may contain accented characters (as above), but you can't be sure your users will type the accents when searching for something.

    If you only wanted case-insensitivity then you would use a "CI_AS" (accent sensitive) collation instead.
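    To make the CI_AI vs CI_AS distinction concrete, here is a rough Python model of the two matching rules. Python is used only for illustration, and the helper names (`strip_accents`, `compare_key`, `equal`) are hypothetical. Stripping combining marks after Unicode NFD decomposition only approximates what SQL Server's Latin1_General collations do. It is not how SQL Server implements them.

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits "ö" into "o" + U+0308 (combining diaeresis); dropping the
    # combining marks leaves only the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compare_key(s: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    # Build the key that two strings must share to count as "equal".
    key = unicodedata.normalize("NFC", s)
    if not accent_sensitive:
        key = strip_accents(key)
    if not case_sensitive:
        key = key.casefold()
    return key


def equal(a: str, b: str, *, case_sensitive: bool, accent_sensitive: bool) -> bool:
    return compare_key(a, case_sensitive, accent_sensitive) == compare_key(
        b, case_sensitive, accent_sensitive
    )


words = ["Höller", "höller", "Holler", "holler"]

# CI_AI-like: all four spellings match each other.
print(all(equal(w, "holler", case_sensitive=False, accent_sensitive=False) for w in words))  # True

# CI_AS-like: case is ignored, but "ö" and "o" stay distinct.
print(equal("Höller", "höller", case_sensitive=False, accent_sensitive=True))  # True
print(equal("Höller", "Holler", case_sensitive=False, accent_sensitive=True))  # False
```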

    The "_BIN" collations are for binary comparisons that treat every distinct character as different, and wouldn't be used for general text comparisons.


    Edit for updated question:

    Provided that you always use nvarchar (as opposed to varchar) columns, you always have support for all Unicode code points, no matter which collation is used.

    There is no practical difference in your example query, as it is only a simple insert and select. Also bear in mind that your first "word1" column will be using the database or server's default collation - there's always a collation in use!

    Where the differences will occur is if you use criteria against your nvarchar columns, or sort by them. This is what collations are for - they define which characters should be treated as equivalent for comparisons and sorting.
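    The sketch below shows how sorting and filtering can differ under two rules. It is a hedged Python model in which `ci_ai_key` approximates a case- and accent-insensitive collation and `binary_key` approximates a binary one by comparing raw code points. Both helpers are hypothetical and do not reproduce SQL Server's actual collation tables.

```python
import unicodedata


def ci_ai_key(s: str) -> str:
    # Approximate case- and accent-insensitive key: strip combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def binary_key(s: str) -> list[int]:
    # Binary-style key: the raw sequence of Unicode code points.
    return [ord(ch) for ch in s]


words = ["b", "a", "B", "á"]

# Code point order puts upper-case ASCII letters before lower-case ones,
# and "á" (U+00E1) after "z".
print(sorted(words, key=binary_key))  # ['B', 'a', 'b', 'á']

# Case/accent-insensitive order groups equivalent letters together. Ties keep
# their input order only because Python's sort is stable; SQL Server's
# ORDER BY makes no such promise.
print(sorted(words, key=ci_ai_key))  # ['a', 'á', 'b', 'B']

# Filtering ("criteria") with the same two rules.
print([w for w in words if ci_ai_key(w) == ci_ai_key("A")])  # ['a', 'á']
print([w for w in words if binary_key(w) == binary_key("A")])  # []
```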

    I can't say anything about Cyrillic, but for Latin characters under the "Latin1_General_CI_AI" collation, characters such as A, a, á and â are all equivalent: both case and accent are ignored.

    For example, if the string Aaáâ is stored in your "word1" column, the query SELECT * FROM words2 WHERE word1 = 'aaaa' will return that row.

    If you use a "_BIN" collation, all these characters are treated as distinct, and the query above would not return a row. I can't think of a situation where you'd want to use a "_BIN" collation when working with textual data.

    Edit 2: Actually, I can. Storing password hashes would be a good place to use a binary collation, so that comparisons are exact. That's about all.
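    Here is a rough Python model of the "Aaáâ" query and the hash case. `ci_ai_equal` stands in for a *_CI_AI comparison and `binary_equal` for a binary one. Both names are hypothetical, as are the two hash strings. The model ignores SQL Server details such as its trailing-space handling in `=` comparisons.

```python
import unicodedata


def ci_ai_equal(a: str, b: str) -> bool:
    """Rough stand-in for a *_CI_AI comparison: drop accents, ignore case."""
    def fold(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return no_marks.casefold()
    return fold(a) == fold(b)


def binary_equal(a: str, b: str) -> bool:
    """Rough stand-in for a binary comparison: identical code point sequences only."""
    return a == b


stored = "Aaáâ"
print(ci_ai_equal(stored, "aaaa"))   # True  -> the row would be returned
print(binary_equal(stored, "aaaa"))  # False -> no row

# Hypothetical base64-style hash strings that differ only in letter case.
h1 = "q3Rk9Zx0"
h2 = "Q3RK9ZX0"
print(ci_ai_equal(h1, h2))   # True: a CI collation would wrongly treat them as equal
print(binary_equal(h1, h2))  # False: a binary comparison keeps them distinct
```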

    I hope this makes it clearer.
