nvarchar is a data type, and the "BIN" or "BIN2" collations are just that - collation sequences. They are two different things.
You use an nvarchar column to store Unicode character data:
nchar and nvarchar (Transact-SQL)
String data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.
https://msdn.microsoft.com/en-GB/library/ms186939(v=sql.105).aspx
An nvarchar column will have an associated collation sequence that defines how the characters sort and compare. This can also be set for the whole database.
COLLATE (Transact-SQL)
Is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.
https://msdn.microsoft.com/en-us/library/ms184391(v=sql.105).aspx
So, when working with character data in SQL Server, you always use a character data type (nvarchar, varchar, nchar or char) together with an appropriate collation, chosen according to your needs for case sensitivity, accent sensitivity etc.
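For example, a column-level collation is declared right alongside the data type. A minimal sketch, assuming made-up table and column names:

    -- The data type (nvarchar) says "store Unicode text";
    -- the COLLATE clause says how that text compares and sorts.
    CREATE TABLE dbo.Products
    (
        ProductName nvarchar(100) COLLATE Latin1_General_CI_AI NOT NULL
    );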
For example, in my work I normally use the "Latin1_General_CI_AI" collation. This is suitable for Latin character sets, and provides case-insensitive and accent-insensitive matching for queries. That means that strings such as "café", "Café", "CAFE" and "cafe" are all considered to be equal.
This is ideal for systems where there may be words containing accented characters (as above), but you can't be sure your users will enter the accents when searching for something.
If you only wanted case-insensitivity then you would use a "CI_AS" (accent sensitive) collation instead.
The "_BIN" collations are for binary comparisons that treat every distinct character as different, and wouldn't be used for general text comparisons.
Edit for updated question:
Provided that you always use nvarchar (as opposed to varchar) columns, you always have support for all Unicode code points, no matter what collation is used.
There is no practical difference in your example query, as it is only a simple insert and select. Also bear in mind that your first "word1" column will be using the database or server's default collation - there's always a collation in use!
The differences will show up when you use criteria against your nvarchar columns, or sort by them. This is what collations are for - they define which characters should be treated as equivalent for comparisons and sorting.
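As a sketch of the sorting side (the table variable and values are made up for illustration), the same rows come back in a different order depending on the collation applied in the ORDER BY:

    DECLARE @t TABLE (w nvarchar(20));
    INSERT INTO @t (w) VALUES (N'apple'), (N'Ápple'), (N'Banana'), (N'ábacus');

    -- Dictionary order, ignoring case and accents:
    SELECT w FROM @t ORDER BY w COLLATE Latin1_General_CI_AI;

    -- Binary order: characters are compared by their code points, so
    -- uppercase and accented letters land in very different positions.
    SELECT w FROM @t ORDER BY w COLLATE Latin1_General_BIN;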
I can't say anything about Cyrillic, but in the case of Latin characters, using the "Latin1_General_CI_AI" collation, characters such as A, a, á and â are all equivalent - the case and the accent are ignored.
Imagine you have the string Aaáâ stored in your "word1" column; then the query SELECT * FROM words2 WHERE word1 = 'aaaa' will return your row.
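Here is the whole example as a runnable sketch; the question doesn't show the definition of "words2", so a CI_AI collation is assumed for "word1", and the final query shows the binary-collation contrast discussed next:

    CREATE TABLE words2
    (
        word1 nvarchar(50) COLLATE Latin1_General_CI_AI
    );
    INSERT INTO words2 (word1) VALUES (N'Aaáâ');

    -- Returns the row: case and accents are ignored under CI_AI.
    SELECT * FROM words2 WHERE word1 = 'aaaa';

    -- Returns nothing: under a binary collation every character is distinct.
    SELECT * FROM words2 WHERE word1 = 'aaaa' COLLATE Latin1_General_BIN;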
If you use a "_BIN" collation then all these characters are treated as distinct, and the query above would not return a row. I can't think of a situation where you'd want to use a "_BIN" collation when working with textual data.
Edit 2: Actually I can - storing password hashes would be a good place to use a binary collation, so that comparisons are exact. That's about all.
I hope this makes it clearer.