SQL Server: when to use collation and NVARCHAR

Question

Currently my column datatype is varchar in my SQL Server table.

I want to store both English and Chinese characters in my column.

What steps do I have to follow to use a collation, or do I have to change the datatype to NVARCHAR and insert with the N'...' prefix as Unicode?

If I have to use a collation, which one should I use?

Please help me with this.

Answer 1:

You are mixing two concepts:

Data type and encoding

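VARCHAR stores your data in 8-bit units, interpreted through the code page that belongs to the column's collation. Basic characters take one byte. In double-byte code pages (such as those behind the Chinese collations) a lead byte tells the engine that it and the following byte together form one character. With the UTF-8 collations available since SQL Server 2019, a character can take up to four bytes. A character that does not exist in the code page cannot be stored and is replaced by a question mark.

NVARCHAR stores text as Unicode in 16-bit (2-byte) units. English letters and the commonly used Chinese characters take one unit each; rare supplementary characters take two. This allows one single encoding for all characters, no code-page tricks needed.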
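
To see the difference, here is a minimal sketch. It assumes the current database uses a Latin code page collation such as SQL_Latin1_General_CP1_CI_AS; with a Chinese or UTF-8 collation the VARCHAR result would look different. The variable names are made up for illustration.

```sql
-- Assumption: the database default collation uses a Latin code page
-- (for example SQL_Latin1_General_CP1_CI_AS), which has no Chinese characters.
DECLARE @v VARCHAR(20)  = N'Hello 你好';  -- converted to the code page on assignment
DECLARE @n NVARCHAR(20) = N'Hello 你好';  -- kept as Unicode

SELECT @v             AS varchar_value,   -- Chinese characters are expected to come back as '?'
       @n             AS nvarchar_value,  -- stored intact
       DATALENGTH(@v) AS varchar_bytes,   -- bytes used by the VARCHAR value
       DATALENGTH(@n) AS nvarchar_bytes;  -- bytes used by the NVARCHAR value
```

Note that the N prefix matters as well: without it the literal itself is a VARCHAR, so under such a collation the Chinese characters are already lost before they reach an NVARCHAR target.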

Collation

The collation defines how strings are compared and sorted. It comes into play whenever you deal with string values in WHERE clauses, in JOIN conditions, in indexes and in ORDER BY. For VARCHAR columns it also determines the code page.

The SQL Server instance has a default collation, which is used for new databases and - very important! - for tempdb, where your temp tables are created.

It is allowed to define a different default collation on database level, but this can lead to severe problems if you run queries against temp tables whose collation is not the same: comparing columns with different collations fails with a collation conflict error.
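
If you are unsure which collations are in effect, a query along these lines should show them. Temp tables are created in the tempdb system database, which is why its collation is listed separately; the column aliases are only illustrative.

```sql
-- Compare the instance default, the current database and tempdb.
SELECT SERVERPROPERTY('Collation')                  AS server_collation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation')   AS current_db_collation,
       DATABASEPROPERTYEX('tempdb', 'Collation')    AS tempdb_collation;
```

If the second and third values differ, string comparisons between your tables and temp tables need extra care.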
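
One common way around this, sketched here with a hypothetical dbo.Customer table, is to create the temp table's string columns with COLLATE DATABASE_DEFAULT, so they follow the current database instead of tempdb (where temp tables are created):

```sql
-- Hypothetical permanent table; its column takes the current database's collation.
CREATE TABLE dbo.Customer (Name NVARCHAR(50) NOT NULL);

-- Without the COLLATE clause this column would take tempdb's default collation.
CREATE TABLE #Import (Name NVARCHAR(50) COLLATE DATABASE_DEFAULT NOT NULL);

INSERT INTO dbo.Customer (Name) VALUES (N'Li Wei'), (N'王芳');
INSERT INTO #Import (Name) VALUES (N'王芳');

-- Both sides share one collation, so the comparison resolves cleanly.
SELECT c.Name
FROM dbo.Customer AS c
JOIN #Import AS i ON i.Name = c.Name;

-- Alternative when the temp table already exists with tempdb's collation:
--   ... ON i.Name = c.Name COLLATE DATABASE_DEFAULT

DROP TABLE #Import;
DROP TABLE dbo.Customer;
```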

You are allowed to define the collation on column level too.
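
A minimal sketch of a column-level collation follows. The table dbo.Article and the choice of Chinese_PRC_CI_AS are assumptions for illustration; which collation suits your data is something to test.

```sql
-- Hypothetical table: the Title column gets its own collation,
-- independent of the database default.
CREATE TABLE dbo.Article (
    ArticleId INT IDENTITY(1,1) PRIMARY KEY,
    Title     NVARCHAR(200) COLLATE Chinese_PRC_CI_AS NOT NULL
);

-- Inspect which collation each column ended up with
-- (collation_name is NULL for non-character columns).
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.Article');
```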

And you are allowed to define the collation even within your statements, for each column separately. This gives the finest level of control but means a lot of typing and very hard-to-read code...

If you want to store English and Chinese in one column you must use NVARCHAR. There is no one-size-fits-all collation; you will have to try out which one works for your data.

You might store your strings in a side table with proper configuration and join it within your queries...
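
As a small illustration of statement-level collations, using a made-up table variable, the same data can be compared and sorted under different rules just by adding a COLLATE clause:

```sql
DECLARE @t TABLE (Word NVARCHAR(20));
INSERT INTO @t (Word) VALUES (N'apple'), (N'Apple'), (N'banana');

-- Case-sensitive comparison: matches only 'apple'.
SELECT Word FROM @t WHERE Word = N'apple' COLLATE Latin1_General_CS_AS;

-- Case-insensitive comparison: matches 'apple' and 'Apple'.
SELECT Word FROM @t WHERE Word = N'apple' COLLATE Latin1_General_CI_AS;

-- Sorting by binary code point order instead of linguistic rules.
SELECT Word FROM @t ORDER BY Word COLLATE Latin1_General_BIN2;
```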

UPDATE: regarding proper configuration:

You should use different columns for English and Chinese strings. Or even one separate side table for each language... This allows you to set the best collation for each column/language separately. And it makes it easy to add new languages in a multi-language environment.
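
One possible layout for the side-table variant is sketched below. All table names, column names and the two collations are assumptions for illustration, not a recommendation for your specific data.

```sql
-- Hypothetical schema: one side table per language, each with its own collation.
CREATE TABLE dbo.Product (
    ProductId INT PRIMARY KEY
);

CREATE TABLE dbo.ProductNameEn (
    ProductId INT PRIMARY KEY REFERENCES dbo.Product (ProductId),
    Name      NVARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL
);

CREATE TABLE dbo.ProductNameZh (
    ProductId INT PRIMARY KEY REFERENCES dbo.Product (ProductId),
    Name      NVARCHAR(100) COLLATE Chinese_PRC_CI_AS NOT NULL
);

INSERT INTO dbo.Product (ProductId) VALUES (1), (2);
INSERT INTO dbo.ProductNameEn (ProductId, Name) VALUES (1, N'Tea'), (2, N'Rice');
INSERT INTO dbo.ProductNameZh (ProductId, Name) VALUES (1, N'茶'), (2, N'米饭');

-- Join the side tables as needed; each ORDER BY uses that column's own collation.
SELECT p.ProductId, en.Name AS NameEn, zh.Name AS NameZh
FROM dbo.Product AS p
LEFT JOIN dbo.ProductNameEn AS en ON en.ProductId = p.ProductId
LEFT JOIN dbo.ProductNameZh AS zh ON zh.ProductId = p.ProductId
ORDER BY zh.Name;
```

Adding another language later would then mean adding one more side table with a fitting collation, without touching the existing ones.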