I Understand Character sets but I don\'t understand Collation. I know you get a default collation with every Character set in Mysql or any RDBMS but I still don\'t get it! C
Collation is information about how strings should be sorted and compared.
It contains for example case sensetivity, e.g. whether a
= A
, special character considerations, e.g. whether a
= á
, and character order, e.g. whether O
< Ö
.
The main point of a database collation is determining how data is sorted and compared.
Case sensitivity of string comparisons
SELECT "New York" = "NEW YORK";`
will return true for a case insensitive collation; false for a case sensitive one.
Which collation does which can be told by the _ci
and _cs
suffix in the collation's name. _bin
collations do binary comparisons (strings must be 100% identical).
Comparison of umlauts/accented characters
the collation also determines whether accented characters are treated as their latin base counterparts in string comparisons.
SELECT "Düsseldorf" = "Dusseldorf";
SELECT "Èclair" = "Eclair";
will return true in the former case; false in the latter. You will need to read each collation's description to find out which is which.
String sorting
The collation influences the way strings are sorted.
For example,
Umlauts Ä Ö Ü
are at the end of the alphabet in the finnish/swedish alphabet latin1_swedish_ci
they are treated as A O U
in German DIN-1 sorting (latin_german1_ci
)
and as AE OE UE
in German DIN-2 sorting (latin_german2_ci
). ("phone book" sorting)
In latin1_spanish_ci
, "ñ" (n-tilde) is a separate letter between "n" and "o".
These rules will result in different sort orders when non-latin characters are used.
Using collations at runtime
You have to choose a collation for your table and columns, but if you don't mind the performance hit, you can force database operations into a certain collation at runtime using the COLLATE
keyword.
This will sort table
by the name
column using German DIN-2 sorting rules:
SELECT name
FROM table
ORDER BY name COLLATE latin1_german2_ci;
Using COLLATE
at runtime will have performance implications, as each column has to be converted during the query. So think twice before applying this do large data sets.
MySQL Reference: