Currently we are using the following commands in PHP to set the character set to UTF-8 in our application.
Since this is a bit of overhead, we\'d like to set this a
All settings listed here are correct, but here are the most optimal and sufficient solution:
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci
[client]
default-character-set = utf8
Add these to /etc/mysql/my.cnf
.
Please note, I choose utf8_unicode_ci type of collation due to the performance issue.
The result is:
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
And this is when you connect as non-SUPER user!
For example, the difference between connection as SUPER and non-SUPER user (of course in case of utf8_unicode_ci collation):
user with SUPER priv.:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | <---
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
user with non-SUPER priv.:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
I wrote a comprehensive article (rus) explaining in details why you should use one or the other option. All types of Character Sets and Collations are considered: for server, for database, for connection, for table and even for column.
I hope this and the article will help to clarify unclear moments.