Manipulating utf8mb4 data from MySQL with PHP

限于喜欢 submitted on 2019-12-01 05:39:15

I'd guess that the table is set to utf8mb4 but your connection encoding is still utf8. You have to set the connection to utf8mb4 as well; otherwise MySQL converts the stored utf8mb4 data to utf8, which holds at most three bytes per character and so cannot encode "high" Unicode characters (code points above U+FFFF, such as emoji). (Yes, that's a MySQL idiosyncrasy.)
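To make the "high" characters concrete, here is a minimal PHP sketch, assuming PHP 7 or later for the "\u{...}" escape. The helper name needs_utf8mb4() is made up for illustration; it only reports whether a string contains a code point above U+FFFF, which is the kind of character MySQL's utf8 cannot hold.

```php
<?php
// Sketch: detect characters that need 4 bytes in UTF-8, i.e. code points
// above U+FFFF, which MySQL's 3-byte "utf8" charset cannot store.
// needs_utf8mb4() is a hypothetical helper name, not a built-in.
function needs_utf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$plain = "caf\u{E9}";   // contains U+00E9, which takes 2 bytes in UTF-8
$emoji = "\u{1F600}";   // U+1F600, which takes 4 bytes in UTF-8

var_dump(strlen("\u{E9}"));       // int(2)
var_dump(strlen($emoji));         // int(4)
var_dump(needs_utf8mb4($plain));  // bool(false)
var_dump(needs_utf8mb4($emoji));  // bool(true)
```

If a string makes this helper return true, it will only survive a round trip when the table, the column and the connection all use utf8mb4.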

On a raw MySQL connection, it will have to look like this:

SET NAMES 'utf8mb4';
SELECT * FROM `my_table`;

How you issue that depends on how you connect to MySQL from PHP (mysql, mysqli or PDO); use whatever mechanism your client library provides for setting the connection charset.

To really clarify (yes, using the mysql_ extension for simplicity; don't do that at home, it is deprecated and was removed in PHP 7):

mysql_set_charset('utf8mb4');     // adapt to your mysql connector of choice

$r = mysql_query('SELECT * FROM `my_table`');

var_dump(mysql_fetch_assoc($r));  // data will be UTF8 encoded
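For the other two connectors, a rough equivalent might look like the sketch below. The host, database name and credentials are placeholders, and utf8mb4_dsn(), connect_pdo() and connect_mysqli() are hypothetical helper names. The calls they wrap are the standard ones: the charset parameter of the PDO MySQL DSN, and mysqli::set_charset().

```php
<?php
// Sketch only: host, database name and credentials are placeholders.

// Build a PDO DSN that asks for utf8mb4 as the connection charset.
// utf8mb4_dsn() is a hypothetical helper, not part of PDO.
function utf8mb4_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// PDO: the connection charset is part of the DSN.
function connect_pdo(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8mb4_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// mysqli: call set_charset() right after connecting.
function connect_mysqli(string $host, string $dbname, string $user, string $pass): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbname);
    $db->set_charset('utf8mb4');
    return $db;
}

echo utf8mb4_dsn('localhost', 'my_database'), "\n";
// prints: mysql:host=localhost;dbname=my_database;charset=utf8mb4
```

Neither connect function is called here, so the snippet runs without a database; call whichever matches your connector with real credentials.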

Just to add to @deceze's answer: I recommend a well-configured MySQL server (for me, the file is /etc/mysql/mysql.conf.d/mysqld.cnf). Here are the configuration options that make sure you're using utf8mb4. I do recommend going through every MySQL configuration option, daunting as it is, because many of the defaults are far from optimal.


[client]
default-character-set           = utf8mb4

[mysql]
default_character_set           = utf8mb4

[mysqld]
init-connect                    = "SET NAMES utf8mb4"
character-set-client-handshake  = FALSE
character-set-server            = "utf8mb4"
collation-server                = "utf8mb4_unicode_ci"
autocommit                      = 1
block_encryption_mode           = "aes-256-cbc"

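That last one (block_encryption_mode) is simply a setting that should be the default. Also, init-connect runs SET NAMES utf8mb4 for each connecting client, so you don't have to execute it yourself every time, which keeps the code clean. Now run: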

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

It should return something like the following:

| Variable_name            | Value              |
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
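If you would rather check this from PHP than by eye, something along these lines could work. It is only a sketch: non_utf8mb4_settings() is a hypothetical helper, and it assumes the rows arrive as associative arrays with Variable_name and Value keys, which is how the columns of the query above are named. character_set_filesystem and character_set_system are skipped because, as in the table above, they are not expected to be utf8mb4.

```php
<?php
// Sketch: given rows shaped like the SHOW VARIABLES output, list the
// settings that are not utf8mb4. non_utf8mb4_settings() is a hypothetical
// helper, not a library function.
function non_utf8mb4_settings(array $rows): array
{
    // These two are not expected to be utf8mb4 (see the table above).
    $skip = ['character_set_filesystem', 'character_set_system'];
    $bad = [];
    foreach ($rows as $row) {
        $name = $row['Variable_name'];
        $value = $row['Value'];
        if (in_array($name, $skip, true)) {
            continue;
        }
        // Collations are named utf8mb4_*, so a prefix check covers both kinds.
        if (strpos($value, 'utf8mb4') !== 0) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Hand-written sample rows standing in for a real query result.
$rows = [
    ['Variable_name' => 'character_set_client',  'Value' => 'utf8'],
    ['Variable_name' => 'character_set_results', 'Value' => 'utf8mb4'],
    ['Variable_name' => 'character_set_system',  'Value' => 'utf8'],
    ['Variable_name' => 'collation_server',      'Value' => 'utf8mb4_unicode_ci'],
];
print_r(non_utf8mb4_settings($rows));
```

With the sample rows, only character_set_client is reported. An empty array means every checked setting starts with utf8mb4.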

It looks like you're doing this already, but it doesn't hurt to define the character set explicitly on table creation:

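CREATE TABLE `mysql_table` (
  `mysql_column` INT NOT NULL AUTO_INCREMENT,  -- illustrative column definition
  PRIMARY KEY (`mysql_column`)
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;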

Hope this helps someone.
