问题
I am using flourishlib for a website. My client requested that we should be able to use emojis with mobile phones. In theory we should change the character-encoding from utf8 to utf8mb4 for the MySQL database.
So far, so good, however, if we make this switch, like this:
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Then each character will use four bytes instead of three bytes. This would increase the database's size with 33%. This would result in worse performance and more storage space used up. So, as a result, we have decided to switch to an encoding of utf8mb4 for only specific columns of specific tables.
To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:
There is an fUTF8 class, which does not seem to support utf8mb4
At fDatabase I am quoting some findings:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) { throw new fConnectivityException( 'There was an error setting the database connection to use UTF-8' ); } //... // Make MySQL act more strict and use UTF-8 if ($this->type == 'mysql') { $this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'"); $this->execute("SET NAMES 'utf8'"); $this->execute("SET CHARACTER SET utf8"); }
At fSQLSchemaTranslation I can see this:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
I have the suspicion that flourishlib will not support our quest of making a few columns of a few table have a character encoding of utf8mb4. I wonder whether we can upgrade something somehow to make this support. As a worst-case scenario, we can override every textual occurrence of utf8 to utf8mb4. However, that would be a very ugly hack and we wonder whether there is a better solution. Should we make this hack or is there a more orthodox approach?
回答1:
I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:
ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After that, I had to make a few ugly hacks to make flourishlib able to support emojis.
fDatabase.php:
line 685:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
line 717 stays the same, everything crashes if this line is changed:
if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {
line 800:
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8mb4'");
$this->execute("SET CHARACTER SET utf8mb4");
}
fSQLSchemaTranslation.php:
line 1554:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);
fXML.php:
line 403:
if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
// Remove the UTF-8 BOM if present
$xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
fCore::startErrorCapture(E_NOTICE);
$cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
if ($cleaned != $xml) {
$xml = self::iconv('Windows-1252', 'UTF-8', $xml);
}
fCore::stopErrorCapture();
}
and finally, when there are modifications for any of the columns affected, I execute this:
App::db()->query("set names 'utf8mb4'");
which, essentially triggers the ->query()
execution of an fDatabase
object.
回答2:
increase the database's size with 33%.
Not true. English letters still take 1 byte each. What you gain with utf8mb4 is the ability to store emoji and some Chinese characters.
You shouldn't need to ALTER ... CHANGE
the columns. Except that you probably had a canned VARCHAR(255)
which has issues. Don't simply switch to 191, switch to a 'reasonable' number for each column. Or do nothing. The 191 comes only from an INDEX
limitation. You are no indexing every column, are you?
fUTF8 class, which does not seem to support
Complain to flourishlib. Or abandon it. (Too many questions in these forums are complaints about inadequate 3rd party packages, not MySQL, itself.)
You might be able to change to utf8mb4 in MySQL and let flourishlib be oblivious to the change. Technically speaking, MySQL's utf8mb4 matches the rest of the world's concept of utf8; MySQL's utf8 is an incomplete implementation.
$this->execute("SET NAMES 'utf8'");
If you can see this code, you can change it.
来源:https://stackoverflow.com/questions/32446440/how-to-support-emojis-with-flourish