I need to create an application in PHP that can handle all Unicode characters in all places — edit fields, static HTML, database. Can somebody tell me the complete list of a
I used the mentioned methods and they worked fine until recently, when my provider updated PHP to 5.2.11 and MySQL to 5.0.81-community. After this change, Unicode characters were still retrieved correctly from the database, but all updates were corrupted and Unicode characters were replaced by '?'.
The solution was to use:
mysql_set_charset('utf8',$conn);
It was required even though we used:
SET NAMES utf8
SET CHARACTER SET utf8
Also, since we use ADOdb, we needed to get at the underlying PHP connection handle. We used the following statement:
mysql_set_charset('utf8',$adoConn->_connectionID);
Apache
The server encoding must be either not set, or set to UTF-8. This is done via the Apache AddDefaultCharset directive, which can go in the virtual host configuration or in the main configuration file (see the documentation).
AddDefaultCharset utf-8
MySQL
SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'
PHP
1- You should set the HTML charset of the page to UTF-8, via a meta tag on the page or via a PHP header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-or-
header('Content-type: text/html; charset=utf-8');
2- You should always use the mb_* versions of string-related functions; for example, mb_strlen instead of strlen to get the length of a string in characters rather than bytes.
This should allow you to have UTF-8 everywhere, from the pages to the data. A test you can do: right-click anywhere on the page in Firefox and select Show page information. The effective encoding is listed there.
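To illustrate the point about the mb_* functions, here is a minimal sketch of how the byte-oriented and multibyte functions differ. It assumes the mbstring extension is loaded; the sample word is only an illustration, and it is written with byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// Sample string (illustrative): "naive" with a diaeresis on the i.
// \xC3\xAF is the two-byte UTF-8 encoding of that one character.
$word = "na\xC3\xAFve";

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts characters

// substr() can cut a multibyte character in half;
// mb_substr() works on character boundaries.
$firstThree = mb_substr($word, 0, 3, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters, prefix: ", $firstThree, "\n";
```

Passing the encoding explicitly avoids depending on the server's mbstring.internal_encoding or default_charset setting.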
It was recommended above that you use either an HTTP header or a meta element to set the charset on your pages to UTF-8. The W3C recommends that you do both. The meta element should also appear as early as possible on the page. (All characters before the meta element should be ASCII, which is encoded identically in almost all character encodings. Some browsers will restart page rendering when they encounter the meta tag, which is another good reason to have it early.)
Also, on all forms accepting user input, add an accept-charset="utf-8" attribute. Browsers submitting POST data generally default to the encoding of the page, but there's no harm in being sure.
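Putting the header, the early meta element and the accept-charset attribute together could look roughly like the sketch below. The field name "comment" and the helper utf8_or_null() are made up for illustration, and the server-side mb_check_encoding() check is an extra precaution that goes beyond what is described above; it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: accept a submitted value only if it is valid UTF-8.
function utf8_or_null($value)
{
    return (is_string($value) && mb_check_encoding($value, 'UTF-8')) ? $value : null;
}

// Send the HTTP header first; header() only works before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// "comment" is an illustrative field name, not something from the thread.
$comment = utf8_or_null(isset($_POST['comment']) ? $_POST['comment'] : null);
$shown   = ($comment === null) ? '' : htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// The meta element is the first child of <head>, so only ASCII precedes it.
$page = "<!DOCTYPE html>\n<html>\n<head>\n"
      . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n"
      . "<title>UTF-8 form</title>\n</head>\n<body>\n"
      . "<form method=\"post\" accept-charset=\"utf-8\">\n"
      . "<input type=\"text\" name=\"comment\" value=\"" . $shown . "\" />\n"
      . "<input type=\"submit\" />\n</form>\n</body>\n</html>\n";

echo $page;
```

Rejecting input that is not valid UTF-8 is one possible policy; converting it from a known legacy encoding would be another.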
Some things you will need to look into:
PHP
Make sure your content is marked as UTF-8 (in php.ini):
default_charset = "utf-8"
Install mbstring. You can find it here
Ensure that you are talking UTF-8 between PHP and MySQL.
Call mysql_set_charset("utf8"); (or use the SQL query SET NAMES utf8).
Apache
You can also set the Content-Type charset of your pages here, with something like this:
AddDefaultCharset utf-8
MySQL
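Make sure all your tables use the utf8 character set with a collation such as utf8_general_ci; e.g.
ALTER DATABASE mydb CHARACTER SET utf8;
Note that ALTER DATABASE only changes the default for tables created afterwards; existing tables have to be converted individually with ALTER TABLE ... CONVERT TO CHARACTER SET utf8.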
Finally
Test everything with fun Unicode samples, like these ones
More helpful information from when I tried this...
Important: You should also ensure that you use UTF-8 as the connection charset when connecting to MySQL from PHP!
For mysqli this is done by
mysqli_set_charset($dblink, 'utf8')
(MySQL's name for the charset is utf8, without a hyphen; 'utf-8' is not accepted.)
http://de3.php.net/manual/en/mysqli.set-charset.php
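A minimal sketch of a connection helper built around that call might look like this. The function name open_utf8_connection() and its parameters are placeholders, and it assumes the mysqli extension is installed; only mysqli_connect(), mysqli_set_charset() and the error functions are real API calls.

```php
<?php
// Hypothetical helper; host, user, password and database are supplied by the caller.
function open_utf8_connection($host, $user, $password, $database)
{
    $link = mysqli_connect($host, $user, $password, $database);
    if (!$link) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }

    // Set the connection charset through the API rather than only via
    // "SET NAMES", so the client library knows the encoding as well.
    if (!mysqli_set_charset($link, 'utf8')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }

    return $link;
}
```

Usage would be along the lines of $dblink = open_utf8_connection('localhost', 'user', 'secret', 'mydb'); with your own credentials. On MySQL 5.5.3 and later, 'utf8mb4' can be passed instead if characters outside the Basic Multilingual Plane need to be stored.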