I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto
The cause could be the character set currently used by the client connection. A simple solution is to set it by running mysql_query('SET CHARACTER SET utf8') before the SELECT query.
Update (June 2014)
The mysql extension is deprecated as of PHP 5.5.0, and mysqli is now the recommended replacement. Also, upon further reading, the above way of setting the client character set should be avoided, for reasons including security.
I haven't tested it, but this should be an ok substitute:
$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}
Or, in procedural style, passing the connection as a parameter:
$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
    # TODO - Error: Unable to set the character set
    exit;
}
My solution for encoding UTF-8 data was:
$jsonArray = addslashes(json_encode($array, JSON_FORCE_OBJECT|JSON_UNESCAPED_UNICODE));
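To see what those two flags change, here is a minimal sketch. The $keywords array is made up for illustration, and "Café" is written as explicit UTF-8 bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// Illustrative data only: "Caf\xc3\xa9" is "Café" spelled out as UTF-8 bytes.
$keywords = array('Coffee', "Caf\xc3\xa9");

// Default behaviour: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($keywords);

// JSON_UNESCAPED_UNICODE keeps the UTF-8 characters as they are.
$unescaped = json_encode($keywords, JSON_UNESCAPED_UNICODE);

// JSON_FORCE_OBJECT turns the list into an object keyed by index.
$asObject = json_encode($keywords, JSON_FORCE_OBJECT | JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // ["Coffee","Caf\u00e9"]
echo $unescaped, "\n"; // ["Coffee","Café"]
echo $asObject, "\n";  // {"0":"Coffee","1":"Café"}
```

Both the escaped and the unescaped form are valid JSON and decode to the same string, so the flags only matter if something downstream cannot handle \uXXXX escapes.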
Try sending your array through this function before doing json_encode():
<?php
function utf8json($inArray) {
    static $depth = 0;
    /* our return object */
    $newArray = array();
    /* safety recursion limit */
    $depth++;
    if ($depth >= 30) {
        $depth--;
        return false;
    }
    /* step through inArray */
    foreach ($inArray as $key => $val) {
        if (is_array($val)) {
            /* recurse on the nested array (not on $inArray itself) */
            $newArray[$key] = utf8json($val);
        } else {
            /* encode string values; utf8_encode assumes ISO-8859-1 input */
            $newArray[$key] = utf8_encode($val);
        }
    }
    /* leaving this level, so the counter tracks depth rather than call count */
    $depth--;
    /* return utf8 encoded array */
    return $newArray;
}
?>
Adapted from a comment on php.net: http://php.net/manual/en/function.json-encode.php.
The function loops through the array elements and encodes each string value. Perhaps you applied your UTF-8 encoding to the array itself rather than to its elements?
I tried your code sample like this:
[~]> cat utf.php
<?php
$arr = array('Coffee', 'Cappuccino', 'Café');
print json_encode($arr);
[~]> php utf.php
["Coffee","Cappuccino","Caf\u00e9"]
[~]>
Based on that, I would say that if the source data really is UTF-8, then json_encode works just fine. If it's not, that's where you get null. Why it's not, I cannot tell based on this information.
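The failing case can be reproduced without a database. This is a minimal sketch, assuming PHP 7 or later (older versions report the problem differently); the $latin1 array is made up, with "Café" written as ISO-8859-1 bytes.

```php
<?php
// Illustrative data only: "Caf\xe9" is "Café" in ISO-8859-1.
// The lone byte 0xE9 is not a valid UTF-8 sequence.
$latin1 = array('Coffee', "Caf\xe9");

// Without extra flags the whole call fails and returns false.
$strict = json_encode($latin1);
$strictError = json_last_error();

// With JSON_PARTIAL_OUTPUT_ON_ERROR only the offending string becomes null.
$partial = json_encode($latin1, JSON_PARTIAL_OUTPUT_ON_ERROR);

var_dump($strict);                          // bool(false)
var_dump($strictError === JSON_ERROR_UTF8); // bool(true)
echo $partial, "\n";                        // ["Coffee",null]
```

Checking json_last_error() after a failed call is a quick way to confirm that the data, rather than json_encode itself, is the problem.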
json_encode seems to be dropping strings that contain invalid characters. It is likely that your UTF-8 data is not arriving in the proper form from your database.
Looking at the examples you give, my wild guess would be that your database connection is not UTF-8 encoded and serves ISO-8859-1 characters instead.
Can you try running SET NAMES utf8; after initializing the connection?
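If the connection cannot be changed, the values can also be repaired in PHP before encoding. The following is only a sketch under the ISO-8859-1 guess above: to_utf8 is a hypothetical helper name, the sample data is made up, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: leaves non-strings and valid UTF-8 strings untouched,
// and treats anything else as ISO-8859-1 (an assumption about the source data).
function to_utf8($value) {
    if (!is_string($value) || mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Illustrative data: plain ASCII, ISO-8859-1 "Café", and UTF-8 "Café".
$rows = array('Coffee', "Caf\xe9", "Caf\xc3\xa9");
$clean = array_map('to_utf8', $rows);

echo json_encode($clean), "\n"; // ["Coffee","Caf\u00e9","Caf\u00e9"]
```

Checking validity first avoids double-encoding strings that are already UTF-8, which an unconditional conversion would do.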