So I have an array of strings, and all of the strings are using the system default ANSI encoding and were pulled from a SQL database. So there are 256 diffe
The JSON standard ENFORCES Unicode encoding. From RFC4627:
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
Therefore, on the strictest sense, ANSI encoded JSON wouldn't be valid JSON; this is why PHP enforces unicode encoding when using json_encode().
As for "default ANSI", I'm pretty sure that your strings are encoded in Windows-1252. It is incorrectly referred to as ANSI.
json_encode($str,JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS|JSON_HEX_QUOT);
that will convert windows based ANSI to utf-8 and the error will be no more.
I found the following answer for an analogous problem with a nested array not utf-8 encoded that i had to json encode:
$inputArray = array(
'a'=>'First item - à',
'c'=>'Third item - é'
);
$inputArray['b']= array (
'a'=>'First subitem - ù',
'b'=>'Second subitem - ì'
);
if (!function_exists('recursive_utf8')) {
function recursive_utf8 ($data) {
if (!is_array($data)) {
return utf8_encode($data);
}
$result = array();
foreach ($data as $index=>$item) {
if (is_array($item)) {
$result[$index] = array();
foreach($item as $key=>$value) {
$result[$index][$key] = recursive_utf8($value);
}
}
else if (is_object($item)) {
$result[$index] = array();
foreach(get_object_vars($item) as $key=>$value) {
$result[$index][$key] = recursive_utf8($value);
}
}
else {
$result[$index] = recursive_utf8($item);
}
}
return $result;
}
}
$outputArray = json_encode(array_map('recursive_utf8', $inputArray ));
<?php
$array = array('first word' => array('Слово','Кириллица'),'second word' => 'Кириллица','last word' => 'Кириллица');
echo json_encode($array);
/*
return {"first word":["\u0421\u043b\u043e\u0432\u043e","\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"],"second word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430","last word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"}
*/
echo json_encode($array,256);
/*
return {"first word":["Слово","Кириллица"],"second word":"Кириллица","last word":"Кириллица"}
*/
?>
JSON_UNESCAPED_UNICODE (integer) Encode multibyte Unicode characters literally (default is to escape as \uXXXX). Available since PHP 5.4.0.
http://php.net/manual/en/json.constants.php#constant.json-unescaped-unicode
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?
If you have an ANSI encoded string, using utf8_encode()
is the wrong function to deal with this. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082
from the json output, but technically these sequences are valid for json, you must not fear them.
json_encode works with UTF-8
encoded strings only. If you need to create valid json
successfully from an ANSI
encoded string, you need to re-encode/convert it to UTF-8
first. Then json_encode
will just work as documented.
To convert an encoding from ANSI
(more correctly I assume you have a Windows-1252
encoded string, which is popular but wrongly referred to as ANSI
) to UTF-8
you can make use of the mb_convert_encoding() function:
$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");
Another function in PHP that can convert the encoding / charset of a string is called iconv based on libiconv. You can use it as well:
$str = iconv("CP1252", "UTF-8", $str);
utf8_encode() does only work for Latin-1
, not for ANSI
. So you will destroy part of your characters inside that string when you run it through that function.
Related: What is ANSI format?
For a more fine-grained control of what json_encode()
returns, see the list of predifined constants (PHP version dependent, incl. PHP 5.4, some constants remain undocumented and are available in the source code only so far).
As you wrote in a comment that you have problems to apply the function onto an array, here is some code example. It's always needed to first change the encoding before using json_encode
. That's just a standard array operation, for the simpler case of pdo::fetch()
a foreach
iteration:
while($row = $q->fetch(PDO::FETCH_ASSOC))
{
foreach($row as &$value)
{
$value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
}
unset($value); # safety: remove reference
$items[] = array_map('utf8_encode', $row );
}
Use this instead:
<?php
//$return_arr = the array of data to json encode
//$out = the output of the function
//don't forget to escape the data before use it!
$out = '["' . implode('","', $return_arr) . '"]';
?>
Copy from json_encode php manual's comments. Always read the comments. They are useful.