PHP decoding and encoding JSON with Unicode characters

野性不改 2020-11-27 17:46

I have some JSON that I need to decode, alter, and then encode without messing up any characters.

If I have a Unicode character in a JSON string, it will not decode. I'm n…

8 answers
  • 2020-11-27 18:14

    I found the following way to fix this issue. I hope it helps.

    json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
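
    A minimal sketch of the decode, alter, encode round trip from the question, assuming PHP 5.4 or later and UTF-8 input. The `$input` document and the `unit` key are made up for illustration.

    ```php
    <?php
    // Hypothetical input: a JSON document with an escaped non-ASCII character and slashes.
    $input = '{"tag":"Od\u00f3metro","url":"http://example.com/a"}';

    // Decode to an associative array; json_decode() returns UTF-8 strings.
    $data = json_decode($input, true);

    // Alter the data (the "unit" key is an illustrative assumption).
    $data['unit'] = 'km';

    // Default encoding escapes non-ASCII characters as \uXXXX and slashes as \/.
    $escaped = json_encode($data);

    // With the flags, characters and slashes are written literally.
    $literal = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    echo $escaped, PHP_EOL;
    echo $literal, PHP_EOL;
    ```

    Both strings decode to the same data; the flags only change how the JSON text is written.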
    
  • 2020-11-27 18:16

    Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.

    Here's why I think so:

    • json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
    • You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is exactly what you would see when UTF-8 text is parsed as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, but that sequence is ó in ISO 8859-1).
    • Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
    • You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
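
    The mangling described in the list above can be reproduced with a short sketch, assuming the mbstring extension is available. The variable names are illustrative.

    ```php
    <?php
    // "ó" (U+00F3) encoded as UTF-8 is the two bytes C3 B3.
    $utf8 = "\xC3\xB3";

    // Misreading those bytes as ISO 8859-1 treats each byte as one character:
    // 0xC3 is "Ã" and 0xB3 is "³". Converting "from ISO 8859-1" reproduces that
    // misreading, which is also what utf8_encode() does to already-UTF-8 input.
    $mangled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

    echo bin2hex($utf8), PHP_EOL;    // c3b3
    echo bin2hex($mangled), PHP_EOL; // c383c2b3, which renders as "ó"
    ```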

    PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.

    So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1).

    Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
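
    If changing the connection encoding is not an option, the conversion can be done in PHP before encoding. This is a rough sketch, assuming the mbstring extension and assuming that any string which is not valid UTF-8 is ISO 8859-1; `to_utf8` is a hypothetical helper, not a built-in, and the validity check is only a heuristic.

    ```php
    <?php
    // Hypothetical helper: return valid UTF-8, assuming invalid input is ISO 8859-1.
    function to_utf8($value)
    {
        if (mb_check_encoding($value, 'UTF-8')) {
            return $value; // already valid UTF-8, leave untouched
        }
        return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }

    $latin1 = "Od\xF3metro"; // "Odómetro" as ISO 8859-1 bytes

    // On current PHP versions json_encode() returns false for malformed UTF-8.
    $failed = json_encode(array('tag' => $latin1));

    // After conversion the string encodes normally.
    $fixed = json_encode(array('tag' => to_utf8($latin1)));

    var_dump($failed);
    echo $fixed, PHP_EOL;
    ```

    Checking validity first avoids double-encoding strings that already arrive as UTF-8.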

  • 2020-11-27 18:24

    Try using:

    utf8_decode() and utf8_encode()
    
  • 2020-11-27 18:25

    To encode an array that contains special characters, convert it from ISO 8859-1 to UTF-8. (If utf8_encode and utf8_decode are not working for you, this might be an option.)

    Everything that is in ISO 8859-1 should be converted to UTF-8:

    $iso88591 = "Od\xF3metro"; // "Odómetro" as ISO 8859-1 bytes
    $utf8 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1'); // convert ISO 8859-1 to UTF-8
    $data = $utf8;
    

    Encode should work after this:

    $encoded_data = json_encode($data);
    

    Convert UTF-8 to & from ISO 8859-1

  • 2020-11-27 18:26

    JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(

    There is a 5.4 alpha release candidate on QA though if you want to play on your development machine.

  • 2020-11-27 18:30
    $json = array('tag' => 'Odómetro'); // Original array (source file saved as UTF-8)
    $json = json_encode($json); // {"tag":"Od\u00f3metro"}
    $json = json_decode($json); // Od\u00f3metro becomes Odómetro, encoded as UTF-8
    echo $json->{'tag'}; // Shows Odómetro if the output is read as ISO 8859-1
    echo utf8_decode($json->{'tag'}); // Odómetro as ISO 8859-1 bytes
    

    You were close, just use utf8_decode.
