How to decode Unicode escape sequences like “\u00ed” to proper UTF-8 encoded characters?

傲寒 2020-11-22 01:01

Is there a function in PHP that can decode Unicode escape sequences like "\u00ed" to "í" and all other similar occurrences?

I found si

7 answers
  •  渐次进展
    2020-11-22 01:15

    This is a sledgehammer approach to replacing raw Unicode escapes with HTML entities. I haven't seen any other place to put this solution, but I assume others have had this problem.

    Apply this str_replace function to the RAW JSON, before doing anything else.

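    function unicode2html($str){
        // Walk every code point from 65535 down to 1.
        $i=65535;
        while($i>0){
            // JSON escapes always use four hex digits (\u00ed), so zero-pad.
            $hex=sprintf('%04x',$i);
            // Case-insensitive, so \u00ED and \u00ed both match.
            $str=str_ireplace('\u'.$hex,"&#$i;",$str);
            $i--;
        }
        return $str;
    }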
    

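    This won't take as long as you think, and it will replace any \uXXXX escape with an HTML entity. Characters above U+FFFF arrive as two surrogate escapes and are not handled correctly by this loop.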

    Of course the loop can be shortened if you know which Unicode blocks are being returned in the JSON.

    For example, my code was getting lots of arrows and dingbat Unicode. These are between 8448 and 11263. So my production code looks like:

    $i=11263;
    while($i>=8448){
        ...etc...
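
    The range-limited idea can be sketched as a small function that takes the first and last code point as arguments. This is only an illustration: the name unicode2html_range and its parameters are hypothetical, and it assumes the escapes use four hex digits as JSON does.

    ```php
    <?php
    // Hypothetical range-limited variant; name and parameters are illustrative.
    function unicode2html_range(string $str, int $first, int $last): string {
        for ($i = $last; $i >= $first; $i--) {
            // Zero-pad to four hex digits so 237 becomes "00ed", as in JSON escapes.
            $hex = sprintf('%04x', $i);
            // Case-insensitive match covers both \u21d2 and \u21D2.
            $str = str_ireplace('\u' . $hex, "&#$i;", $str);
        }
        return $str;
    }

    // Arrows and dingbats only: code points 8448 (U+2100) to 11263 (U+2BFF).
    echo unicode2html_range('a \u2192 b, caf\u00e9', 8448, 11263), "\n";
    ```

    This prints "a &#8594; b, caf\u00e9": the arrow falls inside the range and is converted, while \u00e9 lies outside it and is left alone.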
    

    You can look up the blocks of Unicode by type here: http://unicode-table.com/en/ If you know you're translating Arabic or Telugu or whatever, you can replace just those codes, not all 65,535.

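    You could apply this same sledgehammer to decode straight to UTF-8 characters instead of HTML entities:

     // chr() only covers 0-255; mb_chr() (PHP 7.2+, mbstring) returns the UTF-8 character.
     $str=str_ireplace('\u'.$hex,mb_chr($i,'UTF-8'),$str);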
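
    For the direct-to-UTF-8 case, a single regex pass is a lighter alternative to looping over every code point. The following is a minimal sketch, not part of the approach above: the function name unicode_escapes_to_utf8 is hypothetical, it assumes PHP 7.2+ with the mbstring extension, and it deliberately leaves surrogate halves untouched.

    ```php
    <?php
    // Hypothetical helper: decode \uXXXX escapes to UTF-8 in one regex pass.
    // Assumes PHP 7.2+ with mbstring, which provides mb_chr().
    function unicode_escapes_to_utf8(string $str): string {
        return preg_replace_callback(
            // A literal backslash, "u", then exactly four hex digits.
            '/\\\\u([0-9a-fA-F]{4})/',
            function (array $m): string {
                $codepoint = hexdec($m[1]);
                // Assumption: surrogate halves (U+D800..U+DFFF) are left as-is,
                // because a lone half is not a valid character.
                if ($codepoint >= 0xD800 && $codepoint <= 0xDFFF) {
                    return $m[0];
                }
                return mb_chr($codepoint, 'UTF-8');
            },
            $str
        );
    }

    echo unicode_escapes_to_utf8('Mar\u00eda'), "\n";
    ```

    This prints "María". If the input is a complete JSON document rather than a loose string, json_decode() already decodes these escapes, including surrogate pairs, so a helper like this is only needed for text that is not valid JSON on its own.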
    
