How do I convert Unicode codepoints to hexadecimal HTML entities?

野的像风 2021-01-14 12:31

I have a data file (an Apple plist, to be exact) that has Unicode codepoints like \U00e8 and \U2019. I need to turn these into valid hexadecimal HTML entities.

2 answers
  •  走了就别回头了
    2021-01-14 13:04

    Here's a correct answer that deals with the fact that those are UTF-16 code units, not code points, and that also decodes supplementary characters (those outside the Basic Multilingual Plane, which are written as surrogate pairs).

    function unenc_utf16_code_units($string) {
        /* go for possible surrogate pairs first */
        $string = preg_replace_callback(
            '/\\\\U(D[89ab][0-9a-f]{2})\\\\U(D[c-f][0-9a-f]{2})/i',
            function ($matches) {
                $hi_surr = hexdec($matches[1]);
                $lo_surr = hexdec($matches[2]);
                $scalar = 0x10000 + ((($hi_surr & 0x3FF) << 10) |
                    ($lo_surr & 0x3FF));
                return "&#x" . dechex($scalar) . ";";
            }, $string);
        /* now the rest */
        $string = preg_replace_callback('/\\\\U([0-9a-f]{4})/i',
            function ($matches) {
                //just to remove leading zeros
                return "&#x" . dechex(hexdec($matches[1])) . ";";
            }, $string);
        return $string;
    }
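
To make the surrogate-pair step easier to follow, here is a minimal sketch of just that arithmetic, worked through for one pair. The helper name `surrogate_pair_to_scalar` and the sample escapes `\Ud83d\Ude00` are assumptions chosen for illustration; they are not part of the answer's function.

```php
<?php
// Hypothetical helper (illustration only): combine a UTF-16 surrogate
// pair into the Unicode scalar value it encodes.
function surrogate_pair_to_scalar(int $hi, int $lo): int
{
    // Each surrogate carries 10 payload bits. The high surrogate supplies
    // the upper 10 bits, the low surrogate the lower 10, and the result is
    // offset by 0x10000 (the first code point beyond the BMP).
    return 0x10000 + ((($hi & 0x3FF) << 10) | ($lo & 0x3FF));
}

// Assumed sample input: the pair \Ud83d\Ude00.
$hi = hexdec('d83d'); // 0xD83D & 0x3FF = 0x03D
$lo = hexdec('de00'); // 0xDE00 & 0x3FF = 0x200

// 0x10000 + (0x03D << 10) + 0x200 = 0x10000 + 0xF400 + 0x200 = 0x1F600
$entity = '&#x' . dechex(surrogate_pair_to_scalar($hi, $lo)) . ';';
echo $entity, "\n"; // &#x1f600;
```

This is why the function handles surrogate pairs before the generic four-digit pass: if the two halves were converted separately, each would become an entity for a lone surrogate, which is not a valid character reference.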
    
