Output UTF-16? A little stuck

廉价感情. Submitted on 2019-12-02 02:36:32

Question


I have some UTF-16 encoded characters in their surrogate pair form. I want to output those surrogate pairs as characters on the screen.

Does anyone know how this is possible?


Answer 1:


iconv('UTF-16', 'UTF-8', $yourString)




Answer 2:


Your question is a little unclear.

If you have ASCII text with embedded UTF-16 escape sequences, you can convert everything to UTF-8 in this way:

function unescape_utf16($string) {
    /* go for possible surrogate pairs first */
    $string = preg_replace_callback(
        '/\\\\u(D[89ab][0-9a-f]{2})\\\\u(D[c-f][0-9a-f]{2})/i',
        function ($matches) {
            $d = pack("H*", $matches[1].$matches[2]);
            return mb_convert_encoding($d, "UTF-8", "UTF-16BE");
        }, $string);
    /* now the rest */
    $string = preg_replace_callback('/\\\\u([0-9a-f]{4})/i',
        function ($matches) {
            $d = pack("H*", $matches[1]);
            return mb_convert_encoding($d, "UTF-8", "UTF-16BE");
        }, $string);
    return $string;
}

$string = '\uD869\uDED6';
echo unescape_utf16($string);

which gives the character 𪛖 in UTF-8 (requires 4 bytes since it's outside the BMP).
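To see why this pair maps to a single character, the surrogate arithmetic can be worked through by hand. The sketch below is only an illustration: `surrogate_pair_to_code_point` is a hypothetical helper name, not part of PHP, and it assumes PHP 7.2+ with the mbstring extension (for `mb_chr`).

```php
<?php
// Hypothetical helper (not a PHP built-in): combine a high and a low
// surrogate into the code point they encode.
function surrogate_pair_to_code_point(int $high, int $low): int
{
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair');
    }
    // Each surrogate carries 10 bits; the pair encodes an offset from U+10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

$codePoint = surrogate_pair_to_code_point(0xD869, 0xDED6);
printf("U+%X\n", $codePoint);            // prints U+2A6D6

// Encode the code point as UTF-8 (assumes mbstring is available).
$utf8 = mb_chr($codePoint, 'UTF-8');
echo strlen($utf8), " bytes: ", bin2hex($utf8), "\n"; // prints 4 bytes: f0aa9b96
```

The result lies above U+FFFF, which is why it needs a surrogate pair in UTF-16 and four bytes in UTF-8.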

If all your text is UTF-16 (including HTML tags, etc.), you could simply tell the browser the output is in UTF-16:

header("Content-type: text/html; charset=UTF-16");

Serving a page as UTF-16 is very rare, because PHP scripts cannot be written in UTF-16 (unless PHP is compiled with multibyte support), which makes printing literal strings awkward.

So you probably only have a piece of text in UTF-16 that you want to convert to whatever encoding your webpage is using. You can do this conversion with:

// replace UTF-8 with your actual page encoding
$string = mb_convert_encoding($string, "UTF-8", "UTF-16");
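One thing to watch with raw UTF-16 bytes is byte order. When the input has no byte order mark, how a plain "UTF-16" label is interpreted can differ between converters, so naming the byte order explicitly is the safer choice. A minimal sketch, assuming the mbstring extension is loaded and using hand-written sample bytes for the surrogate pair D869 DED6:

```php
<?php
// The same character (surrogate pair D869 DED6) as raw UTF-16 bytes.
$bigEndian    = "\xD8\x69\xDE\xD6"; // UTF-16BE: high byte of each unit first
$littleEndian = "\x69\xD8\xD6\xDE"; // UTF-16LE: low byte of each unit first

// Name the byte order explicitly instead of relying on a default.
$fromBe = mb_convert_encoding($bigEndian, 'UTF-8', 'UTF-16BE');
$fromLe = mb_convert_encoding($littleEndian, 'UTF-8', 'UTF-16LE');

// Both conversions yield the same 4-byte UTF-8 sequence.
echo bin2hex($fromBe), "\n";
echo bin2hex($fromLe), "\n";
var_dump($fromBe === $fromLe);
```

If the data comes from a file or a network source, check whether it starts with a byte order mark (FE FF or FF FE) before choosing the label.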


Source: https://stackoverflow.com/questions/3506988/output-utf-16-a-little-stuck
