Print Unicode characters PHP

Submitted by 拥有回忆 on 2019-12-08 16:41:47

Question


I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.

For instance, when I print all games with the name like Uncharted, I get this:

Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢

but it should display this:

Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™

I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.

I don't have a problem fully escaping every character in the string if I can get the character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:

print "&#x2122;";

Please guide me through the best approach for escaping the Unicode characters in a string so that it is HTML-friendly. I did something similar for JavaScript a while back, but JavaScript has the built-in escape and unescape functions.

However, I'm not aware of any PHP functions with similar functionality. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™ or the ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.


Answer 1:


It looks like your strings are UTF-8 encoded internally and PHP outputs them correctly, but your browser fails to auto-detect the encoding and falls back to ISO 8859-1, Windows-1252, or some other single-byte encoding.
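
To make the diagnosis above concrete, here is a minimal sketch that reproduces the garbled output from the question. It assumes the iconv extension is available; misread_as_cp1252 is a hypothetical helper name used only for illustration.

```php
<?php
// "™" (U+2122) encoded as UTF-8 is the three bytes E2 84 A2.
$utf8 = "\xE2\x84\xA2";

// Hypothetical helper: treat the raw bytes as Windows-1252, the way a
// browser does when it guesses the wrong encoding, and return the result
// as UTF-8 so it can be printed.
function misread_as_cp1252(string $bytes): string
{
    return iconv('Windows-1252', 'UTF-8', $bytes);
}

echo bin2hex($utf8), "\n";           // e284a2
echo misread_as_cp1252($utf8), "\n"; // â„¢
```

Each of the three bytes is shown as its own character, which is why one ™ turns into three symbols.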

The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:

header("content-type: text/html; charset=UTF-8");  

Then you can leave the rest of your code as-is; there is no need to convert characters to HTML entities or add other workarounds.

If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:

  • <meta http-equiv=Content-Type content="text/html; charset=UTF-8"> for HTML <=4.01
  • <meta charset="UTF-8"> for HTML5

The HTTP header takes priority over the <meta> tag, but the latter is useful if the HTML is saved to disk and then opened locally.
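
Putting the header and the <meta> tag together might look like the sketch below. It assumes the PHP source file itself is saved as UTF-8; render_page is a hypothetical helper, and the game list stands in for rows that would come from the database.

```php
<?php
// Hypothetical helper: build a minimal HTML5 page around a list of names.
function render_page(array $names): string
{
    $items = '';
    foreach ($names as $name) {
        // Escape only the HTML-special characters; "™" passes through as UTF-8.
        $items .= '<li>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
         . "<title>Games</title>\n</head>\n<body>\n<ul>\n"
         . $items
         . "</ul>\n</body>\n</html>\n";
}

// The header has to be sent before any output reaches the client.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_page(["Uncharted: Drake's Fortune™"]);
```

The database connection should also be set to deliver UTF-8; how to do that depends on the driver and is not shown here.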




Answer 2:


I spent a lot of time looking for a better way to simply print the character for a given Unicode code point, and the methods I found either didn't work or were very complicated.

That said, JSON can represent Unicode characters using the syntax "\u[unicode_code]", so:

echo json_decode('"\u00e1"'); 

This prints the corresponding Unicode character, in this case: á.

P.S. Note the single and double quotes. It won't work unless you use both.
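
The json_decode trick above can be wrapped so that it takes an integer code point. This is only a sketch: bmp_char is a hypothetical name, and it deliberately handles only code points up to U+FFFF, excluding surrogates, because a single \uXXXX escape cannot express anything else.

```php
<?php
// Hypothetical helper: return the UTF-8 string for one code point in the
// Basic Multilingual Plane by building a JSON string literal and decoding it.
function bmp_char(int $cp): string
{
    if ($cp < 0 || $cp > 0xFFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("code point out of range for a single \\u escape");
    }

    // The single quotes keep PHP from touching the backslash; the inner
    // double quotes make the value a valid JSON string.
    return json_decode(sprintf('"\u%04x"', $cp));
}

echo bmp_char(0x00E1), "\n";          // á
echo bin2hex(bmp_char(0x2122)), "\n"; // e284a2
```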




Answer 3:


Try this:

echo htmlentities("Uncharted: Drakes Fortune™ \n", ENT_QUOTES, "UTF-8");

From: http://php.net/htmlentities
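
For reference, here is a small sketch of what that call produces for one of the names from the question, assuming the input string is valid UTF-8:

```php
<?php
$name = "Uncharted: Drake's Fortune™";

// With ENT_QUOTES both quote styles are converted, and every character
// that has a named HTML entity is replaced by that entity.
$encoded = htmlentities($name, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n"; // Uncharted: Drake&#039;s Fortune&trade;

// Decoding with the same flags and charset restores the original string.
echo html_entity_decode($encoded, ENT_QUOTES, 'UTF-8') === $name
    ? "round-trips\n"
    : "differs\n";
```

The output is plain ASCII, so it displays correctly whatever encoding the browser assumes.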




Answer 4:


// Requires PHP 7.0+ (IntlChar is part of the intl extension)
var_dump(
    IntlChar::chr(0x2122),
    IntlChar::chr(0x1F638)
);

var_dump(
    utf8_chr(0x2122),
    utf8_chr(0x1F638)
);

function utf8_chr($cp) {

    if (!is_int($cp)) {
        exit("$cp is not integer\n");
    }

    // UTF-8 prohibits characters between U+D800 and U+DFFF
    // https://tools.ietf.org/html/rfc3629#section-3
    //
    // Q: Are there any 16-bit values that are invalid?
    // http://unicode.org/faq/utf_bom.html#utf16-7

    if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
        exit("$cp is out of range\n");
    }

    if ($cp < 0x10000) {
        return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
    }

    // Q: Isn’t there a simpler way to do this?
    // http://unicode.org/faq/utf_bom.html#utf16-4
    $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
    $trail = 0xDC00 + ($cp & 0x3FF);

    return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}
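
The surrogate arithmetic in utf8_chr can be checked in isolation. The sketch below uses a hypothetical helper, surrogate_pair, and the equivalent form that subtracts 0x10000 first; it assumes PHP 7.1+ for the array destructuring.

```php
<?php
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 lead and trail surrogates.
function surrogate_pair(int $cp): array
{
    if ($cp < 0x10000 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException('not a supplementary code point');
    }

    $offset = $cp - 0x10000;            // 20-bit value
    $lead   = 0xD800 + ($offset >> 10); // top 10 bits
    $trail  = 0xDC00 + ($offset & 0x3FF); // low 10 bits

    return [$lead, $trail];
}

[$lead, $trail] = surrogate_pair(0x1F638);
printf("%04x %04x\n", $lead, $trail); // d83d de38

// Feeding both escapes to json_decode yields the 4-byte UTF-8 sequence.
echo bin2hex(json_decode(sprintf('"\u%04x\u%04x"', $lead, $trail))), "\n"; // f09f98b8
```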


Source: https://stackoverflow.com/questions/17539412/print-unicode-characters-php
