Question
I have a database which stores video game names with Unicode characters, but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.
For instance, when I print all games with a name like "Uncharted", I get this:
Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢
but it should display this:
Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™
I ran a quick JavaScript escape function to see which Unicode character the ™ is and found that it's \u2122.
I don't mind fully escaping every character in the string as long as I can get the ™ character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:
print "&#x2122;";
Please guide me through the best approach for Unicode-escaping a string to make it HTML-friendly. I did something similar in JavaScript a while back, but JavaScript has built-in escape and unescape functions.
I'm not aware of any PHP functions with similar functionality, however. I have read about the ord function, but it only returns the value of a single byte (the ASCII code for ASCII characters), so it gives the wrong result for the ™ character. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
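The point about ord() can be checked directly. This short illustration assumes the PHP source and the stored strings are UTF-8 (consistent with the â„¢ symptom above) and uses the PHP 7.0+ "\u{...}" string escape to build the ™ character.

```php
<?php
// Illustration: in a UTF-8 encoded PHP string, ™ (U+2122) takes three
// bytes, and ord() only ever reads the first one.
$tm = "\u{2122}"; // PHP 7.0+ escape for U+2122, stored as UTF-8 bytes

echo strlen($tm), "\n";  // 3
echo bin2hex($tm), "\n"; // e284a2
echo ord($tm), "\n";     // 226, i.e. 0xE2, the first byte only
```

Read as Windows-1252, those same three bytes are â, „ and ¢, which is exactly the â„¢ shown above.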
Answer 1:
It looks like your strings are stored as UTF-8 internally and PHP outputs them correctly, but your browser fails to auto-detect the encoding and falls back to ISO 8859-1 (in practice Windows-1252) or some other single-byte encoding.
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then you can leave the rest of your code as-is; there is no need to HTML-encode characters as entities or add other workarounds.
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv=Content-Type content="text/html; charset=UTF-8"> for HTML 4.01 and earlier
<meta charset="UTF-8"> for HTML5
The HTTP header takes priority over the <meta> tag, but the latter is useful if the HTML is saved to disk and later opened locally.
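Putting the header and the <meta> tag together, a page might look like the sketch below. The render_titles() function and the $titles array are hypothetical stand-ins for the asker's database code; the point is that header() runs before any output and the UTF-8 strings are printed unchanged, with htmlspecialchars() escaping only markup characters.

```php
<?php
// Sketch of the header-based fix. render_titles() and $titles are
// hypothetical, standing in for the real database query and loop.
function render_titles(array $titles): string
{
    $html = "<!DOCTYPE html>\n<meta charset=\"UTF-8\">\n<ul>\n";
    foreach ($titles as $title) {
        // htmlspecialchars() escapes only &, <, > and quotes; the ™ stays
        // as raw UTF-8 bytes, which the browser now decodes correctly.
        $html .= '<li>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

// header() must be called before the first byte of output is sent.
header('Content-Type: text/html; charset=UTF-8');

$titles = [
    "Uncharted: Drake's Fortune\u{2122}",
    "Uncharted 2: Among Thieves\u{2122}",
    "Uncharted 3: Drake's Deception\u{2122}",
];
echo render_titles($titles);
```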
Answer 2:
I spent a lot of time looking for a simple way to print the character for a given Unicode code point, and the methods I found either didn't work or were very complicated.
That said, JSON can represent Unicode characters with the "\uXXXX" escape syntax (four hex digits), so:
echo json_decode('"\u00e1"');
prints the corresponding character, in this case á.
P.S. Note the single and the double quotes: the outer single quotes pass the backslash through to json_decode() untouched, and the inner double quotes make the argument a JSON string literal. Without the inner quotes, json_decode() returns null.
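The trick can be wrapped in a small helper that takes an integer code point. This is a sketch limited to the Basic Multilingual Plane (U+0000 to U+FFFF, minus surrogates), since a single \uXXXX escape cannot express anything larger; the name bmp_chr is hypothetical.

```php
<?php
// Sketch: json_decode() trick wrapped for BMP code points only.
// bmp_chr is a hypothetical name chosen for this example.
function bmp_chr(int $cp): string
{
    if ($cp < 0 || $cp > 0xFFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new RangeException(sprintf('U+%04X is not a BMP scalar value', $cp));
    }
    // sprintf('%04x') builds the 4-hex-digit escape; the surrounding double
    // quotes make it a JSON string literal that json_decode() can parse.
    return json_decode(sprintf('"\u%04x"', $cp));
}

echo bmp_chr(0xE1), "\n";   // á
echo bmp_chr(0x2122), "\n"; // ™
// Since PHP 7.0, a string literal can also use the "\u{...}" escape directly:
echo "\u{2122}", "\n";      // ™
```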
Answer 3:
Try this:
echo htmlentities("Uncharted: Drake's Fortune™\n", ENT_QUOTES, "UTF-8");
From: http://php.net/htmlentities
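htmlentities() emits named entities where HTML defines one (™ becomes &trade;). If you instead want the numeric &#x2122; form for every non-ASCII character, as the question suggested, the sketch below does it with core PHP only (PCRE plus ord()). The function name to_hex_entities is hypothetical, and the input is assumed to be UTF-8. If the mbstring extension is installed, mb_encode_numericentity() is an alternative.

```php
<?php
// Sketch: escape markup, then turn every non-ASCII character into a
// hexadecimal character reference. to_hex_entities is a hypothetical name.
function to_hex_entities(string $s): string
{
    // Escape &, <, > and quotes first; ENT_SUBSTITUTE replaces invalid
    // UTF-8 with U+FFFD so the /u regex below always sees valid input.
    $s = htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // With /u, each match is one whole character (2 to 4 UTF-8 bytes).
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $b = array_map('ord', str_split($m[0]));
        switch (count($b)) {
            case 2:
                $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                break;
            case 3:
                $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
                break;
            default: // 4 bytes
                $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                    | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        return sprintf('&#x%X;', $cp);
    }, $s);
}

echo to_hex_entities("Uncharted: Drake's Fortune\u{2122}"), "\n";
// Uncharted: Drake&#039;s Fortune&#x2122;
```

Because the output is pure ASCII, it displays correctly whatever charset the browser assumes, though sending the UTF-8 header from Answer 1 is still the cleaner fix.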
Answer 4:
// PHP 7.0+; IntlChar is provided by the intl extension
var_dump(
    IntlChar::chr(0x2122),
    IntlChar::chr(0x1F638)
);
var_dump(
    utf8_chr(0x2122),
    utf8_chr(0x1F638)
);

// Returns the UTF-8 encoding of code point $cp by feeding a \uXXXX escape
// (or a UTF-16 surrogate pair of escapes) to json_decode().
function utf8_chr(int $cp): string
{
    // UTF-8 prohibits characters between U+D800 and U+DFFF
    // https://tools.ietf.org/html/rfc3629#section-3
    //
    // Q: Are there any 16-bit values that are invalid?
    // http://unicode.org/faq/utf_bom.html#utf16-7
    if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
        throw new RangeException("$cp is out of range");
    }

    if ($cp < 0x10000) {
        // BMP: pack('n') yields 2 big-endian bytes, bin2hex() 4 hex digits
        return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
    }

    // Supplementary planes need a surrogate pair in JSON
    // Q: Isn’t there a simpler way to do this?
    // http://unicode.org/faq/utf_bom.html#utf16-4
    $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
    $trail = 0xDC00 + ($cp & 0x3FF);
    return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}
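utf8_chr() routes through JSON's UTF-16 escapes. A different technique, named plainly here, is to compute the UTF-8 bytes directly from the bit layout in RFC 3629, which needs neither json_decode() nor the intl extension. This is a sketch; the name cp_to_utf8 is hypothetical.

```php
<?php
// Sketch: encode a code point as UTF-8 using the RFC 3629 bit layout.
// cp_to_utf8 is a hypothetical name chosen for this example.
function cp_to_utf8(int $cp): string
{
    if ($cp < 0 || ($cp >= 0xD800 && $cp <= 0xDFFF) || $cp > 0x10FFFF) {
        throw new RangeException(sprintf('U+%04X is not a Unicode scalar value', $cp));
    }
    if ($cp < 0x80) {    // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {   // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) { // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(cp_to_utf8(0x2122)), "\n"; // e284a2
```

The result should match utf8_chr() and IntlChar::chr() for every valid code point; on PHP 7.2+ with mbstring, mb_chr() is another option.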
Source: https://stackoverflow.com/questions/17539412/print-unicode-characters-php