character-reference

Spec justification for &#x80; to &#x9F; in UTF-8 documents browser behaviour wanted

半城伤御伤魂 submitted on 2019-12-23 12:59:34
Question: The HTML 4.01 spec says of hexadecimal character references: "Numeric character references specify the code position of a character in the document character set." So if the document character set encoding is UTF-8, the numeric references should specify a Unicode code point. The HTML5 spec says of hexadecimal character references: "The ampersand must be followed by a U+0023 NUMBER SIGN character (#), which must be followed by either a U+0078 LATIN SMALL LETTER X character (x) or a U+0058 LATIN
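
The gap the question asks about can be seen in a small sketch. It assumes Python's standard `html.unescape`, which follows the HTML5 parsing rules and so remaps references in the 0x80 to 0x9F range to their Windows-1252 characters; the variable names are illustrative only and are not from the question.

```python
import html

# Literal reading of the reference: the number is the code point itself.
literal = chr(0x80)               # U+0080, a C1 control character

# HTML5-style parsing, as implemented by html.unescape.
parsed = html.unescape("&#x80;")  # remapped to the euro sign

print(f"literal: U+{ord(literal):04X}")
print(f"parsed:  U+{ord(parsed):04X}")
```

Under that assumption the two readings disagree: the literal code point is U+0080, while the parsed result is U+20AC.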

How can I convert HTML character references (&#x5E3;) to regular UTF-8?

岁酱吖の submitted on 2019-12-09 00:56:59
Question: I have some Hebrew websites that contain character references like &#x5E0;&#x5D5;&#x5E3; (נוף). I can only view these letters if I save the file as .html and view it in UTF-8 encoding. If I try to open it as a regular text file, then UTF-8 encoding does not show the proper output. I noticed that if I open a text editor and write Hebrew in UTF-8, each character takes two bytes, not 4 bytes like in this example (&#x5D5;). Any ideas if this is UTF-16 or any other kind of UTF representation of letters? How can I convert it to
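
One possible way to do the conversion, sketched in Python with the standard `html` module. The sample string is the three Hebrew letters נוף written as hexadecimal character references; the variable names are hypothetical. The references are plain ASCII text naming Unicode code points, not UTF-16 or another byte encoding, which is why they only display as letters once something decodes them.

```python
import html

# Hexadecimal character references for the Hebrew letters nun, vav, final pe.
text = "&#x5E0;&#x5D5;&#x5E3;"

# Replace each reference with the Unicode character it names.
decoded = html.unescape(text)

# Encode the resulting string as UTF-8 bytes, e.g. for writing to a file.
utf8_bytes = decoded.encode("utf-8")

print(decoded)
print(len(decoded), "characters,", len(utf8_bytes), "bytes in UTF-8")
```

Each of these letters occupies two bytes once encoded as UTF-8, which matches what the question observes in a text editor.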