Encoding space character in XML name

傲寒 2020-11-29 14:14

I am given an XML file which contains names like below:

something

The ↂ symbol is re

2 Answers
  • 2020-11-29 14:28

    The XML isn't broken, but it represents names using a private convention for escaping disallowed characters. An XML parser won't understand this convention; it's up to the receiving application to interpret it.
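
A minimal sketch of what "interpreting the convention" in the receiving application could look like. It assumes (this is a guess from the sample name, not something the source confirms) that ↂ followed by exactly four hex digits stands for the character with that code point. The names `ESCAPE` and `decode_name` are hypothetical.

```python
import re

# Assumed convention (not confirmed by the source): U+2182 followed by
# exactly four hex digits encodes the character with that code point.
ESCAPE = re.compile("\u2182([0-9A-Fa-f]{4})")


def decode_name(name: str) -> str:
    """Replace each assumed escape sequence with the character it encodes."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


# "\u2182" is the Roman numeral character; "0020" after it is plain text.
print(repr(decode_name("Bench\u21820020Code\u21820020")))  # 'Bench Code '
```

The decoding happens after parsing, on the element or attribute names the parser returns; the parser itself only ever sees a legal name.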

  • 2020-11-29 14:41

    Space characters are not permitted in XML names

    There are 86 code points whose names contain the word SPACE. Ignoring those that match only because of MONOSPACE, and any others that have a visual representation, leaves the following:

    • #x0020 SPACE
    • #x00A0 NO-BREAK SPACE
    • [#x2002-#x200A] EN SPACE through HAIR SPACE
    • #x205F MEDIUM MATHEMATICAL SPACE
    • #x3000 IDEOGRAPHIC SPACE
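
One way to reproduce a list like this is to scan Python's Unicode database. The sketch below approximates "no visual representation" with the general category `Zs` (space separator); that filter is an assumption of this example, not the method used above, so the output can differ from the hand-picked list depending on the Unicode version bundled with Python. The function name `blank_space_codepoints` is hypothetical.

```python
import sys
import unicodedata


def blank_space_codepoints():
    """Return (code point, name) pairs whose name contains SPACE
    and whose general category is Zs (space separator)."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if "SPACE" in name and unicodedata.category(ch) == "Zs":
            hits.append((cp, name))
    return hits


for cp, name in blank_space_codepoints():
    print(f"#x{cp:04X} {name}")
```

The category filter is what drops names such as the MONOSPACE letters, which contain the substring SPACE but are ordinary visible characters.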

    None of these space-related code points (those with no visual representation) is permitted in XML names by the W3C XML grammar for names:

    NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                      [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
                      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
                      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                      [#x10000-#xEFFFF]
    NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
                      [#x203F-#x2040]
    Name          ::= NameStartChar (NameChar)*
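
The grammar above translates directly into a range lookup. This is a minimal sketch that copies the ranges from the productions as quoted here; the helper names (`is_name_start_char`, `is_name_char`, `is_xml_name`) are hypothetical and not part of any library.

```python
# Ranges copied from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),  # ":" A-Z "_" a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed by NameChar: "-" ".", 0-9, #xB7, and two ranges.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(name):
    """True if name matches Name ::= NameStartChar (NameChar)*."""
    return (bool(name)
            and is_name_start_char(name[0])
            and all(is_name_char(c) for c in name[1:]))


print(is_xml_name("Bench Code"))                     # False: U+0020 is in no range
print(is_xml_name("Bench\u21820020Code\u21820020"))  # True: U+2182 is in 2070-218F
```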
    

    Alternatives to spaces in XML names

    • CamelCase
    • underscore_char
    • hyphen-char
    • period.char

    A colon should not be used as a word separator in XML names, to avoid confusion with its use in XML Namespaces.


    ↂ is permitted in XML names

    The character ↂ (UTF-8 bytes 0xE2 0x86 0x82, code point #x2182) has nothing to do with spaces: it is ROMAN NUMERAL TEN THOUSAND. It is explicitly permitted in XML names because #x2182 falls in the [#x2070-#x218F] range of NameStartChar.
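
These facts about the character can be checked with Python's standard library. A small sketch; the variable names are only illustrative.

```python
import unicodedata

ch = "\u2182"  # the character under discussion

# UTF-8 encoding of the character, as space-separated hex bytes.
utf8_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
# Official Unicode character name.
char_name = unicodedata.name(ch)
# Membership in the [#x2070-#x218F] NameStartChar range.
in_range = 0x2070 <= ord(ch) <= 0x218F

print(utf8_bytes)  # E2 86 82
print(char_name)   # ROMAN NUMERAL TEN THOUSAND
print(in_range)    # True
```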

    The 0020 following each ↂ is just a run of four digits. Together with the rest of the characters in Benchↂ0020Codeↂ0020, they form an allowed (albeit unconventional) XML name. They do not constitute spaces in the XML name, because spaces are not allowed in XML names.
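
A quick way to see the difference is to hand both forms to a parser. This sketch uses Python's `xml.etree.ElementTree`; the element content and the helper `parses` are made up for illustration, since the original sample document is not shown in full.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by the parser as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical document using the escaped name from the discussion above.
escaped = "<Bench\u21820020Code\u21820020>something</Bench\u21820020Code\u21820020>"
# The same name written with literal spaces.
spaced = "<Bench Code>something</Bench Code>"

print(parses(escaped))
print(parses(spaced))
print(ET.fromstring(escaped).tag)
```

With the escaped form, the parser reports the tag exactly as written, digits included; turning it back into a name with spaces is left to the application.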
