loading xml document fails with special character »

无人及你 2021-01-23 00:34

I'm consuming an RSS feed and the document contains a special character »

I'm guessing the feed is not encoded properly but I can't change that. I'd like to o

2 answers
  • 2021-01-23 01:11

    +1 what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.

    If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no conforming XML parser will read a document that references an undeclared entity.

    In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is "This is a <em>really</em> interesting article", it should appear in the XML as:

    <description>This is a &lt;em>really&lt;/em> interesting article</description>
    

    Consequently, an HTML-encoded » character should have come out as

    &amp;raquo;
    

    If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.

    (This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)
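
    The escaping described above can be sketched from the producer's side. This is only an illustration in Python using the standard library; the sample text and variable names are made up for the example, and the original feed's generator is unknown.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML fragment, as an author might write it.
html_fragment = "This is a <em>really</em> interesting article &raquo;"

# XML-escape the fragment: & becomes &amp;, < becomes &lt;, > becomes &gt;.
# The HTML entity &raquo; therefore ends up as &amp;raquo; in the feed.
escaped = escape(html_fragment)
item_xml = f"<description>{escaped}</description>"

# An XML parser undoes exactly one level of escaping and returns the HTML source.
recovered_html = ET.fromstring(item_xml).text

# An HTML-aware consumer then resolves &raquo; on its own.
rendered_char = html.unescape("&raquo;")
```

    Note that escape() also turns > into &gt;, which is allowed but not required in element content.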

  • 2021-01-23 01:14

    &raquo; is an HTML named entity and is not supported in XML. Out of the box, XML only supports &amp;, &apos;, &quot;, &gt; and &lt;.

    Use the corresponding numeric entity &#187; (or hexadecimal &#xbb;) instead.
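
    If the feed can't be fixed at the source, one possible workaround is to rewrite the named entities into numeric ones before parsing. The following is a rough Python sketch, not a robust solution: the helper name html_entities_to_numeric and the sample title are invented for the example, and the regex would also touch text inside CDATA sections, where &raquo; is literal.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "apos", "quot", "gt", "lt"}

def html_entities_to_numeric(text: str) -> str:
    """Replace HTML named entities (e.g. &raquo;) with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown names as-is
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

# Hypothetical fragment of a feed that uses an HTML-only entity.
broken = "<title>News &raquo; Tech &amp; Science</title>"

try:
    ET.fromstring(broken)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False  # the parser rejects the undefined entity

fixed = html_entities_to_numeric(broken)
title = ET.fromstring(fixed).text
```

    Here fixed contains &#187; in place of &raquo;, and the parser then returns the title with a real » character.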
