I'm consuming an RSS feed and the document contains a special character &raquo;
I'm guessing the feed is not encoded properly but I can't change that. I'd like to o
+1 what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.
If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no XML parser will read this.
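To illustrate the raw-character option, here is a minimal Python sketch. It assumes the feed is served as UTF-8 bytes; the sample fragment is made up for illustration and is not from the feed in question.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment containing a literal » (U+00BB), no entity reference.
raw_feed = "<description>News » Sport</description>".encode("utf-8")

# In UTF-8 the character is encoded as the two bytes C2 BB.
raquo_bytes = "»".encode("utf-8")

# With no XML declaration, byte input is treated as UTF-8 by default.
element = ET.fromstring(raw_feed)
print(element.text)
```

A raw character like this needs no escaping at all, as long as the declared (or default) encoding matches the bytes actually sent.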
In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is This is a <em>really</em> interesting article, it should appear in the XML as:
<description>This is a &lt;em&gt;really&lt;/em&gt; interesting article</description>
Consequently, an HTML-encoded &raquo; character should have come out as &amp;raquo;.
If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.
(This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)
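The escaping rule described above can be sketched from the producer's side in Python. This is only an illustration using the standard library's xml.sax.saxutils.escape; the variable names are made up, and a real feed generator would normally let an XML library do the serialization.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# HTML content that should end up inside <description>.
html_fragment = "This is a <em>really</em> interesting article"

# escape() turns &, < and > into &amp;, &lt; and &gt;.
item_xml = "<description>" + escape(html_fragment) + "</description>"
print(item_xml)

# A consumer gets the original HTML string back after parsing.
parsed = ET.fromstring(item_xml)

# The same rule applied to an HTML entity: the ampersand itself is escaped.
escaped_entity = escape("&raquo;")
entity_roundtrip = ET.fromstring("<description>" + escaped_entity + "</description>").text
```

After parsing, the consumer holds the HTML source text (here the string &raquo;) and can hand it to an HTML renderer, which is what resolves the entity.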
&raquo; is an HTML named entity and is not supported in XML. Out of the box, XML only supports &amp;, &apos;, &quot;, &gt; and &lt;.
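A quick way to see the difference, sketched with Python's xml.etree.ElementTree (the one-element documents are made up for the test):

```python
import xml.etree.ElementTree as ET

# The five predefined XML entities parse without any DTD.
predefined = ET.fromstring("<d>&amp;&apos;&quot;&gt;&lt;</d>")
print(predefined.text)

# An HTML-only named entity is rejected as undefined.
try:
    ET.fromstring("<d>&raquo;</d>")
    parsed_ok = True
except ET.ParseError as err:
    parsed_ok = False
    print("parse failed:", err)
```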
Use the corresponding numeric character reference &#187; (or hexadecimal &#xBB;) instead.
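If the feed can't be fixed at the source, one possible workaround on the consuming side is to rewrite HTML named entities as numeric references before parsing. The sketch below is an assumption about how that could be done, not part of any RSS library: html_entities_to_numeric is a hypothetical helper, and it uses the HTML 4 entity table from Python's html.entities module.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities XML already understands; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

def html_entities_to_numeric(text):
    """Replace HTML named entities (e.g. &raquo;) with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or already valid: keep as-is
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

# Hypothetical malformed fragment, as a string.
broken = "<d>a &raquo; b &amp; c</d>"
fixed = html_entities_to_numeric(broken)
print(fixed)

# Decimal and hexadecimal references both yield U+00BB.
decimal_text = ET.fromstring("<d>&#187;</d>").text
hex_text = ET.fromstring("<d>&#xBB;</d>").text
fixed_text = ET.fromstring(fixed).text
```

This is a blunt text substitution: it also rewrites matches inside CDATA sections and comments, so it is best treated as a stopgap until the feed itself is corrected.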