I'm trying to transform XML:
<catalog>
<country><![CDATA[ WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS ]]></country>
</catalog>
into
<catalog>
<country><![CDATA[ WIN8 <b>X</b> Mac OS ]]></country>
</catalog>
with an XSL transform.
I know that with disable-output-escaping="yes" or cdata-section-elements I could turn escaped characters into unescaped ones and put them inside CDATA, but this does not work if the characters are already inside CDATA.
Is there a simple way to do this? Thanks.
This
<catalog>
<country><![CDATA[ WIN8 <b>X</b> Mac OS ]]></country>
</catalog>
is equivalent to
<catalog>
<country> WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS </country>
</catalog>
Which is exactly what you get when using
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
The point is that disable-output-escaping (DOE) has no effect in an element that falls into cdata-section-elements (CSE). That's because both directives disable output escaping.
The text value " WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS " becomes:
when serialized normally:
WIN8 &amp;lt;b&amp;gt;X&amp;lt;/b&amp;gt; Mac OS
when serialized with CSE:
<![CDATA[ WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS ]]>
when serialized with DOE:
WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS
Note how the last two renderings are exactly the same, except for the enclosing <![CDATA[ ... ]]>.
CSE disables output escaping for the text node children of an element and in exchange encloses them in <![CDATA[ ... ]]> markers to make up for the lost level of escaping.
If you additionally set DOE on an <xsl:value-of> that outputs text into an element that has CSE set, nothing happens. Output escaping is already disabled.
Therefore this
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />
  <xsl:output cdata-section-elements="country" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
will give you exactly what your input was.
That's why you cannot get rid of the double escaping and have CDATA during the same transformation. You could use a two-step approach (the first step disables output escaping, the second step adds CDATA back) if you positively must have CDATA in the result document, but personally I think it's not worth it.
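As a rough sketch, the second step of such a two-step approach could be a plain identity transform that only sets cdata-section-elements. This assumes the first step is the DOE stylesheet without CSE, and that its output has been written out and parsed again as a fresh input document. DOE only takes effect when the result tree is serialized, so feeding the in-memory result of step one directly into step two is not expected to work.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Step two: no DOE here, only CSE for the country element -->
  <xsl:output omit-xml-declaration="yes" indent="yes"
              cdata-section-elements="country" />

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

After step one, the parsed text value of country no longer carries the extra level of escaping, so step two should wrap the singly escaped text in <![CDATA[ ... ]]> when it serializes it.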
Another solution is to use CDATA inside an xsl:text with disable-output-escaping="yes":
<xsl:template match="/" >
<xsl:text disable-output-escaping="yes"><![CDATA[
<script>
var thisTextIsNotEscaped = "<b>this text is normally escaped, but not in this case</b>";
</script>
]]>
</xsl:text>
</xsl:template>
Source: https://stackoverflow.com/questions/20489560/xsl-unescape-html-inside-cdata