Question
Where did we go wrong?
When I process this XML with XSLT 2.0 on Saxon HE:
<data>
<grab>Grab me and print me back &quot;</grab>
</data>
using this stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/">
<xsl:apply-templates select="/data/grab"/>
</xsl:template>
<xsl:template match="/data/grab">
<node><xsl:value-of select="text()"/></node>
</xsl:template>
</xsl:stylesheet>
I get this output:
<?xml version="1.0" encoding="UTF-8"?><node>Grab me and print me back "</node>
But I want to retain the &quot; in the output XML, so we had to add a character map:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:character-map name="specialchar">
<xsl:output-character character="&quot;" string="&amp;quot;"/>
</xsl:character-map>
<xsl:output method="xml" indent="no" use-character-maps="specialchar"/>
<xsl:template match="/">
<xsl:apply-templates select="/data/grab"/>
</xsl:template>
<xsl:template match="/data/grab">
<node><xsl:value-of select="text()"/></node>
</xsl:template>
</xsl:stylesheet>
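A character map only acts at serialization time, after the tree has already been built: each mapped character is replaced by its string, and that string is written out without further escaping. The Python sketch below mimics that idea; serialize, escape_text and SPECIALCHAR are hypothetical names, the helper handles only attribute-free elements, and it is an illustration of the concept rather than how Saxon actually implements xsl:character-map.

```python
import xml.etree.ElementTree as ET


def escape_text(text, char_map):
    """Escape a text node like an XML serializer would, except that
    characters found in char_map are replaced by their mapped string
    verbatim (no further escaping), mirroring xsl:output-character."""
    out = []
    for ch in text:
        if ch in char_map:
            out.append(char_map[ch])  # substituted string, written as-is
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)


def serialize(elem, char_map):
    """Hypothetical minimal serializer for elements without attributes."""
    parts = ["<", elem.tag, ">", escape_text(elem.text or "", char_map)]
    for child in elem:
        parts.append(serialize(child, char_map))
        parts.append(escape_text(child.tail or "", char_map))
    parts.extend(["</", elem.tag, ">"])
    return "".join(parts)


# Rough equivalent of the "specialchar" map in the stylesheet above
SPECIALCHAR = {'"': "&quot;"}

node = ET.fromstring("<node>Grab me and print me back &quot;</node>")
print(serialize(node, {}))           # <node>Grab me and print me back "</node>
print(serialize(node, SPECIALCHAR))  # <node>Grab me and print me back &quot;</node>
```

Because mapped strings bypass escaping, a careless map (for example one that maps a character to "<") can produce output that is no longer well-formed XML.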
This retains the &quot; entity, but IMHO it looks verbose and ugly.
Is this really necessary? Is there no more elegant alternative? If not, what is the rationale behind it?
Answer 1:
Architecturally, XSLT transforms XDM trees to XDM trees; it does not transform lexical XML to lexical XML. XDM trees do not distinguish between &quot; and a literal " character, any more than they distinguish between <a id="5"/> and <a id = '5'></a>. Hiding such arbitrary and irrelevant differences in the way you write the XML from the XSLT programmer is very much by design, and it makes correct transformations much easier to write.
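Python's ElementTree is not XDM, but its parser makes the same kind of normalization, so it can serve as a rough stand-in. The sketch below parses both pairs of lexical variants mentioned above and compares the resulting trees; same_tree is a hypothetical helper that compares only the information that survives parsing (tag, attributes, text, tail, children).

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Compare two elements by tag, attributes, text, tail and children."""
    return (a.tag == b.tag
            and a.attrib == b.attrib
            and (a.text or "") == (b.text or "")
            and (a.tail or "") == (b.tail or "")
            and len(a) == len(b)
            and all(same_tree(x, y) for x, y in zip(a, b)))


# Two spellings of the same quote character
q1 = ET.fromstring("<grab>back &quot;</grab>")
q2 = ET.fromstring('<grab>back "</grab>')

# Two spellings of the same empty element with an attribute
a1 = ET.fromstring('<a id="5"/>')
a2 = ET.fromstring("<a id = '5'></a>")

print(same_tree(q1, q2), same_tree(a1, a2))  # True True
```

Once parsed, nothing records which spelling was used, so a transformation working on the tree cannot treat the variants differently.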
Now, there are certainly use cases for preserving entity references, particularly semantic entity references like &author; that might take different values on different occasions. But entity references aren't a particularly good solution to that requirement; XInclude is usually better. And the argument doesn't apply to references like &quot; that simply stand for a single character: it's really hard to see a good use case for treating &quot; and a literal " differently, and you certainly haven't provided one.
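As a rough illustration of the XInclude alternative to a semantic entity like &author;, the sketch below uses Python's xml.etree.ElementInclude. The fragment name author.xml, the FRAGMENTS dictionary and the in-memory loader are hypothetical stand-ins for a real shared file; in practice the default loader would read the file from disk, and XSLT processors have their own ways of enabling XInclude.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared fragment, standing in for a file such as author.xml
FRAGMENTS = {"author.xml": "<author>Jane Doe</author>"}


def loader(href, parse, encoding=None):
    """Resolve hrefs from an in-memory dict instead of the filesystem."""
    if parse == "xml":
        return ET.fromstring(FRAGMENTS[href])
    return FRAGMENTS[href]  # parse="text" includes plain text


doc = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<title>Example</title><xi:include href="author.xml"/>'
    "</book>"
)

# Replace each xi:include element with the content it points to
ElementInclude.include(doc, loader=loader)
print(ET.tostring(doc, encoding="unicode"))
```

Swapping the fragment changes the author everywhere it is included, which is the job a semantic entity reference would otherwise do, but the inclusion is an explicit element in the tree rather than a lexical detail.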
At a practical level, Saxon couldn't preserve the &quot; even if it wanted to, because it doesn't know it's there: the XML parser (which converts lexical XML to XDM) does not report character references or predefined entity references to the application. Again, that's by design: the theory is that applications shouldn't know and shouldn't care. And it has the great virtue that we don't get zillions of SO questions from application developers who failed to cater for the possibility.
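The same behaviour is visible at the level of parser events. The sketch below uses Python's xml.sax (an expat-based parser, not the parser Saxon uses) with a hypothetical TextCollector handler that records every characters() callback; the quote arrives as ordinary character data whichever way it was written, and no event identifies the reference itself.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Records every characters() callback the parser makes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def collected_text(doc):
    """Parse a document and return all character data it reported."""
    handler = TextCollector()
    xml.sax.parseString(doc, handler)
    return "".join(handler.chunks)


a = collected_text(b"<grab>back &quot;</grab>")  # predefined entity reference
b = collected_text(b"<grab>back &#34;</grab>")   # numeric character reference
c = collected_text(b'<grab>back "</grab>')       # literal character

print(a == b == c, repr(a))  # True 'back "'
```

The handler may receive the text in several chunks, which is why the sketch joins them before comparing.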
Source: https://stackoverflow.com/questions/62286935/retaining-entity-in-xslt-stylesheet-output-without-using-character-map