Question
I have extracted some code from CodeProject to reindent an XML document. Does anyone know how I can modify the stylesheet so that transforming an XML file writes empty elements as <tag /> instead of <tag></tag>?
// http://www.codeproject.com/Articles/43309/How-to-create-a-simple-XML-file-using-MSXML-in-C
MSXML2::IXMLDOMDocumentPtr FormatDOMDocument(MSXML2::IXMLDOMDocumentPtr pDoc)
{
LPCSTR const static szStyleSheet =
R"!(<?xml version="1.0" encoding="utf-8"?>)!"
R"!(<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">)!"
R"!( <xsl:output method="xml" indent="yes"/>)!"
R"!( <xsl:template match="@* | node()">)!"
R"!( <xsl:copy>)!"
R"!( <xsl:apply-templates select="@* | node()"/>)!"
R"!( </xsl:copy>)!"
R"!( </xsl:template>)!"
R"!(</xsl:stylesheet>)!";
MSXML2::IXMLDOMDocumentPtr pXmlStyleSheet;
pXmlStyleSheet.CreateInstance(__uuidof(MSXML2::DOMDocument60));
pXmlStyleSheet->loadXML(szStyleSheet);
MSXML2::IXMLDOMDocumentPtr pXmlFormattedDoc;
pXmlFormattedDoc.CreateInstance(__uuidof(MSXML2::DOMDocument60));
CComPtr<IDispatch> pDispatch;
HRESULT hr = pXmlFormattedDoc->QueryInterface(IID_IDispatch, (void**)&pDispatch);
if (SUCCEEDED(hr))
{
_variant_t vtOutObject;
vtOutObject.vt = VT_DISPATCH;
vtOutObject.pdispVal = pDispatch;
vtOutObject.pdispVal->AddRef();
hr = pDoc->transformNodeToObject(pXmlStyleSheet, vtOutObject);
}
// By default the output is written with encoding="UTF-16"; change the encoding to UTF-8:
// <?xml version="1.0" encoding="UTF-8"?>
MSXML2::IXMLDOMNodePtr pXMLFirstChild = pXmlFormattedDoc->GetfirstChild();
// Map of the XML declaration's attributes (version, encoding) to their values (1.0, UTF-8)
MSXML2::IXMLDOMNamedNodeMapPtr pXMLAttributeMap = pXMLFirstChild->Getattributes();
MSXML2::IXMLDOMNodePtr pXMLEncodNode = pXMLAttributeMap->getNamedItem(_T("encoding"));
pXMLEncodNode->PutnodeValue(_T("UTF-8")); //encoding = UTF-8
return pXmlFormattedDoc;
}
Answer 1:
This stylesheet causes empty tags to be written where possible (with MSXML6):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(*) and not(normalize-space()) and not(comment()) and not(processing-instruction())]">
<xsl:element name="{name()}" namespace="{namespace-uri()}">
<xsl:copy-of select="./@*"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
This is achieved by avoiding xsl:copy for elements that have no child elements, non-whitespace text, comments or processing instructions, and instead "manually" recreating the element with xsl:element. Note that the attributes are copied too, by the nested xsl:copy-of.
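To make the match condition concrete, here is a minimal sketch of the same test in plain C++. The Node and Kind types are a hypothetical, simplified node model invented for illustration (they are not the MSXML DOM), and the sketch assumes C++17 so that Node can hold a vector of itself.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified node model for illustration only (not the MSXML DOM).
enum class Kind { Element, Text, Comment, ProcessingInstruction };

struct Node {
    Kind kind;
    std::string value;            // text content for Text nodes; unused otherwise
    std::vector<Node> children;   // only meaningful for Element nodes
};

// True when the text holds only XML whitespace (space, tab, CR, LF),
// which is the case where XPath's normalize-space() yields an empty string.
bool isBlank(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Mirrors the predicate:
//   *[not(*) and not(normalize-space()) and not(comment())
//     and not(processing-instruction())]
bool collapsesToEmptyTag(const Node& element) {
    if (element.kind != Kind::Element) return false;
    for (const Node& child : element.children) {
        if (child.kind != Kind::Text) return false;  // child element, comment or PI
        if (!isBlank(child.value)) return false;     // real text content
    }
    return true;
}
```

Under this model, an element whose only child is a whitespace-only text node qualifies, while an element containing a comment or real text does not.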
For example, this XML document:
<Document>
<empty> </empty>
<empty-2/>
<non-empty>
Some text
</non-empty>
<non-empty-2 some-attribute="attribute text">
<empty-3/>
<non-empty-3><empty-4/><empty-with-attribute another-attribute="some more text">
</empty-with-attribute>
</non-empty-3>
</non-empty-2>
<abc:non-empty-with-namespace xmlns:abc="urn:test:abc">
<abc:empty-with-namespace abc:namespaced-attribute="namespaced attribute text"/>
</abc:non-empty-with-namespace>
<non-empty-comment>
<!-- A comment -->
</non-empty-comment>
<non-empty-proc-instr>
<?some-instruction?>
</non-empty-proc-instr>
</Document>
would be transformed into the following using your FormatDOMDocument function, with the updated stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<empty/>
<empty-2/>
<non-empty>
Some text
</non-empty>
<non-empty-2 some-attribute="attribute text">
<empty-3/>
<non-empty-3>
<empty-4/>
<empty-with-attribute another-attribute="some more text"/>
</non-empty-3>
</non-empty-2>
<abc:non-empty-with-namespace xmlns:abc="urn:test:abc">
<abc:empty-with-namespace abc:namespaced-attribute="namespaced attribute text"/>
</abc:non-empty-with-namespace>
<non-empty-comment>
<!-- A comment -->
</non-empty-comment>
<non-empty-proc-instr>
<?some-instruction?>
</non-empty-proc-instr>
</Document>
To restrict empty tags to only certain elements by name, you can adjust the match pattern to add a check on the element name: contains('|list|of|element|names|', concat('|',name(),'|')). Note that the list of names is separated with a |, that there is a | at the start and end of the list too, and that we concatenate the element name with those delimiters as well. This trick enables us to use contains (which just matches any substring) to achieve the effect of searching in the list, without a short name such as empty accidentally matching inside a longer one such as non-empty.
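The delimiter trick is not specific to XSLT. As a minimal sketch, the two hypothetical helpers below (naiveContains and inNameList are names made up for this illustration) reproduce it with std::string::find, which has the same plain-substring semantics as XPath's contains.

```cpp
#include <string>

// Naive check: plain substring search, the same semantics as XPath contains().
// It reports a match for "empty" because "empty" occurs inside "non-empty".
bool naiveContains(const std::string& names, const std::string& name) {
    return names.find(name) != std::string::npos;
}

// Delimited check: mirrors contains('|a|b|', concat('|', name(), '|')).
// The list must start and end with '|' so that every entry is fully enclosed.
bool inNameList(const std::string& delimitedNames, const std::string& name) {
    return delimitedNames.find("|" + name + "|") != std::string::npos;
}
```

With the list "|non-empty|empty-2|", inNameList accepts "empty-2" and rejects "empty", whereas naiveContains on "non-empty|empty-2" would wrongly accept "empty". The approach assumes that | can never occur in an element name, which holds for XML names.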
For example, allowing empty tags for the non-empty, empty-2, empty-4 and abc:empty-with-namespace elements in my previous example, the updated stylesheet would be:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[contains('|non-empty|empty-2|empty-4|abc:empty-with-namespace|', concat('|',name(),'|')) and not(*) and not(normalize-space()) and not(comment()) and not(processing-instruction())]">
<xsl:element name="{name()}" namespace="{namespace-uri()}">
<xsl:copy-of select="./@*"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
and the output of FormatDOMDocument would become:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<empty></empty>
<empty-2/>
<non-empty>
Some text
</non-empty>
<non-empty-2 some-attribute="attribute text">
<empty-3></empty-3>
<non-empty-3>
<empty-4/>
<empty-with-attribute another-attribute="some more text"></empty-with-attribute>
</non-empty-3>
</non-empty-2>
<abc:non-empty-with-namespace xmlns:abc="urn:test:abc">
<abc:empty-with-namespace abc:namespaced-attribute="namespaced attribute text"/>
</abc:non-empty-with-namespace>
<non-empty-comment>
<!-- A comment -->
</non-empty-comment>
<non-empty-proc-instr>
<?some-instruction?>
</non-empty-proc-instr>
</Document>
Note that although we listed non-empty as a candidate for an empty tag, it doesn't come out as empty, because it actually has a text node (which is what we want). Also note that empty wasn't in our list, so it comes out with a closing tag as <empty></empty>, which is what we wanted in this case too (and similarly for empty-3 and empty-with-attribute).
Source: https://stackoverflow.com/questions/29036336/is-there-a-way-to-modify-the-style-sheet-so-that-it-transforms-an-xml-document-w