Keep only whitelisted elements and/or attributes

馋奶兔 submitted on 2020-12-13 05:43:31

Question


I have an XML file with a plethora of nodes, each having a vast number of attributes. For simplicity, let us assume the XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <header />
  <group>
    <node1 attr1="x" attr2="y" attr3="z" />
    <node2 attr4="x" attr5="y" attr6="z" />
    <node3 attr7="x" attr8="y" attr9="z" />
    <node1 attr1="x" attr2="y" attr3="z" />
  </group>
</root>

I would like to reduce this XML to a smaller version by removing both nodes and attributes from the content of /root/group/:

  • All nodes named node3 should be removed.
  • Nodes named node1 should keep only the attribute attr1.
  • Nodes named node2 should keep only the attributes attr5 and attr6.

I could write a simple XSLT for this using empty "match and do nothing" templates, e.g.:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" encoding="UTF-8" indent="yes" />

   <xsl:template match="/root/group/node3" />
   <xsl:template match="/root/group/node1/@attr2" />
   <xsl:template match="/root/group/node1/@attr3" />
   <xsl:template match="/root/group/node2/@attr4" />

   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

This, however, does not fit my needs. The stylesheet above states what I do not want, but I would like to state what I do want by means of a whitelist. Two questions I found answer this partially: one introduced a whitelist for nodes, the other a whitelist for attributes. How can I do this elegantly with a single whitelist, or is there a better method? Can this be done with a whitelist of the form:

<whitelist>
  <node1 attr1="" />
  <node2 attr5="" attr6="" />
</whitelist>

Remark: I can only use XSLT 1.0.

Expected output:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <header />
  <group>
    <node1 attr1="x" />
    <node2 attr5="y" attr6="z" />
    <node1 attr1="x" />
  </group>
</root>

Relevant questions:

  • XSLT - How to keep only wanted elements from XML
  • XSL : Copy Attributes That Match A Whitelist

Answer 1:


Would this do it for you? Use a single template that matches the children of the group element, then check the whitelist embedded in the stylesheet to see whether that node should be copied and, if so, which of its attributes should be copied too:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ns="ns" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <ns:WhiteList>
        <node>
            <name>node1</name>
            <attr>attr1</attr>
        </node>
        <node>
            <name>node2</name>
            <attr>attr5</attr>
            <attr>attr6</attr>
        </node>
    </ns:WhiteList>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="group/*">
        <xsl:variable name="node" select="document('')//ns:WhiteList/node[name = name(current())]" />
        <xsl:if test="$node">
            <xsl:copy>
                <xsl:apply-templates select="@*[name() = $node/attr]|node()" />
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>



Answer 2:


The simple way is to make your stylesheet itself be the "whitelist":

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="group">
    <xsl:copy>
        <xsl:apply-templates select="node1 | node2"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="node1">
    <xsl:copy>
        <xsl:apply-templates select="@attr1"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="node2">
    <xsl:copy>
        <xsl:apply-templates select="@attr5 | @attr6"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Otherwise it can get pretty complicated:

  • It's relatively easy to test whether a node appears in the given whitelist by its name (as is done in the other questions you linked to);

  • It is not so easy, especially in XSLT 1.0, to check whether the node appears at the same position in the tree's hierarchy (i.e. whether the path to it is the same as the path to a node in the whitelist).

If it's sufficient to test by name only, then you could do something like:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="http://example.com/my"
exclude-result-prefixes="my">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<my:whitelist>
    <root>
        <header/>
        <group>
            <node1 attr1=""/>
            <node2 attr5="" attr6=""/>
        </group>
    </root>
</my:whitelist>

<xsl:variable name="whitelist" select="document('')/xsl:stylesheet/my:whitelist"/>

<xsl:template match="*">
    <xsl:if test="$whitelist//*[name() = name(current())]">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>

<xsl:template match="@*">
    <xsl:if test="$whitelist//@*[name() = name(current())]">
        <xsl:copy/>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

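But then, of course, you could simplify the structure of the whitelist, since only the element and attribute names in it are used: its hierarchy is completely ignored, and an attribute name is accepted on any element, not just on the one it is listed under.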
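If each element should keep only its own attributes (so that, for instance, attr5 is allowed on node2 but not on node1), the whitelist can be used in exactly the form proposed in the question. The following XSLT 1.0 sketch assumes that only the children of group need filtering and that everything else passes through the identity transform; the wl prefix and its namespace URI are placeholders chosen for this example.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wl="http://example.com/whitelist"
exclude-result-prefixes="wl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- whitelist in the form proposed in the question:
     element name = allowed child of group, attribute names = its allowed attributes
     (the wl namespace URI is a placeholder) -->
<wl:whitelist>
    <node1 attr1=""/>
    <node2 attr5="" attr6=""/>
</wl:whitelist>

<xsl:variable name="whitelist" select="document('')/xsl:stylesheet/wl:whitelist/*"/>

<!-- identity transform for everything outside /root/group -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- children of group: drop unless listed; keep only the attributes listed for that element -->
<xsl:template match="/root/group/*">
    <xsl:variable name="entry" select="$whitelist[name() = name(current())]"/>
    <xsl:if test="$entry">
        <xsl:copy>
            <xsl:for-each select="@*">
                <xsl:variable name="attrName" select="name()"/>
                <xsl:if test="$entry/@*[name() = $attrName]">
                    <xsl:copy/>
                </xsl:if>
            </xsl:for-each>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

The xsl:for-each loop is needed because XSLT 1.0 has no direct way to compare an attribute's name against the names of the attributes on another element in a single predicate; binding the name to a variable first keeps the comparison readable.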


For an example of how this could be done with a whitelist consisting of paths, see: https://stackoverflow.com/a/30276667/3016153
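As an independent illustration of the path-based idea (not the code from the linked answer), the XSLT 1.0 sketch below builds each node's path as a string such as /root/group/node1 or /root/group/node1/@attr1 and keeps the node only if that exact string is listed. The p prefix, its namespace URI and the p:path element are made up for this example. The paths carry no positions, so both node1 elements match the same entry, and because an unlisted element is dropped together with its subtree, every ancestor of a kept node must be listed as well.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:p="http://example.com/paths"
exclude-result-prefixes="p">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- hypothetical path whitelist: every kept element or attribute is listed
     by its full path, including all of its ancestors -->
<p:whitelist>
    <p:path>/root</p:path>
    <p:path>/root/header</p:path>
    <p:path>/root/group</p:path>
    <p:path>/root/group/node1</p:path>
    <p:path>/root/group/node1/@attr1</p:path>
    <p:path>/root/group/node2</p:path>
    <p:path>/root/group/node2/@attr5</p:path>
    <p:path>/root/group/node2/@attr6</p:path>
</p:whitelist>

<xsl:variable name="paths" select="document('')/xsl:stylesheet/p:whitelist/p:path"/>

<xsl:template match="*">
    <!-- build "/root/group/node1"-style path; for-each visits ancestors outermost first -->
    <xsl:variable name="path">
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:text>/</xsl:text>
            <xsl:value-of select="name()"/>
        </xsl:for-each>
    </xsl:variable>
    <xsl:if test="$paths[. = string($path)]">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:if>
</xsl:template>

<xsl:template match="@*">
    <!-- path of the owner element followed by "/@name" -->
    <xsl:variable name="path">
        <xsl:for-each select="ancestor::*">
            <xsl:text>/</xsl:text>
            <xsl:value-of select="name()"/>
        </xsl:for-each>
        <xsl:text>/@</xsl:text>
        <xsl:value-of select="name()"/>
    </xsl:variable>
    <xsl:if test="$paths[. = string($path)]">
        <xsl:copy/>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

Comparing plain strings keeps the sketch within XSLT 1.0, at the cost of recomputing each path once per node; for very large inputs a key-based lookup may be worth considering.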



Source: https://stackoverflow.com/questions/54420288/keep-only-white-listed-elements-and-or-attributes
