How can my XSLT filter avoid leaving blank lines in output XML when deleting elements, without changing indentation otherwise?

和自甴很熟 submitted on 2019-12-10 10:39:22

Question


I am writing an XSLT filter which reads an XML file, and generates a shorter XML file with some selected elements (and all their children) removed.

So far, my filter produces valid, well-formed XML, but the output has blank lines where the removed elements used to be. I believe the whitespace-only text node that preceded each removed element is still copied to the output, and that is what produces the blank line. I would like to remove this blank line, but leave all other indentation as-is. How can I do this?

A simplified version of my XSLT filter is:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" encoding="utf-8" />

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="root/maybe[remove]" />

</xsl:stylesheet>

A very simplified version of my input XML file is:

<?xml version="1.0" encoding="utf-8" ?>
<root>
      <maybe><keep /></maybe>
   <maybe><remove/></maybe>
</root>

Yes, the indentation is non-standard. I'm trying to make the point that I want the filter to leave the indentation it finds, except for the elements it removes. This lets me confirm the result using conventional diff.

The output I get now (using xsltproc from libxslt, on MacOS X 10.10):

<?xml version="1.0" encoding="utf-8"?>
<root>
       <maybe><keep/></maybe>

</root>

The blank line between <keep/> and </root> is what I'm trying to eliminate.
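
To see where the blank line comes from, it can help to list the children of <root> as the XSLT processor sees them. The following is a minimal diagnostic sketch (not part of the filter itself), assuming the simplified input above. It prints one line per child node. For that input it should report five children: three whitespace-only text nodes interleaved with the two <maybe> elements. The text node just before the removed <maybe> is the one that survives and shows up as the blank line.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Plain-text report, one line per child node of <root>. -->
    <xsl:output method="text" encoding="utf-8" />

    <xsl:template match="/">
        <xsl:for-each select="root/node()">
            <xsl:choose>
                <!-- Text nodes: report the length, since whitespace is invisible. -->
                <xsl:when test="self::text()">
                    <xsl:text>text node, length </xsl:text>
                    <xsl:value-of select="string-length(.)" />
                </xsl:when>
                <!-- Elements: report the element name. -->
                <xsl:when test="self::*">
                    <xsl:text>element </xsl:text>
                    <xsl:value-of select="name()" />
                </xsl:when>
                <!-- Comments and processing instructions. -->
                <xsl:otherwise>
                    <xsl:text>other node</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```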

Elsewhere on SO, the related questions "XSLT: how to prevent the XSLT code from generating redundant blank-space in output xml" and "Removing blank lines in XSLT" suggest adding xsl:strip-space to the XSLT filter:

<xsl:strip-space elements="*"/>

When I try that, the output file no longer has the blank line, but it now has different indentation than the original:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <maybe>
    <keep/>
  </maybe>
</root>

(Note that <maybe> and <keep/> and </maybe> are now on separate lines, and indented differently.)

So, is there an XSLT element which will remove the blank line along with the element, but leave the other indentation and line breaks intact?

Also, my real files are from GnuCash and have a much more complex structure. My real XSLT filter has a more complex match expression. Thus, an XSLT element which doesn't require me to repeat the match expression is preferred.

<xsl:template match='gnc-v2//gnc:account[@version="2.0.0"]/act:slots/
        slot[slot:key/text()="import-map-bayes"]/slot:value[@type="frame"]/
        slot/slot:value[@type="frame"]/slot[starts-with(slot:key/text(),
            "Assets, Business, CAD:"
    )]' />

Also, the related question "Removing extra blank lines with XSLT, without using indentation" got no answers, so there is no insight there.

I'm using XSLT 1.0 because that's what my tool supports. Does XSLT 2.0 provide a better answer for this question?

Update: simplified match patterns slightly, mentioned XSLT 1 vs 2.


Answer 1:


Just add this template:

  <xsl:template match="text()[following-sibling::node()[1][self::maybe[remove]]]" />

The complete stylesheet becomes:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="node() | @*">
     <xsl:copy>
       <xsl:apply-templates select="node() | @*" />
     </xsl:copy>
  </xsl:template>

  <xsl:template match="root/maybe[remove]" />
  <xsl:template match="text()[following-sibling::node()[1][self::maybe[remove]]]" />
</xsl:stylesheet>

Note that I have removed the indent="yes" attribute, because with it the processor may re-indent (normalize the indentation of) the output.

When applied on the provided XML document:

<root>
      <maybe><keep /></maybe>
   <maybe><remove/></maybe>
</root>

the wanted result is produced:

<root>
      <maybe><keep/></maybe>
</root>
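
One possible refinement, in case the real documents contain mixed content: the added template drops whatever text node immediately precedes the removed element, even one that holds real text. The sketch below is the same stylesheet with an extra not(normalize-space()) predicate, so that only whitespace-only text nodes are dropped. It assumes the same simplified input. For that input the result should be unchanged, because the text node in question contains only a line break and spaces.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Identity rule: copy everything that no other template matches. -->
  <xsl:template match="node() | @*">
     <xsl:copy>
       <xsl:apply-templates select="node() | @*" />
     </xsl:copy>
  </xsl:template>

  <!-- Remove the selected elements. -->
  <xsl:template match="root/maybe[remove]" />

  <!-- Remove the text node directly before a removed element,
       but only if that text node is whitespace-only. -->
  <xsl:template match="text()[not(normalize-space())]
                             [following-sibling::node()[1][self::maybe[remove]]]" />
</xsl:stylesheet>
```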

If you also want to remove adjacent preceding comments and/or processing instructions, then the template to add is:

      <xsl:template match=
          "node()[not(self::*)][following-sibling::*[1][self::maybe[remove]]]" />



Answer 2:


If you really want to use variables in patterns, then I think you need to move to XSLT 3.0, as currently supported by Exselt or by the commercial editions of Saxon 9.6 or 9.7.

With Exselt I have tried the following, using variables and keys:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <xsl:key name="version" match="foo" use="@version"/>
    <xsl:key name="item" match="foo/bar/item" use="@key"/>

    <xsl:variable name="vers2" select="key('version', '2.0.0')"/>

    <xsl:variable name="k1" select="key('item', 'k1', $vers2)"/>

    <xsl:variable name="data1" select="$k1/data[starts-with(., 'abc')]"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="$data1| 
                         $k1/text()[not(normalize-space())][some $d in $data1 satisfies ($d is following-sibling::node()[1])]"/>

</xsl:transform>

It transforms an input sample of the form

<root>
  <foo version="2.0.0">
    <bar>
      <item key="k1">
        <data>abcdefg</data>
        <data>1234567</data>
      </item>
      <item key="k1">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
      <item key="k2">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
      <item key="k1">
        <data>foo</data>
        <data>abcdefg</data>
        <data>abcjjjj</data>
        <data>bar</data>
        <data>abcllll</data>
      </item>
    </bar>
  </foo>
  <foo version="1.0.0">
    <bar>
      <item key="k1">
        <data>abcdefg</data>
        <data>1234567</data>
      </item>
      <item key="k1">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
      <item key="k2">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
    </bar>
  </foo>
</root>

into

<root>
  <foo version="2.0.0">
    <bar>
      <item key="k1">
        <data>1234567</data>
      </item>
      <item key="k1">
        <data>1234567</data>
      </item>
      <item key="k2">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
      <item key="k1">
        <data>foo</data>
        <data>bar</data>
      </item>
    </bar>
  </foo>
  <foo version="1.0.0">
    <bar>
      <item key="k1">
        <data>abcdefg</data>
        <data>1234567</data>
      </item>
      <item key="k1">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
      <item key="k2">
        <data>1234567</data>
        <data>abcdefg</data>
      </item>
    </bar>
  </foo>
</root>

The commercial editions of Saxon 9.6/9.7 (EE and PE) also run the above code and produce the same result as Exselt.

As for your real samples, which seem to have elements in a namespace: in XSLT 2.0 or 3.0, the xpath-default-namespace attribute can help keep the match patterns short.
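
As a rough illustration of that last point, here is a sketch that assumes the foo/bar/item/data sample above were placed in a single namespace. The URI http://example.com/ns is made up for the example and is not taken from the real files. With xpath-default-namespace set, unprefixed element names in patterns and XPath expressions refer to that namespace, so the pattern needs no prefixes. A document that mixes several namespaces would still need prefixes for all but one of them. The sketch only shows the namespace handling and does not deal with the surrounding whitespace.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
                xpath-default-namespace="http://example.com/ns">

    <!-- Identity rule: copy everything that no other template matches. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- foo, bar, item and data are all looked up in the (hypothetical)
         namespace http://example.com/ns, without any prefix. -->
    <xsl:template match="foo[@version='2.0.0']/bar/item[@key='k1']/data[starts-with(., 'abc')]"/>

</xsl:stylesheet>
```

This needs an XSLT 2.0 or 3.0 processor; an XSLT 1.0 processor such as xsltproc does not support xpath-default-namespace.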




Answer 3:


This XSLT filter gives the desired result:

<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes" encoding="utf-8" />

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="root/maybe[./remove]" />
    <xsl:template match="root/text()[following-sibling::node()[1][self::maybe[./remove]]]" />

</xsl:stylesheet>

Result:

<?xml version="1.0" encoding="utf-8"?>
<root>
       <maybe><keep/></maybe>
</root>

However, this approach gets ugly pretty fast as the match expression gets long and complicated. The two template elements have a lot of redundancy in the match patterns. This redundancy can't be helped, however. We can't put the common part in a variable. The XSLT 1.0 spec says, "It is an error for the value of the match attribute to contain a VariableReference."

Surely someone else can do better?
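
One possible way to state the pattern only once while staying in XSLT 1.0 is xsl:key. A variable reference is not allowed in a match pattern, but a call to key() inside a predicate is. The sketch below assumes the simplified input from the question. The key name "doomed" is made up for the example, and I have not run this against a real GnuCash file. The long pattern goes into the key's match attribute, and the two removal templates then only ask whether a node is in that key.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" omit-xml-declaration="no" encoding="utf-8" />

    <!-- The removal pattern is written once, here. Each matching element
         is indexed under its own generate-id() value. -->
    <xsl:key name="doomed" match="root/maybe[remove]" use="generate-id()" />

    <!-- Identity rule: copy everything that no other template matches. -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <!-- Drop any element that the key has indexed. -->
    <xsl:template match="*[key('doomed', generate-id())]" />

    <!-- Drop a whitespace-only text node that sits directly before
         such an element. -->
    <xsl:template match="text()[not(normalize-space())]
                               [following-sibling::node()[1][key('doomed', generate-id())]]" />

</xsl:stylesheet>
```

For a more complex filter, only the match attribute of xsl:key would need to change; the two removal templates stay as they are. The indent="yes" attribute is left out here so that the processor does not re-indent the output.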



Source: https://stackoverflow.com/questions/35407361/how-can-my-xslt-filter-avoid-leaving-blank-lines-in-output-xml-when-deleting-ele
