XSLT 3.0 Streaming with Grouping and Sum/Accumulator

Posted by ≯℡__Kan透↙ on 2019-12-10 21:16:11

Question


I'm trying to figure out how to use XSLT streaming (to reduce memory usage) in a scenario that requires grouping (with an arbitrary number of groups) and summing a value within each group. So far I haven't been able to find any examples. Here's an example XML document:

<?xml version='1.0' encoding='UTF-8'?>
  <Data>
    <Entry>
      <Genre>Fantasy</Genre>
      <Condition>New</Condition>
      <Format>Hardback</Format>
      <Title>Birds</Title>
      <Count>3</Count>
    </Entry>
    <Entry>
      <Genre>Fantasy</Genre>
      <Condition>New</Condition>
      <Format>Hardback</Format>
      <Title>Cats</Title>
      <Count>2</Count>
    </Entry>
    <Entry>
      <Genre>Non-Fiction</Genre>
      <Condition>New</Condition>
      <Format>Paperback</Format>
      <Title>Dogs</Title>
      <Count>4</Count>
    </Entry>
 </Data>

In XSLT 2.0 I would use the following to group by Genre, Condition and Format and sum the counts:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="yes" />
  <xsl:template match="/">
     <xsl:call-template name="body"/>
  </xsl:template>
  <xsl:template name="body">
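    <!-- The '|' separators keep keys such as ('ab','c') and ('a','bc') distinct -->
    <xsl:for-each-group select="Data/Entry" group-by="concat(Genre, '|', Condition, '|', Format)">
      <!-- One line per group: the three key fields followed by the summed Count -->
      <xsl:value-of select="Genre, Condition, Format, sum(current-group()/Count)" separator=", "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>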
  </xsl:template>
</xsl:stylesheet>

For output I would get two lines: a sum of 5 for Fantasy, New, Hardback and a sum of 4 for Non-Fiction, New, Paperback.

Obviously this won't work with streaming, because the sum accesses the whole group. I think I need to iterate through the document twice. The first time I could build a map of the groups (creating a new group if one doesn't exist yet). The problem with the second pass is that I would also need an accumulator for each group, with a rule that matches the group, and it doesn't seem you can create accumulators dynamically.

Is there a way to create accumulators on the fly? Is there another, easier way to do this with streaming?


Answer 1:


To use streamed grouping with XSLT 3.0, one option I see is to first transform the element-based data you have into attribute-based data, using a stylesheet like

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="Entry/*">
        <xsl:attribute name="{name()}" namespace="{namespace-uri()}" select="."/>
    </xsl:template>

</xsl:stylesheet>

then you can use streamed grouping (as far as a streamed group-by is possible at all; as far as I understand, some buffering will be necessary) as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:fork>
            <xsl:for-each-group select="Data/Entry" composite="yes" group-by="@Genre, @Condition, @Format">
                <xsl:value-of select="current-grouping-key(), sum(current-group()/@Count)"/>
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each-group>
        </xsl:fork>
    </xsl:template>

</xsl:stylesheet>

I don't know whether first creating an attribute-centric document is an option, but I think it is better to share suggestions with code in an answer than to try to put them into a comment. The answer to "XSLT Streaming Chained Transform" shows how to use Saxon 9 with Java or Scala to chain two streaming transformations without writing a temporary output file for the first transformation step.
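To make the two-step approach concrete, here is a sketch of the intermediate document the first stylesheet should produce for the sample input from the question. It is worked out by hand from the stylesheet, not captured from a run, and the exact indentation and XML declaration depend on the serializer:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Each child element of Entry has become an attribute of the same name -->
<Data>
   <Entry Genre="Fantasy" Condition="New" Format="Hardback" Title="Birds" Count="3"/>
   <Entry Genre="Fantasy" Condition="New" Format="Hardback" Title="Cats" Count="2"/>
   <Entry Genre="Non-Fiction" Condition="New" Format="Paperback" Title="Dogs" Count="4"/>
</Data>
```

Because every value the grouping needs is now an attribute of Entry, the second stylesheet can read the grouping key and the Count without descending into child elements. For this input it should print one line for the Fantasy, New, Hardback group with a sum of 5 and one line for the Non-Fiction, New, Paperback group with a sum of 4.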

As for doing it with copy-of on the original input format, Saxon 9.7 EE assesses the following as streamable and executes it with the right result:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each-group select="copy-of(Data/Entry)" composite="yes"
            group-by="Genre, Condition, Format">
            <xsl:value-of select="current-grouping-key(), sum(current-group()/Count)"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>

However, I am not sure it consumes less memory than normal, tree-based grouping, because each Entry element is copied into memory before it is grouped. Perhaps you can measure with your real input data.

As a third alternative, using a map as you seemed to want, here is an xsl:iterate example that iterates through the Entry elements and collects the accumulated Count values in a map:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map" exclude-result-prefixes="xs math map"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:iterate select="Data/Entry">
            <xsl:param name="groups" as="map(xs:string, xs:integer)" select="map{}"/>
            <xsl:on-completion>
                <xsl:value-of select="map:keys($groups)!(. || ' ' || $groups(.))" separator="&#10;"/>
            </xsl:on-completion>
            <xsl:variable name="current-entry" select="copy-of()"/>
            <xsl:variable name="key"
                select="string-join($current-entry/(Genre, Condition, Format), '|')"/>
            <xsl:next-iteration>
                <xsl:with-param name="groups"
                    select="
                        if (map:contains($groups, $key)) then
                            map:put($groups, $key, map:get($groups, $key) + xs:integer($current-entry/Count))
                        else
                            map:put($groups, $key, xs:integer($current-entry/Count))"
                />
            </xsl:next-iteration>
        </xsl:iterate>
    </xsl:template>

</xsl:stylesheet>
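As a possible simplification of the stylesheet above, the if/else around map:put can be folded into a single expression. This is only a sketch of the xsl:next-iteration part, assuming the same $groups, $key and $current-entry as above and the same xs and map namespace declarations. It relies on a map lookup returning the empty sequence for a key that is not present:

```xml
<xsl:next-iteration>
    <!-- $groups($key) is the empty sequence for a key not seen before,
         so ($groups($key), 0)[1] falls back to 0 as the running total -->
    <xsl:with-param name="groups"
        select="map:put($groups, $key,
                        ($groups($key), 0)[1] + xs:integer($current-entry/Count))"/>
</xsl:next-iteration>
```

The map itself still grows with the number of distinct groups, so memory usage depends on how many Genre/Condition/Format combinations occur rather than on the number of Entry elements.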


Source: https://stackoverflow.com/questions/44287959/xslt-3-0-streaming-with-grouping-and-sum-accumulator
