XML shredding via XSLT in Java

不知归路 2020-11-30 13:06

I need to transform large XML files that have a nested (hierarchical) structure of the form


   Flat XML
   Hierarchical XML (multiple blocks, so         


        
2 Answers
  • 2020-11-30 13:35

    Here is a generic solution as requested:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
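     <!-- The "leaf" nodes: one copy of the surrounding structure is produced per node selected here -->
     <xsl:param name="pLeafNodes" select="//Level-4"/>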
    
     <xsl:template match="/">
      <t>
        <xsl:call-template name="StructRepro"/>
      </t>
     </xsl:template>
    
     <xsl:template name="StructRepro">
       <xsl:param name="pLeaves" select="$pLeafNodes"/>
    
       <xsl:for-each select="$pLeaves">
         <xsl:apply-templates mode="build" select="/*">
          <xsl:with-param name="pChild" select="."/>
          <xsl:with-param name="pLeaves" select="$pLeaves"/>
         </xsl:apply-templates>
       </xsl:for-each>
     </xsl:template>
    
      <xsl:template mode="build" match="node()|@*">
          <xsl:param name="pChild"/>
          <xsl:param name="pLeaves"/>
    
         <xsl:copy>
           <xsl:apply-templates mode="build" select="@*"/>
    
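           <!-- count(.|$set) = count($set) is the XPath 1.0 test for "the
                context node is a member of $set"; this picks the child
                that is the current leaf, if there is one -->
           <xsl:variable name="vLeafChild" select=
             "*[count(.|$pChild) = count($pChild)]"/>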
    
           <xsl:choose>
            <xsl:when test="$vLeafChild">
             <xsl:apply-templates mode="build"
                 select="$vLeafChild
                        |
                          node()[not(count(.|$pLeaves) = count($pLeaves))]">
                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
             <xsl:apply-templates mode="build" select=
             "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                    or
                     .//*[count(.|$pChild) = count($pChild)]
                    ]
             ">
    
                 <xsl:with-param name="pChild" select="$pChild"/>
                 <xsl:with-param name="pLeaves" select="$pLeaves"/>
             </xsl:apply-templates>
            </xsl:otherwise>
           </xsl:choose>
         </xsl:copy>
     </xsl:template>
     <xsl:template match="text()"/>
    </xsl:stylesheet>
    

    When applied to the provided simplified (and generic) XML document:

    <Level-1>
       ...
       <Level-2>
          ...
          <Level-3>
            ...
            <Level-4>A</Level-4>
            <Level-4>B</Level-4>
            ...
          </Level-3>
          ...
       </Level-2>
       ...
    </Level-1>
    

    the wanted, correct result is produced (the enclosing t element that the stylesheet emits is omitted here):

    <Level-1>
       ...
       <Level-2>
          ...
          <Level-3>
             <Level-4>A</Level-4>
          </Level-3>
          ...
       </Level-2>
       ...
    </Level-1>
    <Level-1>
       ...
       <Level-2>
          ...
          <Level-3>
             <Level-4>B</Level-4>
          </Level-3>
          ...
       </Level-2>
       ...
    </Level-1>
    

    Now, if we change the line:

     <xsl:param name="pLeafNodes" select="//Level-4"/>
    

    to:

     <xsl:param name="pLeafNodes" select="//Job"/>
    

    and apply the transformation to the Employee XML document:

    <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
            <Employment country="US">
                <Comment>List of previous jobs in the US</Comment>
                <Jobs>3</Jobs>
                <JobDetails>
                    <Job title = "Senior Developer">
                        <StartDate>01/10/2001</StartDate>
                        <Months>38</Months>
                    </Job>
                    <Job title = "Senior Developer">
                        <StartDate>01/12/2004</StartDate>
                        <Months>6</Months>
                    </Job>
                    <Job title = "Senior Developer">
                        <StartDate>01/06/2005</StartDate>
                        <Months>10</Months>
                    </Job>
                </JobDetails>
            </Employment>
        </EmploymentHistory>
        <EmploymentHistory>
            <Employment country="UK">
                <Comment>List of previous jobs in the UK</Comment>
                <Jobs>2</Jobs>
                <JobDetails>
                    <Job title = "Junior Developer">
                        <StartDate>01/05/1999</StartDate>
                        <Months>25</Months>
                    </Job>
                    <Job title = "Junior Developer">
                        <StartDate>01/07/2001</StartDate>
                        <Months>3</Months>
                    </Job>
                </JobDetails>
            </Employment>
        </EmploymentHistory>
        <Available>true</Available>
        <Experience unit="years">6</Experience>
    </Employee>
    

    we again get the wanted, correct result:

    <t>
       <Employee name="A Name">
          <Address>123 A Street</Address>
          <Age>28</Age>
          <EmploymentHistory>
             <Employment country="US">
                <Comment>List of previous jobs in the US</Comment>
                <Jobs>3</Jobs>
                <JobDetails>
                   <Job title="Senior Developer">
                      <StartDate>01/10/2001</StartDate>
                      <Months>38</Months>
                   </Job>
                </JobDetails>
             </Employment>
          </EmploymentHistory>
          <Available>true</Available>
          <Experience unit="years">6</Experience>
       </Employee>
       <Employee name="A Name">
          <Address>123 A Street</Address>
          <Age>28</Age>
          <EmploymentHistory>
             <Employment country="US">
                <Comment>List of previous jobs in the US</Comment>
                <Jobs>3</Jobs>
                <JobDetails>
                   <Job title="Senior Developer">
                      <StartDate>01/12/2004</StartDate>
                      <Months>6</Months>
                   </Job>
                </JobDetails>
             </Employment>
          </EmploymentHistory>
          <Available>true</Available>
          <Experience unit="years">6</Experience>
       </Employee>
       <Employee name="A Name">
          <Address>123 A Street</Address>
          <Age>28</Age>
          <EmploymentHistory>
             <Employment country="US">
                <Comment>List of previous jobs in the US</Comment>
                <Jobs>3</Jobs>
                <JobDetails>
                   <Job title="Senior Developer">
                      <StartDate>01/06/2005</StartDate>
                      <Months>10</Months>
                   </Job>
                </JobDetails>
             </Employment>
          </EmploymentHistory>
          <Available>true</Available>
          <Experience unit="years">6</Experience>
       </Employee>
       <Employee name="A Name">
          <Address>123 A Street</Address>
          <Age>28</Age>
          <EmploymentHistory>
             <Employment country="UK">
                <Comment>List of previous jobs in the UK</Comment>
                <Jobs>2</Jobs>
                <JobDetails>
                   <Job title="Junior Developer">
                      <StartDate>01/05/1999</StartDate>
                      <Months>25</Months>
                   </Job>
                </JobDetails>
             </Employment>
          </EmploymentHistory>
          <Available>true</Available>
          <Experience unit="years">6</Experience>
       </Employee>
       <Employee name="A Name">
          <Address>123 A Street</Address>
          <Age>28</Age>
          <EmploymentHistory>
             <Employment country="UK">
                <Comment>List of previous jobs in the UK</Comment>
                <Jobs>2</Jobs>
                <JobDetails>
                   <Job title="Junior Developer">
                      <StartDate>01/07/2001</StartDate>
                      <Months>3</Months>
                   </Job>
                </JobDetails>
             </Employment>
          </EmploymentHistory>
          <Available>true</Available>
          <Experience unit="years">6</Experience>
       </Employee>
    </t>
    

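    Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter, pLeafNodes, which must contain the node-set of all nodes whose "upward structure" is to be reproduced in the result. In essence, for each node in pLeafNodes the document is walked again from the root element in mode "build", copying everything except the other leaf nodes and the branches that lead only to them.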
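
    Since the question is about doing this from Java, here is a minimal sketch of running a stylesheet like the one above through the JAXP API that ships with the JDK (javax.xml.transform). The class name XsltRunner and the file arguments in main are placeholders of mine, not something from the question. The processor bundled with the JDK implements XSLT 1.0, which is all this stylesheet needs.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the name is an assumption for illustration.
class XsltRunner {

    /** Compiles the stylesheet and runs it once over the input. */
    static void transform(Source stylesheet, Source input, Result output)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheet);
        transformer.transform(input, output);
    }

    /** Convenience overload for small in-memory documents. */
    static String transform(String stylesheet, String input) {
        StringWriter out = new StringWriter();
        try {
            transform(new StreamSource(new StringReader(stylesheet)),
                      new StreamSource(new StringReader(input)),
                      new StreamResult(out));
        } catch (TransformerException e) {
            // Rethrown unchecked to keep the call site short
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Assumed usage: java XsltRunner shred.xsl input.xml output.xml
    public static void main(String[] args) throws TransformerException {
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(new File(args[2])));
    }
}
```

    The sketch leaves pLeafNodes at the default given in the stylesheet; selecting different leaf nodes means editing its select attribute as shown above. If many files are processed with the same stylesheet, compiling it once with TransformerFactory.newTemplates and reusing the result is worth considering.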

  • 2020-11-30 13:40

    Given the following XML:

    <?xml version="1.0" encoding="utf-8" ?>
    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="US">
          <Comment>List of previous jobs in the US</Comment>
          <Jobs>3</Jobs>
          <JobDetails>
            <Job title = "Developer">
              <StartDate>01/10/2001</StartDate>
              <Months>38</Months>
            </Job>
            <Job title = "Developer">
              <StartDate>01/12/2004</StartDate>
              <Months>6</Months>
            </Job>
            <Job title = "Developer">
              <StartDate>01/06/2005</StartDate>
              <Months>10</Months>
            </Job>
          </JobDetails>
          </Employment>
          <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
              <Job title = "Developer">
                <StartDate>01/05/1999</StartDate>
                <Months>25</Months>
              </Job>
              <Job title = "Developer">
                <StartDate>01/07/2001</StartDate>
                <Months>3</Months>
              </Job>
            </JobDetails>
            </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>
    

    The following XSLT:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
    
        <xsl:output method="xml" indent="yes"/>
    
        <xsl:template match="/">
          <Output>
            <xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
          </Output>
        </xsl:template>
    
      <xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
        <Employee>
          <xsl:attribute name="name">
            <xsl:value-of select="ancestor::Employee/@name"/>
          </xsl:attribute>
          <Address>
            <xsl:value-of select="ancestor::Employee/Address"/>
          </Address>
          <Age>
            <xsl:value-of select="ancestor::Employee/Age"/>
          </Age>
          <EmploymentHistory>
            <Employment>
              <xsl:attribute name="country">
                <xsl:value-of select="ancestor::Employment/@country"/>
              </xsl:attribute>
              <Comment>
                <xsl:value-of select="ancestor::Employment/Comment"/>
              </Comment>
              <Jobs>
                <xsl:value-of select="ancestor::Employment/Jobs"/>
              </Jobs>
              <JobDetails>
                <xsl:copy-of select="."/>
              </JobDetails>
              <Available>
                <xsl:value-of select="ancestor::Employee/Available"/>
              </Available>
              <Experience>
                <xsl:attribute name="unit">
                  <xsl:value-of select="ancestor::Employee/Experience/@unit"/>
                </xsl:attribute>
                <xsl:value-of select="ancestor::Employee/Experience"/>
              </Experience>
            </Employment>
          </EmploymentHistory>
        </Employee>
    
      </xsl:template>
    
    
    </xsl:stylesheet>
    

    Gives the following output:

    <?xml version="1.0" encoding="utf-8"?>
    <Output>
      <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
          <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
              <Job title="Developer">
              <StartDate>01/10/2001</StartDate>
              <Months>38</Months>
            </Job>
            </JobDetails>
            <Available>true</Available>
            <Experience unit="years">6</Experience>
          </Employment>
        </EmploymentHistory>
      </Employee>
      <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
          <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
              <Job title="Developer">
              <StartDate>01/12/2004</StartDate>
              <Months>6</Months>
            </Job>
            </JobDetails>
            <Available>true</Available>
            <Experience unit="years">6</Experience>
          </Employment>
        </EmploymentHistory>
      </Employee>
      <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
          <Employment country="US">
            <Comment>List of previous jobs in the US</Comment>
            <Jobs>3</Jobs>
            <JobDetails>
              <Job title="Developer">
              <StartDate>01/06/2005</StartDate>
              <Months>10</Months>
            </Job>
            </JobDetails>
            <Available>true</Available>
            <Experience unit="years">6</Experience>
          </Employment>
        </EmploymentHistory>
      </Employee>
      <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
          <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
              <Job title="Developer">
                <StartDate>01/05/1999</StartDate>
                <Months>25</Months>
              </Job>
            </JobDetails>
            <Available>true</Available>
            <Experience unit="years">6</Experience>
          </Employment>
        </EmploymentHistory>
      </Employee>
      <Employee name="A Name">
        <Address>123 A Street</Address>
        <Age>28</Age>
        <EmploymentHistory>
          <Employment country="UK">
            <Comment>List of previous jobs in the UK</Comment>
            <Jobs>2</Jobs>
            <JobDetails>
              <Job title="Developer">
                <StartDate>01/07/2001</StartDate>
                <Months>3</Months>
              </Job>
            </JobDetails>
            <Available>true</Available>
            <Experience unit="years">6</Experience>
          </Employment>
        </EmploymentHistory>
      </Employee>
    </Output>
    

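    Note that I've added an Output root element to ensure the document is well-formed. Also note that this stylesheet writes Available and Experience inside Employment, whereas in the input they are children of Employee; if the original nesting matters, move those two blocks to just after the closing EmploymentHistory tag in the template.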

    Is this what you wanted?

    You might also be able to use xsl:copy to copy the higher-level elements, but I need to think about that a bit more. The stylesheet above gives you more control, at the cost of having to spell out every element again.
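
    For what it's worth, the xsl:copy idea could look roughly like the sketch below, wrapped in a small Java driver so it can be run as it stands. It assumes the leaf element is always named Job and that Job elements are never nested. The class name CopyShredSketch, the parameter leafId and the sample data in main are my own inventions, so treat this as one possible approach rather than a finished solution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class; names are assumptions for illustration.
class CopyShredSketch {

    // XSLT 1.0: for every Job, rebuild the document around that one Job.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='/'>",
        "    <Output>",
        "      <xsl:for-each select='//Job'>",
        "        <xsl:apply-templates select='/*' mode='build'>",
        "          <xsl:with-param name='leafId' select='generate-id()'/>",
        "        </xsl:apply-templates>",
        "      </xsl:for-each>",
        "    </Output>",
        "  </xsl:template>",
        "  <xsl:template match='node()' mode='build'>",
        "    <xsl:param name='leafId'/>",
        "    <xsl:choose>",
        "      <!-- the chosen Job, or a branch with no Job in it: copy whole -->",
        "      <xsl:when test='generate-id() = $leafId or not(descendant-or-self::Job)'>",
        "        <xsl:copy-of select='.'/>",
        "      </xsl:when>",
        "      <!-- an ancestor of the chosen Job: copy the shell, then recurse -->",
        "      <xsl:when test='descendant::Job[generate-id() = $leafId]'>",
        "        <xsl:copy>",
        "          <xsl:copy-of select='@*'/>",
        "          <xsl:apply-templates select='node()' mode='build'>",
        "            <xsl:with-param name='leafId' select='$leafId'/>",
        "          </xsl:apply-templates>",
        "        </xsl:copy>",
        "      </xsl:when>",
        "      <!-- anything else holds only other Jobs and is dropped -->",
        "    </xsl:choose>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Applies the stylesheet above to an XML string and returns the result. */
    static String shred(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: two Job elements, so two Employee copies are expected
        System.out.println(shred("<Employee name='A Name'><Age>28</Age>"
                + "<JobDetails><Job title='Developer'/><Job title='Tester'/></JobDetails>"
                + "</Employee>"));
    }
}
```

    Compared with the stylesheet above, nothing has to be spelled out element by element, and Available and Experience stay where they were in the input. The price is that you lose the fine-grained control over the output shape.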
