XSLT 2.0 tokenize and group

Submitted by 时光怂恿深爱的人放手 on 2019-12-18 09:55:38

Question


I have a text file with the following data:

<t>Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Award
name Academy
time 1

Award
name BAFTA
time 2

Award
name Gloden Globes
time 3</t>

Expected output should look like:

<Jamesfilms>
    <heros>
        <firstName>Sean</firstName>
        <lastName>Connery</lastName>
        <DOB>25-08-1930</DOB>
    </heros>
    <films>
        <Dr.No>1962</Dr.No>
        <Goldfinger>1964</Goldfinger>
        <Thunerball>1965</Thunerball>
    </films>
    <award>
        <name>Academy</name>
        <times>1</times>
    </award>
    <award>
        <name>BAFTA</name>
        <times>2</times>
    </award>
    <award>
        <name>Gloden Globes</name>
        <times>3</times>
    </award>
</Jamesfilms>

The text file contains space-separated key-value pairs. How can I split the keys from the values and generate an XML node for each pair?

EDIT: I have tried Daniel Haley's answer and am trying to resolve the exception below:

Error at xsl:for-each on line 10 of transformer.xslt:
  XTDE1170: Invalid relative URI: Illegal character in path at index 5: 

Java class:

    final String TXT_PATH = "E:/tmp/test/input.txt";
    final String XSLT_PATH = "E:/tmp/test/txtToXml.xslt";
    final String XML_PATH = "E:/tmp/test/test_xml_result.xml";

    TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
    Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
    transformer.transform(new StreamSource(new File(TXT_PATH)),new StreamResult(new File(XML_PATH)));

and the modified XSLT:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text(., $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{$tokens[1]}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>

Answer 1:


You shouldn't need to group; you could just tokenize (and tokenize and tokenize...).

Here's an example. It doesn't do anything with the case of the element names. You can either handle those changes during the building of $initData, or you can add additional templates to handle any changes.

Also, the element names have to be valid QNames. Right now the stylesheet terminates processing with a message, but you can change how that's handled.

This should at least get you started...

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
  <xsl:param name="input-uri" as="xs:string" select="'so.txt'"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{$tokens[1]}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>
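
For readers more at home in Java than XPath, the three tokenize passes in the stylesheet above (blank lines, then line breaks, then whitespace) can be traced with plain string splitting. This is only a minimal sketch of the splitting logic, not a replacement for the stylesheet: the class name TokenizeSketch and the method toXml are hypothetical, the root element name is copied from the example, and there is no QName validation or XML escaping here.

```java
class TokenizeSketch {

    // Hypothetical helper: turns blank-line-separated blocks of
    // "key value" lines into XML text. No name validation, no escaping.
    static String toXml(String text) {
        StringBuilder xml = new StringBuilder("<Jamesfilms>\n");
        // Pass 1: split the whole text into blocks on blank lines.
        for (String block : text.trim().split("\\r?\\n\\r?\\n")) {
            // Pass 2: split each block into lines; the first line names the parent element.
            String[] lines = block.split("\\r?\\n");
            String parent = lines[0].trim();
            xml.append("  <").append(parent).append(">\n");
            for (int i = 1; i < lines.length; i++) {
                // Pass 3: the first whitespace-delimited token is the child element name,
                // the remainder of the line is its text value.
                String[] parts = lines[i].trim().split("\\s+", 2);
                String value = parts.length > 1 ? parts[1] : "";
                xml.append("    <").append(parts[0]).append('>')
                   .append(value)
                   .append("</").append(parts[0]).append(">\n");
            }
            xml.append("  </").append(parent).append(">\n");
        }
        return xml.append("</Jamesfilms>").toString();
    }

    public static void main(String[] args) {
        String text = "Award\nname BAFTA\ntime 2\n\nAward\nname Gloden Globes\ntime 3";
        System.out.println(toXml(text));
    }
}
```

The limit of 2 in the last split keeps multi-word values such as "Gloden Globes" together, which plays the same role as rejoining $tokens2[position()>1] with separator=" " in the stylesheet.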

EDIT

You're passing the text file in as the input of the transform. That's why you had to add the <t> element.

Since you don't actually have an XML input, you can pass the stylesheet itself in as the input. Nothing in that input gets processed, because the template that matches the root (/) only applies templates to the $initData variable.

You also need to set the input-uri parameter with transformer.setParameter("input-uri", TXT_PATH). If your path is absolute, be sure to prefix it with the file:/// scheme.

Example...

Text File

Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Award
name Academy
time 1

Award
name BAFTA
time 2

Award
name Gloden Globes
time 3

Java (you'll need to change paths/filenames)

final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";

TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
transformer.setParameter("input-uri", TXT_PATH);
transformer.transform(new StreamSource(new File(XSLT_PATH)),new StreamResult(new File(XML_PATH)));
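
Instead of hard-coding the file:/// prefix in TXT_PATH, the URI can probably be derived from an ordinary path with the standard java.nio.file API. The sketch below is an assumption about how you might want to do that, not part of the original example; the class name FileUriSketch and the helper toFileUri are hypothetical, and the exact string printed depends on the platform and the current directory.

```java
import java.nio.file.Paths;

class FileUriSketch {

    // Hypothetical helper: converts a file system path into a file: URI string
    // that can be passed as the input-uri stylesheet parameter.
    static String toFileUri(String path) {
        // toAbsolutePath() resolves relative paths against the working directory;
        // toUri() adds the file scheme and escapes characters such as spaces.
        return Paths.get(path).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        // Path taken from the example above; adjust to your own file.
        System.out.println(toFileUri("C:/tmp/input.txt"));
    }
}
```

The result could then be passed along as transformer.setParameter("input-uri", toFileUri(somePath)), where somePath is whatever plain path you already have.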

XSLT 2.0

Same as above.

Output

<Jamesfilms>
   <Heros>
      <Firstname>Sean</Firstname>
      <Lastname>Connery</Lastname>
      <DOB>25-08-1930</DOB>
   </Heros>
   <Films>
      <Dr.No>1962</Dr.No>
      <Goldfinger>1964</Goldfinger>
      <Thunerball>1965</Thunerball>
   </Films>
   <Award>
      <name>Academy</name>
      <time>1</time>
   </Award>
   <Award>
      <name>BAFTA</name>
      <time>2</time>
   </Award>
   <Award>
      <name>Gloden Globes</name>
      <time>3</time>
   </Award>
</Jamesfilms>

However, since you're using Saxon you could use the s9api and specify an initial template. This is the way I would do it instead of passing the stylesheet as the input to the transform.

Example...

Java

final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";

Processor processor = new Processor(false);
Serializer serializer = processor.newSerializer();
serializer.setOutputFile(new File(XML_PATH));
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource(new File(XSLT_PATH)));
XsltTransformer transformer = executable.load();
transformer.setInitialTemplate(new QName("root"));
transformer.setParameter(new QName("input-uri"), new XdmAtomicValue(TXT_PATH));
transformer.setDestination(serializer);
transformer.transform();

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
  <xsl:param name="input-uri" as="xs:string"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{replace($tokens[1],'\s','')}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/" name="root">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>

Input and output would be the same. Let me know if you need me to add the Java imports to the example.



Source: https://stackoverflow.com/questions/33788966/xslt-2-0-tokenize-and-group
