XSLT 2.0 tokenize and group

别跟我提以往 2020-12-22 13:18

I have a text file with the following data:

Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Awa         


        
1 Answer
  • 2020-12-22 13:25

    You shouldn't need to group; you could just tokenize (and tokenize and tokenize...).

    Here's an example. It doesn't do anything with the case of the element names. You can either handle those changes during the building of $initData, or you can add additional templates to handle any changes.

    Also, the element names have to be valid QNames. Right now the stylesheet terminates processing with a message as soon as it finds a name that isn't, but you can change how that's handled.

    This should at least get you started...

    XSLT 2.0

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
      <xsl:param name="input-uri" as="xs:string" select="'so.txt'"/>
    
      <xsl:variable name="initData" as="node()">
        <Jamesfilms>
          <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
            <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
            <xsl:choose>
              <xsl:when test="$tokens[1] castable as xs:QName">
                <xsl:element name="{$tokens[1]}">
                  <xsl:for-each select="$tokens[position() > 1]">
                    <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                    <xsl:choose>
                      <xsl:when test="$tokens2[1] castable as xs:QName">
                        <xsl:element name="{$tokens2[1]}">
                          <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                        </xsl:element>                      
                      </xsl:when>
                      <xsl:otherwise>
                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                      </xsl:otherwise>
                    </xsl:choose>
                  </xsl:for-each>
                </xsl:element>            
              </xsl:when>
              <xsl:otherwise>
                <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </Jamesfilms>
      </xsl:variable>
    
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="/">
        <xsl:apply-templates select="$initData"/>    
      </xsl:template>
    
      <!--Add additional templates to do further transforming of the initial data ($initData).-->
    
    </xsl:stylesheet>
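
    To make the order of the splits easier to follow outside XSLT, here is a rough plain-Java sketch of the same three steps (split on blank lines, then on line breaks, then on whitespace). It is only an illustration and not part of the stylesheet: the class name TokenizeSketch and its toXml method are made up for this example, and unlike the stylesheet it skips the QName check and does no XML escaping or indenting.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; mirrors the stylesheet's three tokenize() calls.
final class TokenizeSketch {

    // Builds the same element structure as $initData, as one unindented string.
    static String toXml(String text) {
        StringBuilder xml = new StringBuilder("<Jamesfilms>");
        // 1st split: blank lines separate the records (same regex as the stylesheet).
        for (String block : text.split("\\r?\\n\\r?\\n")) {
            // 2nd split: line breaks separate the record heading from its entries.
            String[] lines = block.split("\\r?\\n");
            String parent = lines[0];
            xml.append('<').append(parent).append('>');
            for (int i = 1; i < lines.length; i++) {
                // 3rd split: the first token is the element name, the rest is its value.
                String[] tokens = lines[i].split("\\s");
                String value = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length));
                xml.append('<').append(tokens[0]).append('>')
                   .append(value)
                   .append("</").append(tokens[0]).append('>');
            }
            xml.append("</").append(parent).append('>');
        }
        return xml.append("</Jamesfilms>").toString();
    }

    public static void main(String[] args) {
        String sample = "Heros\nFirstname Sean\nLastname Connery\n\nAward\nname Gloden Globes\ntime 3";
        System.out.println(toXml(sample));
    }
}
```

    The point of the sketch is that every level of the text file maps to one split, so no grouping step is needed.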
    

    EDIT

    You're passing the text file in as the input of the transform. That's why you had to add the <t> element.

    Since you don't actually have an XML input, you can pass the stylesheet itself in as the input. Nothing in it will get processed, because the template that matches the root (/) only applies templates to the variable.

    You also need to set the input-uri parameter by calling transformer.setParameter("input-uri", TXT_PATH). If your path is absolute, be sure to prefix it with the file:/// scheme (for example file:///C:/tmp/input.txt), because unparsed-text() expects a URI.
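
    If you'd rather not write the file:/// prefix by hand, one option is to let Java build the URI from a local path. This is just a sketch under that assumption: FileUriSketch and toFileUri are hypothetical names (nothing Saxon provides), and the path in main is a placeholder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; turns a local path into a file: URI string.
final class FileUriSketch {

    static String toFileUri(String localPath) {
        // toAbsolutePath() resolves a relative path against the working directory;
        // toUri() adds the file: scheme and percent-encodes characters such as spaces.
        Path path = Paths.get(localPath).toAbsolutePath();
        return path.toUri().toString();
    }

    public static void main(String[] args) {
        // Placeholder path; on Windows this would be something like C:/tmp/input.txt.
        System.out.println(toFileUri("/tmp/input.txt"));
    }
}
```

    The returned string could then be used as the value of the input-uri parameter instead of a hard-coded TXT_PATH.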

    Example...

    Text File

    Heros
    Firstname Sean
    Lastname Connery
    DOB 25-08-1930
    
    Films
    Dr.No 1962
    Goldfinger 1964
    Thunerball 1965
    
    Award
    name Academy
    time 1
    
    Award
    name BAFTA
    time 2
    
    Award
    name Gloden Globes
    time 3
    

    Java (you'll need to change paths/filenames)

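    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    final String TXT_PATH = "file:///C:/tmp/input.txt";
    final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
    final String XML_PATH = "C:/tmp/test_xml_result.xml";
    
    TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
    Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
    transformer.setParameter("input-uri", TXT_PATH);
    // The stylesheet is passed in as the XML input too; its content is ignored.
    transformer.transform(new StreamSource(new File(XSLT_PATH)),new StreamResult(new File(XML_PATH)));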
    

    XSLT 2.0

    Same as above.

    Output

    <Jamesfilms>
       <Heros>
          <Firstname>Sean</Firstname>
          <Lastname>Connery</Lastname>
          <DOB>25-08-1930</DOB>
       </Heros>
       <Films>
          <Dr.No>1962</Dr.No>
          <Goldfinger>1964</Goldfinger>
          <Thunerball>1965</Thunerball>
       </Films>
       <Award>
          <name>Academy</name>
          <time>1</time>
       </Award>
       <Award>
          <name>BAFTA</name>
          <time>2</time>
       </Award>
       <Award>
          <name>Gloden Globes</name>
          <time>3</time>
       </Award>
    </Jamesfilms>
    

    However, since you're using Saxon you could use the s9api and specify an initial template. This is the way I would do it instead of passing the stylesheet as the input to the transform.

    Example...

    Java

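    import java.io.File;
    import javax.xml.transform.stream.StreamSource;
    import net.sf.saxon.s9api.Processor;
    import net.sf.saxon.s9api.QName;
    import net.sf.saxon.s9api.Serializer;
    import net.sf.saxon.s9api.XdmAtomicValue;
    import net.sf.saxon.s9api.XsltCompiler;
    import net.sf.saxon.s9api.XsltExecutable;
    import net.sf.saxon.s9api.XsltTransformer;

    final String TXT_PATH = "file:///C:/tmp/input.txt";
    final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
    final String XML_PATH = "C:/tmp/test_xml_result.xml";
    
    // false = the licensed (schema-aware) features are not needed.
    Processor processor = new Processor(false);
    Serializer serializer = processor.newSerializer();
    serializer.setOutputFile(new File(XML_PATH));
    XsltCompiler compiler = processor.newXsltCompiler();
    XsltExecutable executable = compiler.compile(new StreamSource(new File(XSLT_PATH)));
    XsltTransformer transformer = executable.load();
    // Start at the template named "root" instead of supplying a source document.
    transformer.setInitialTemplate(new QName("root"));
    transformer.setParameter(new QName("input-uri"), new XdmAtomicValue(TXT_PATH));
    transformer.setDestination(serializer);
    transformer.transform();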
    

    XSLT 2.0

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
      <xsl:param name="input-uri" as="xs:string"/>
    
      <xsl:variable name="initData" as="node()">
        <Jamesfilms>
          <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
            <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
            <xsl:choose>
              <xsl:when test="$tokens[1] castable as xs:QName">
                <xsl:element name="{replace($tokens[1],'\s','')}">
                  <xsl:for-each select="$tokens[position() > 1]">
                    <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                    <xsl:choose>
                      <xsl:when test="$tokens2[1] castable as xs:QName">
                        <xsl:element name="{$tokens2[1]}">
                          <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                        </xsl:element>                      
                      </xsl:when>
                      <xsl:otherwise>
                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                      </xsl:otherwise>
                    </xsl:choose>
                  </xsl:for-each>
                </xsl:element>            
              </xsl:when>
              <xsl:otherwise>
                <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </Jamesfilms>
      </xsl:variable>
    
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="/" name="root">
        <xsl:apply-templates select="$initData"/>    
      </xsl:template>
    
      <!--Add additional templates to do further transforming of the initial data ($initData).-->
    
    </xsl:stylesheet>
    

    Input and output would be the same. Let me know if you need me to add anything else to the Java example.
