I have a text file with the following data:
Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930
Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965
Awa
You shouldn't need to group; you could just tokenize (and tokenize and tokenize...).
Here's an example. It doesn't do anything with the case of the element names. You can either handle those changes while building $initData, or you can add additional templates to handle them.
Also, the element names have to be valid QNames. Right now the stylesheet terminates processing with a message, but you can change how that's handled.
This should at least get you started...
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="input-uri" as="xs:string" select="'so.txt'"/>
    <xsl:variable name="initData" as="node()">
        <Jamesfilms>
            <!--One element per blank-line-separated block; empty blocks (e.g. extra newlines at the end of the file) are skipped.-->
            <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')[normalize-space()]">
                <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
                <xsl:choose>
                    <xsl:when test="$tokens[1] castable as xs:QName">
                        <xsl:element name="{$tokens[1]}">
                            <!--Each following non-blank line is "name value...".-->
                            <xsl:for-each select="$tokens[position() > 1][normalize-space()]">
                                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                                <xsl:choose>
                                    <xsl:when test="$tokens2[1] castable as xs:QName">
                                        <xsl:element name="{$tokens2[1]}">
                                            <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                                        </xsl:element>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </xsl:for-each>
                        </xsl:element>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each>
        </Jamesfilms>
    </xsl:variable>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/">
        <xsl:apply-templates select="$initData"/>
    </xsl:template>
    <!--Add additional templates to do further transforming of the initial data ($initData).-->
</xsl:stylesheet>
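If it helps to see what the two levels of tokenize() are doing outside of XSLT, here's a rough plain-Java sketch of the same splitting. It isn't a replacement for the stylesheet, just an illustration; TokenizeSketch, Group and SIMPLE_NAME are hypothetical names. The name check is a simplified ASCII-only regex standing in for "castable as xs:QName", so it is stricter than the real QName rules (no prefixes, no non-ASCII letters).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizeSketch {
    // Same separators as the stylesheet: '\r?\n\r?\n' between blocks, '\r?\n' between lines.
    private static final Pattern BLOCK_SEP = Pattern.compile("\r?\n\r?\n");
    private static final Pattern LINE_SEP = Pattern.compile("\r?\n");
    // Hypothetical, simplified stand-in for "castable as xs:QName" (ASCII only, no prefixes).
    private static final Pattern SIMPLE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

    /** One block: its first line (the wrapper element name) plus its name/value lines. */
    public static final class Group {
        public final String name;
        public final List<String[]> fields = new ArrayList<>();

        Group(String name) {
            this.name = name;
        }
    }

    public static List<Group> parse(String text) {
        List<Group> groups = new ArrayList<>();
        for (String block : BLOCK_SEP.split(text)) {
            if (block.trim().isEmpty()) {
                continue; // ignore empty blocks, e.g. extra blank lines at the end of the file
            }
            String[] lines = LINE_SEP.split(block);
            Group group = new Group(checkName(lines[0]));
            for (int i = 1; i < lines.length; i++) {
                if (lines[i].trim().isEmpty()) {
                    continue; // ignore blank lines inside a block
                }
                // Like tokenize(., '\s'): the first token is the element name,
                // the remaining tokens are re-joined with single spaces.
                String[] tokens = lines[i].split("\\s");
                String value = String.join(" ", Arrays.copyOfRange(tokens, 1, tokens.length));
                group.fields.add(new String[] {checkName(tokens[0]), value});
            }
            groups.add(group);
        }
        return groups;
    }

    // Mirrors <xsl:message terminate="yes">: stop at the first invalid name.
    private static String checkName(String name) {
        if (!SIMPLE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid element name: " + name);
        }
        return name;
    }
}
```

The point of the sketch is that no grouping step is needed: the blank lines already delimit the groups, so splitting on them and then on line breaks gives you the nesting directly.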
EDIT
You're passing the text file in as the input of the transform. That's why you had to add the <t> element.
Since you don't actually have an XML input, you can pass the stylesheet itself in as the input. Nothing in it will get processed, because the template that matches the root (/) only applies templates to the $initData variable.
You also need to set the input-uri parameter with transformer.setParameter("input-uri", TXT_PATH). Because unparsed-text() takes a URI, if your path is absolute be sure to add the file:/// prefix.
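Rather than typing the file:/// prefix by hand, one option is to let java.nio build the URI from a normal path. This is a minimal sketch; the InputUri class and toFileUri method are hypothetical names, and it only relies on the standard Path.toUri() method.

```java
import java.nio.file.Paths;

public class InputUri {
    /** Turns a local file path into an absolute file: URI for the input-uri parameter. */
    public static String toFileUri(String path) {
        // Path.toUri() resolves relative paths against the working directory,
        // adds the file scheme and percent-encodes characters such as spaces.
        return Paths.get(path).toUri().toString();
    }

    public static void main(String[] args) {
        // e.g. java InputUri C:/tmp/input.txt
        System.out.println(toFileUri(args.length > 0 ? args[0] : "input.txt"));
    }
}
```

With that, the parameter could be set with transformer.setParameter("input-uri", InputUri.toFileUri("C:/tmp/input.txt")), which should give you a file:/// URI like the TXT_PATH value below without having to worry about spaces in the path.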
Example...
Text File
Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Award
name Academy
time 1

Award
name BAFTA
time 2

Award
name Gloden Globes
time 3
Java (you'll need to change paths/filenames)
final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";
// Use Saxon's JAXP factory so the XSLT 2.0 stylesheet is supported.
TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
// unparsed-text() in the stylesheet reads the text file from this URI.
transformer.setParameter("input-uri", TXT_PATH);
// The stylesheet itself doubles as the (otherwise ignored) source document.
transformer.transform(new StreamSource(new File(XSLT_PATH)), new StreamResult(new File(XML_PATH)));
XSLT 2.0
Same as above.
Output
<Jamesfilms>
    <Heros>
        <Firstname>Sean</Firstname>
        <Lastname>Connery</Lastname>
        <DOB>25-08-1930</DOB>
    </Heros>
    <Films>
        <Dr.No>1962</Dr.No>
        <Goldfinger>1964</Goldfinger>
        <Thunerball>1965</Thunerball>
    </Films>
    <Award>
        <name>Academy</name>
        <time>1</time>
    </Award>
    <Award>
        <name>BAFTA</name>
        <time>2</time>
    </Award>
    <Award>
        <name>Gloden Globes</name>
        <time>3</time>
    </Award>
</Jamesfilms>
However, since you're using Saxon, you could use the s9api instead and specify an initial template (the template named "root" below). This is how I would do it, rather than passing the stylesheet in as the input to the transform.
Example...
Java
final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";
// false = no licensed (PE/EE) features are required.
Processor processor = new Processor(false);
// Write the result document to XML_PATH.
Serializer serializer = processor.newSerializer();
serializer.setOutputFile(new File(XML_PATH));
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource(new File(XSLT_PATH)));
XsltTransformer transformer = executable.load();
// Start at the named template "root", so no source document is needed.
transformer.setInitialTemplate(new QName("root"));
transformer.setParameter(new QName("input-uri"), new XdmAtomicValue(TXT_PATH));
transformer.setDestination(serializer);
transformer.transform();
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
    <!--No default: input-uri must be supplied by the calling Java code.-->
    <xsl:param name="input-uri" as="xs:string" required="yes"/>
    <xsl:variable name="initData" as="node()">
        <Jamesfilms>
            <!--One element per blank-line-separated block; empty blocks (e.g. extra newlines at the end of the file) are skipped.-->
            <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')[normalize-space()]">
                <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
                <xsl:choose>
                    <xsl:when test="$tokens[1] castable as xs:QName">
                        <!--Strip stray whitespace (such as a trailing space) from the block name.-->
                        <xsl:element name="{replace($tokens[1],'\s','')}">
                            <xsl:for-each select="$tokens[position() > 1][normalize-space()]">
                                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                                <xsl:choose>
                                    <xsl:when test="$tokens2[1] castable as xs:QName">
                                        <xsl:element name="{$tokens2[1]}">
                                            <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                                        </xsl:element>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </xsl:for-each>
                        </xsl:element>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each>
        </Jamesfilms>
    </xsl:variable>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/" name="root">
        <xsl:apply-templates select="$initData"/>
    </xsl:template>
    <!--Add additional templates to do further transforming of the initial data ($initData).-->
</xsl:stylesheet>
The input and output are the same as in the first example. Let me know if you need me to add the Java imports to the example.