I have the following sample SGML data from my .sgm file, and I want to convert it into XML:
xyz
Maybe you can use osx, the SGML-to-XML converter that is part of the OpenSP package (based on SP, originally written by James Clark).
Others have already given some good advice. Here's one way to put it all together: first convert the input SGML to well-formed XML, then use XSLT to transform that into the exact format you need.
Converting your SGML to well-formed XML
The osx tool from the OpenSP package suggested by mzjn is a good tool for this. Since your SGML markup omits end tags, you need a DTD from which the correct nesting of elements can be determined. If you don't have a DTD, you need to create one. For your example input, it could be as simple as this:
<!ELEMENT toplevel o o (viewed)+>
<!ELEMENT viewed - o (#PCDATA,cite)>
<!ELEMENT cite - o (yr,pno)>
<!ELEMENT yr - o (#PCDATA)>
<!ELEMENT pno - o (#PCDATA)>
<!ATTLIST pno cite CDATA #REQUIRED>
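To illustrate why the DTD is needed, here is a deliberately simplified Python sketch of inferring omitted end tags (it is not how OpenSP works internally). Each element lists the children its content model allows, read off the DTD above while ignoring order and occurrence indicators, and a start tag that the current element cannot contain implicitly closes it. The names `ALLOWED_CHILDREN` and `parse_with_omitted_end_tags` are hypothetical, the regex tokenizer only handles plain start tags, end tags and text, and the sample input used with it is made up because the original sample was not preserved.

```python
import re
import xml.etree.ElementTree as ET

# Child elements each element's content model allows, read off the DTD above.
# Order and occurrence indicators ("," and "+") are ignored in this sketch.
ALLOWED_CHILDREN = {
    "toplevel": {"viewed"},
    "viewed": {"cite"},
    "cite": {"yr", "pno"},
    "yr": set(),
    "pno": set(),
}

# A start tag with optional double-quoted attributes, an end tag, or a run of text.
TOKEN = re.compile(r'<(/?)(\w+)((?:\s+\w+\s*=\s*"[^"]*")*)\s*>|([^<]+)')
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_with_omitted_end_tags(sgml: str) -> ET.Element:
    """Build a tree from SGML whose end tags may be omitted (simplified)."""
    root = ET.Element("toplevel")
    stack = [root]  # open elements; the implied toplevel is never popped
    for m in TOKEN.finditer(sgml):
        is_end, name, attrs, text = m.groups()
        current = stack[-1]
        if text is not None:
            # Text belongs to the innermost open element.
            if len(current):
                current[-1].tail = (current[-1].tail or "") + text
            else:
                current.text = (current.text or "") + text
            continue
        name = name.lower()
        if is_end:
            # Explicit end tag: close open elements up to and including it.
            while len(stack) > 1:
                if stack.pop().tag == name:
                    break
            continue
        # Start tag: implicitly end every open element that cannot contain it.
        while len(stack) > 1 and name not in ALLOWED_CHILDREN.get(stack[-1].tag, set()):
            stack.pop()
        attrib = {k.lower(): v for k, v in ATTR.findall(attrs)}
        stack.append(ET.SubElement(stack[-1], name, attrib))
    return root
```

With an input such as `<viewed>Smith v. Jones<cite><yr>1999<pno cite="c1">12<viewed>...`, the second `<viewed>` start tag closes `pno`, `cite` and the first `viewed`. That nesting decision is exactly what the DTD gives osx and what it cannot guess without one.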
You also need to add a proper doctype declaration to the beginning of your SGML file. Assuming your DTD is in the file viewed.dtd, that would be:
<!DOCTYPE toplevel SYSTEM "viewed.dtd" >
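If you have many .sgm files, a small Python sketch like the following could prepend that declaration for you. It assumes each file starts directly with its content (no SGML declaration) and is UTF-8 encoded; files that already begin with a DOCTYPE are left alone. `add_doctype` and `add_doctype_to_file` are hypothetical helper names.

```python
from pathlib import Path

# The declaration from above; adjust the DTD path to where yours lives.
DOCTYPE = '<!DOCTYPE toplevel SYSTEM "viewed.dtd" >\n'


def add_doctype(sgml_text: str, doctype: str = DOCTYPE) -> str:
    """Prepend `doctype` unless the text already starts with a DOCTYPE."""
    if sgml_text.lstrip().upper().startswith("<!DOCTYPE"):
        return sgml_text
    return doctype + sgml_text


def add_doctype_to_file(path: str, encoding: str = "utf-8") -> None:
    """Rewrite the file at `path` in place with the declaration added."""
    p = Path(path)
    text = p.read_text(encoding=encoding)  # the encoding is an assumption
    p.write_text(add_doctype(text), encoding=encoding)
```

Because `add_doctype` checks for an existing declaration first, running it twice over the same files is harmless.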
With this addition, you should now be able to use osx to convert the SGML to XML. (It won't be able to convert the processing instructions that start with a /, as those are not allowed in XML; it will emit a warning about them instead.)
osx input.sgm > input.xml
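If you want to script this step, here is a hedged Python sketch that runs osx through `subprocess` and checks that the result really is well-formed XML with the standard-library parser. It assumes `osx` is on your PATH; the function names are hypothetical, and the sketch returns osx's stderr (where its messages go) rather than making assumptions about its exit status.

```python
from __future__ import annotations

import subprocess
import xml.etree.ElementTree as ET


def sgml_to_xml(sgm_path: str) -> tuple[str, str]:
    """Run `osx sgm_path`; return (XML from stdout, messages from stderr)."""
    result = subprocess.run(["osx", sgm_path], capture_output=True, text=True)
    # osx reports warnings (e.g. about the unconvertible processing
    # instructions) on stderr; the caller decides what to do with them.
    return result.stdout, result.stderr


def is_well_formed(xml_text: str) -> bool:
    """True if the standard-library parser accepts xml_text as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A typical use would be to call `sgml_to_xml("input.sgm")`, print the messages, and pass the output on to the XSLT step only if `is_well_formed` returns True.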
Transforming the resulting XML to your desired format
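For the above case, you could use something like the following XSLT stylesheet. (Under the default SGML declaration, osx folds element and attribute names to upper case, which is why the stylesheet matches VIEWED, CITE and PNO/@CITE.)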
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Each VIEWED element becomes one index1 entry. -->
  <xsl:template match="VIEWED">
    <index1>
      <!-- The first text node inside VIEWED (the text before CITE) is the heading. -->
      <num viewed="{normalize-space(text())}"/>
      <heading>
        <xsl:value-of select="normalize-space(text())"/>
      </heading>
      <index-refs>
        <xsl:apply-templates select="CITE"/>
      </index-refs>
    </index1>
  </xsl:template>
  <!-- Each CITE becomes a link carrying the CITE attribute of its PNO child. -->
  <xsl:template match="CITE">
    <link caseno="{PNO/@CITE}"/>
  </xsl:template>
</xsl:stylesheet>
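If you would rather not use an XSLT processor, the same mapping can be sketched in Python with `xml.etree.ElementTree`. This approximates the stylesheet above rather than replacing it exactly: `viewed.text` stands in for XSLT's first `text()` node (they agree when the heading text precedes the first child element), `normalize_space` approximates XPath's `normalize-space()`, and, unlike the stylesheet, the sketch wraps all `index1` entries in a hypothetical `index` root element so the output is a single well-formed document.

```python
import xml.etree.ElementTree as ET


def normalize_space(s):
    """Approximates XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((s or "").split())


def transform(xml_text: str) -> ET.Element:
    """Map osx output (upper-case names) to index1 entries, like the XSLT."""
    source = ET.fromstring(xml_text)
    # Hypothetical wrapper so that several index1 entries share one root.
    out = ET.Element("index")
    for viewed in source.iter("VIEWED"):
        # Text before VIEWED's first child element, i.e. the heading.
        heading = normalize_space(viewed.text)
        index1 = ET.SubElement(out, "index1")
        ET.SubElement(index1, "num", viewed=heading)
        ET.SubElement(index1, "heading").text = heading
        refs = ET.SubElement(index1, "index-refs")
        for cite in viewed.findall("CITE"):
            pno = cite.find("PNO")
            # Like {PNO/@CITE}: empty string when PNO or its CITE is missing.
            caseno = pno.get("CITE", "") if pno is not None else ""
            ET.SubElement(refs, "link", caseno=caseno)
    return out
```

For example, `ET.tostring(transform(xml_text), encoding="unicode")` serializes the result, where `xml_text` is the contents of input.xml produced by osx.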
Why XSLT? I doubt you can map SGML to the XML Infoset or XDM.
I think you would be better off using the language made for this task: DSSSL (Document Style Semantics and Specification Language).
It is a predecessor of XSLT. James Clark wrote Jade, a DSSSL engine; see his site for details.
Can SgmlReader, originally developed by Chris Lovett, help solve this problem?