Question
I have the following sample SGML data from my .sgm file, and I want to convert it into XML:
<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>
<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>
The output should look like this:
<index1>
  <num viewed="xyz"/>
  <heading>xyz</heading>
  <index-refs>
    <link caseno="2010 abc 1188"/>
  </index-refs>
</index1>
<index1>
  <num viewed="abc"/>
  <heading>abc</heading>
  <index-refs>
    <link caseno="2010 xyz 5133"/>
  </index-refs>
</index1>
Can this be done in C#, or can XSLT 2.0 be used for this kind of conversion?
Answer 1:
Others have already given some good advice. Here's one way of putting it all together by first converting the input SGML to well-formed XML and then using XSLT to transform that to the exact format you need.
Converting your SGML to well-formed XML
The osx tool from the OpenSP package, suggested by mzjn, is a good tool for this. Since your SGML markup omits end tags, you need a DTD from which the correct nesting of elements can be determined. If you don't have a DTD, you need to create one. For your example input, it could be as simple as this:
<!ELEMENT toplevel o o (viewed)+>
<!ELEMENT viewed - o (#PCDATA,cite)>
<!ELEMENT cite - o (yr,pno)>
<!ELEMENT yr - o (#PCDATA)>
<!ELEMENT pno - o (#PCDATA)>
<!ATTLIST pno cite CDATA #REQUIRED>
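The following rough Python sketch makes the role of the DTD concrete. It shows what an SGML parser does with these declarations for this particular input. When a start tag is not allowed inside the currently open element, the parser closes open elements (whose end tags are omissible, "- o") until one can contain it. The sketch makes several simplifying assumptions. The content models are hard-coded, the order inside (yr,pno) is not checked, processing instructions are dropped, and names are lower-cased. It is an illustration, not a replacement for osx or any real SGML parser. The names close_omitted_tags and ALLOWED are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Which children each element may contain, hard-coded from the example DTD.
# Simplification: the order in a sequence like (yr,pno) is not checked.
ALLOWED = {
    "toplevel": {"viewed"},
    "viewed": {"cite"},
    "cite": {"yr", "pno"},
    "yr": set(),
    "pno": set(),
}

# A token is a PI/declaration, a start or end tag, or a run of text.
TOKEN = re.compile(r"<[?!][^>]*>|<(/?)(\w+)([^>]*)>|[^<]+")
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def close_omitted_tags(sgml: str) -> str:
    """Return well-formed XML by inferring the end tags the DTD lets SGML omit."""
    out = ["<toplevel>"]  # toplevel's start tag is omitted ("o o") in the source
    stack = ["toplevel"]
    for m in TOKEN.finditer(sgml):
        tok = m.group(0)
        if tok.startswith(("<?", "<!")):
            continue  # drop processing instructions and declarations
        if m.group(2) is None:  # character data
            text = tok.strip()
            if text:
                out.append(escape(text))
            continue
        name = m.group(2).lower()
        if m.group(1):  # explicit end tag: close up to the matching element
            if name in stack:
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
            continue
        if name not in ALLOWED:
            raise ValueError(f"element {name!r} is not in the DTD")
        # Close every open element whose content model cannot hold this one.
        while stack and name not in ALLOWED[stack[-1]]:
            out.append(f"</{stack.pop()}>")
        if not stack:
            raise ValueError(f"no open element may contain {name!r}")
        attrs = "".join(
            f" {k.lower()}={quoteattr(v)}" for k, v in ATTR.findall(m.group(3))
        )
        out.append(f"<{name}{attrs}>")
        stack.append(name)
    while stack:  # the end of the document closes everything still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

Run against the sample from the question, this yields a single toplevel element containing two viewed records. Each record has its cite, yr and pno elements closed where the DTD implies.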
You also need to add a proper doctype declaration to the beginning of your SGML file. Assuming your DTD is in the file viewed.dtd, it would be:
<!DOCTYPE toplevel SYSTEM "viewed.dtd" >
With this addition, you should now be able to use osx to convert the SGML to XML. (It won't be able to convert the processing instructions that start with a /, since those are not allowed in XML, and it will emit a warning about them.)
osx input.sgm > input.xml
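The two manual steps above can also be scripted if you have many files. One option is to prepend the DOCTYPE and strip the <?XMLDOC> / <?/XMLDOC> processing instructions before calling osx. This is an assumption on my part, not something osx requires. Dropping the PIs avoids the warnings, and the viewed elements already delimit each record. A minimal Python sketch, assuming the DTD file name viewed.dtd from above; prepare_for_osx is a hypothetical name:

```python
import re

# The DOCTYPE shown above; "viewed.dtd" is the DTD file assumed in this answer.
DOCTYPE = '<!DOCTYPE toplevel SYSTEM "viewed.dtd" >\n'

# Matches <?XMLDOC> and <?/XMLDOC>; the second form cannot be expressed in XML.
XMLDOC_PI = re.compile(r"<\?/?XMLDOC>\n?")


def prepare_for_osx(sgml: str) -> str:
    """Strip the XMLDOC processing instructions and prepend the DOCTYPE."""
    body = XMLDOC_PI.sub("", sgml)
    # Don't add a second DOCTYPE if the text already declares one.
    if "<!DOCTYPE" not in body:
        body = DOCTYPE + body
    return body
```

Write the returned text to a new file (prepared.sgm, say, a hypothetical name) and run osx on that file instead of input.sgm.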
Transforming the resulting XML to your desired format
For the above case, you could use something like the following XSLT stylesheet:
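<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- osx folds SGML element and attribute names to upper case,
       so the patterns match VIEWED, CITE, PNO and @CITE -->
  <xsl:template match="VIEWED">
    <index1>
      <!-- the record's leading text (e.g. "xyz"), whitespace-normalized -->
      <num viewed="{normalize-space(text())}"/>
      <heading>
        <xsl:value-of select="normalize-space(text())"/>
      </heading>
      <index-refs>
        <xsl:apply-templates select="CITE"/>
      </index-refs>
    </index1>
  </xsl:template>

  <!-- each citation becomes an empty link carrying the pno cite value -->
  <xsl:template match="CITE">
    <link caseno="{PNO/@CITE}"/>
  </xsl:template>
</xsl:stylesheet>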
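The question also asks about using a general-purpose language for this step. As a rough illustration, here is the same VIEWED-to-index1 mapping as the stylesheet above, written with Python's standard xml.etree.ElementTree module. It uses Python rather than C#, and it is not taken from any of the answers. It assumes the same upper-case names that the stylesheet expects from osx. Like the stylesheet, it emits a sequence of index1 elements with no single root. to_index is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def to_index(xml_text: str) -> str:
    """Python analogue of the XSLT: one index1 element per VIEWED record."""
    source = ET.fromstring(xml_text)
    entries = []
    for viewed in source.iter("VIEWED"):
        # Like normalize-space(text()): the leading text, whitespace collapsed.
        heading = " ".join((viewed.text or "").split())
        entry = ET.Element("index1")
        ET.SubElement(entry, "num", viewed=heading)
        ET.SubElement(entry, "heading").text = heading
        refs = ET.SubElement(entry, "index-refs")
        for cite in viewed.findall("CITE"):
            pno = cite.find("PNO")
            # Like PNO/@CITE: an empty string when the attribute is missing.
            caseno = pno.get("CITE", "") if pno is not None else ""
            ET.SubElement(refs, "link", caseno=caseno)
        entries.append(ET.tostring(entry, encoding="unicode"))
    return "\n".join(entries)
```

ElementTree keeps whatever text precedes the first child, so a record written as abc. keeps its period. That matches what the stylesheet's normalize-space(text()) would produce.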
Answer 2:
Maybe you can use the osx SGML to XML converter. It is part of the OpenSP package (based on SP, originally written by James Clark).
- http://openjade.sourceforge.net/doc/index.htm
- http://www.jclark.com/sp/index.htm
Answer 3:
Could the SGML-Reader (a .NET library), originally developed by Chris Lovett, help solve this problem?
Answer 4:
Why XSLT? I doubt you can map SGML onto the XML Infoset or XDM...
I think you would be better off using the language made for this task: DSSSL (Document Style Semantics and Specification Language).
It is the predecessor of XSLT. James Clark, who wrote the Jade DSSSL engine, covers it on his site.
Answer 5:
Please take a look at some suggestions for SGML -> XML conversion I posted on this question:
Strategy for parsing LOTS and LOTS of not-so-well formed SGML / XML documents
Source: https://stackoverflow.com/questions/4452537/sgml-to-xml-conversion