Question
I am struggling with this question: can I use XSLT to generate an XSD from an XML input file? I know there are many tools that can do this automatically, but I need to do it by writing code.
Could you help me get started and point me to useful resources?
This is a sample XML file; I need to generate an XSD for it using XSLT and validate it:
<test>
  <a>
    <b> </b>
  </a>
  <d> </d>
</test>
Answer 1:
As Marcus observed in the comments, there are many degrees of freedom, but here's a start for you:
Your input XML:
<test>
  <a>
    <b> </b>
  </a>
  <d> </d>
</test>
Given to this XSLT:
<xsl:stylesheet version="2.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Document root: wrap everything in xs:schema. -->
  <xsl:template match="/">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xsl:apply-templates/>
    </xs:schema>
  </xsl:template>

  <!-- Element with child elements: a sequence of children, then attributes. -->
  <xsl:template match="*[*]">
    <xs:element name="{local-name()}">
      <xs:complexType>
        <xs:sequence>
          <xsl:apply-templates select="*"/>
        </xs:sequence>
        <xsl:apply-templates select="@*"/>
      </xs:complexType>
    </xs:element>
  </xsl:template>

  <!-- Leaf element with text: string content extended with any attributes.
       An element with neither children nor text matches no template here
       and therefore gets no declaration. -->
  <xsl:template match="*[not(*) and text()]">
    <xs:element name="{local-name()}">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xsl:apply-templates select="@*"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  </xsl:template>

  <!-- Attribute: declared by name only, without a type. -->
  <xsl:template match="@*">
    <xs:attribute name="{local-name()}"/>
  </xsl:template>
</xsl:stylesheet>
Yields this XSD:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="b">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string"/>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="d">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string"/>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Bonus
Your input XML with attributes added:
<test w="w1">
  <a x="x1" y="y1">
    <b z="z1"> </b>
  </a>
  <d> </d>
</test>
Yields this XSD:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="b">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="z"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="x"/>
            <xs:attribute name="y"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="d">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string"/>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="w"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
Left as exercises:
- Generate XSD styles other than Russian Doll.
- Handle namespaces.
- Tighten typing where possible beyond string primitives.
- ...
Answer 2:
There are many schemas that describe any given input document, so finding an algorithm that generates what you consider to be a "good" schema is tricky; it's largely a trial-and-error process, and it can be quite fun, as I know from having done it myself.
Doing it in XSLT is probably no more difficult than doing it in any other language, though my own DTDGenerator was written in Java because it had to work in streaming mode and at the time (15 years ago, probably!) that wasn't possible in XSLT.
The significant algorithmic challenge is: given N instances of the sequence of children of an element E, find a grammar that they all conform to, without making the grammar so trivial as to be meaningless. For example if you have the three instances
<E><a/><b/><c/><d/></E>
<E><a/><a/><b/><c/></E>
<E><a/><a/><a/><b/><c/><d/><d/></E>
you would ideally like to come up with the content model a+ b c d*. In my DTD generator the algorithm I used was: first eliminate repetitions, so these reduce to abcd, abc, and abcd. Then if one of these is a substring of another, drop it, so you only have "abcd" left. That forms the base sequence, and you can then add in the min and max cardinalities. It's more tricky if you have an instance abcd and another abdc; in such cases I decided that it probably indicated order was unimportant, so I generated (a|b|c|d)*. But you could try to do better!
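The reduction described above can be sketched in a few lines. This is a minimal Python sketch written for illustration, not the DTDGenerator source: the function names (`collapse_repeats`, `is_contiguous_part`, `infer_content_model`) and the DTD-style output string are assumptions. It takes each instance as a sequence of child element names, and it assumes no name recurs non-adjacently in the base sequence (a case such as a b a is not handled).

```python
from collections import Counter
from itertools import groupby


def collapse_repeats(names):
    """Drop consecutive duplicates: a a b c -> a b c."""
    return tuple(name for name, _ in groupby(names))


def is_contiguous_part(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def infer_content_model(instances):
    """Return a DTD-style content model for the given child-name sequences."""
    collapsed = {collapse_repeats(seq) for seq in instances}
    # Keep only the collapsed sequences that are not contained in another one.
    bases = [s for s in collapsed
             if not any(s != t and is_contiguous_part(s, t) for t in collapsed)]
    if len(bases) != 1:
        # Conflicting orders (e.g. abcd vs abdc): treat order as unimportant.
        names = sorted({name for seq in instances for name in seq})
        return "(" + "|".join(names) + ")*"
    # One base sequence left: derive min/max occurrences of each name.
    counts = [Counter(seq) for seq in instances]
    suffixes = {(True, True): "", (False, True): "?",
                (True, False): "+", (False, False): "*"}
    parts = []
    for name in bases[0]:
        lo = min(c[name] for c in counts)
        hi = max(c[name] for c in counts)
        parts.append(name + suffixes[(lo >= 1, hi <= 1)])
    return " ".join(parts)


print(infer_content_model([list("abcd"), list("aabc"), list("aaabcdd")]))
# a+ b c d*
print(infer_content_model([list("abcd"), list("abdc")]))
# (a|b|c|d)*
```

The same steps could be expressed in XSLT 2.0 with grouping over the children of each element; Python is used here only to keep the sketch short.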
You also want to avoid overconstraining: just because all instances of an element are numbers in the range 83 to 78392 doesn't mean that your schema should impose these as minimum and maximum values. On the other hand, if they are all greater than zero, then that probably tells you something.
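One way to act on this is to infer only a built-in type from the sample values and never a numeric range. The following Python sketch is an assumption about how that might look, not something taken from an existing generator: the function name `infer_simple_type` and the choice of candidate types (`xs:positiveInteger`, `xs:integer`, `xs:decimal`, `xs:string`) are illustrative.

```python
import re

# Lexical forms checked against ASCII digits only.
_INTEGER = re.compile(r"[+-]?\d+", re.ASCII)
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)", re.ASCII)


def infer_simple_type(values):
    """Pick a built-in XSD type name that fits every sample value."""
    texts = [v.strip() for v in values]
    if not texts or not all(texts):
        # No samples, or at least one empty value: stay with the loosest type.
        return "xs:string"
    if all(_INTEGER.fullmatch(t) for t in texts):
        # All greater than zero is treated as meaningful; the observed
        # minimum and maximum are deliberately not turned into facets.
        if all(int(t) > 0 for t in texts):
            return "xs:positiveInteger"
        return "xs:integer"
    if all(_DECIMAL.fullmatch(t) for t in texts):
        return "xs:decimal"
    return "xs:string"


print(infer_simple_type(["83", "78392"]))   # xs:positiveInteger
print(infer_simple_type(["-5", "3"]))       # xs:integer
print(infer_simple_type(["1.5", "2"]))      # xs:decimal
print(infer_simple_type(["abc", "2"]))      # xs:string
```

Whether a small sample really justifies `xs:positiveInteger` over `xs:integer` is a judgment call; a more cautious generator might require a minimum number of samples before tightening the type.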
Source: https://stackoverflow.com/questions/26567618/how-to-generate-xsd-using-xslt-manually