Question
I'm parsing an XML file in Python, and I have an XSD schema to validate it against. Can I get the type of a particular node of my XML as it was defined in the XSD?
For example, here is a small part of my XML:
<deviceDescription>
  <wakeupNote>
    <lang xml:lang="ru">Русский</lang>
    <lang xml:lang="en">English</lang>
  </wakeupNote>
</deviceDescription>
And here is a small part of my XSD:
<xsd:element name="deviceDescription" type="zwv:deviceDescription" minOccurs="0"/>
<xsd:complexType name="deviceDescription">
  <xsd:sequence>
    <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0">
      <xsd:unique name="langDescrUnique">
        <xsd:selector xpath="zwv:lang"/>
        <xsd:field xpath="@xml:lang"/>
      </xsd:unique>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="description">
  <xsd:sequence>
    <xsd:element name="lang" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute ref="xml:lang" use="required"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
While parsing, I want to know that my wakeupNote element is defined in the XSD as the complexType zwv:description. How can I do this in Python?
What do I need this for? Suppose I have a lot of these XML files and I want to check that all of them have their English-language fields filled in. It would be easy to check whether <lang xml:lang="en"></lang> is empty, but the schema also allows this tag to be omitted entirely.
So the idea is to find all tags that may contain language descriptions and check that each one has a <lang> tag for en that is present and has non-empty content.
Update:
Since my XML is checked against the XSD during validation, the validation engine must know the types of all nodes. I asked a similar question seven months ago that is still unanswered; in my opinion the two are related: Validating and filling default values in XML based on XSD in Python
Answer 1:
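If the question is "How do I find the name of the type for a given XML node?", the answer is to use XPath in Python to look it up in the XSD. The XPath expression to run on the XSD is
//xsd:element[@name='wakeupNote']/@type
(with the xsd prefix bound to the XML Schema namespace, http://www.w3.org/2001/XMLSchema). This should return zwv:description. If it returns two different types, the same element name is declared in more than one place, and you'll have to walk down from the root to tell them apart: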
/root/foo/wakeupNote (type A)
/root/bar/wakeupNote (type B)
Walking down from the root like this is tedious, and you'll have to handle both anonymous and named types.
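A minimal sketch of the name lookup, using Python's standard-library ElementTree instead of a full XPath engine. ElementTree's findall() supports only a subset of XPath, but that subset includes [@name='...'] predicates, and namespaces are written in Clark notation ({uri}local) rather than with a prefix. The schema string is a trimmed copy of the question's XSD; the zwv namespace URI and the second wakeupNote declaration under otherElement are hypothetical, added only to show the two-types case. The helper name declared_types is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace.
XSD = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the question's schema. The zwv namespace URI and the
# otherElement declaration are hypothetical additions for illustration.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:zwv="urn:example:zwv" targetNamespace="urn:example:zwv">
  <xsd:complexType name="deviceDescription">
    <xsd:sequence>
      <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="otherElement">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="wakeupNote" type="zwv:otherDescription"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="description">
    <xsd:sequence>
      <xsd:element name="lang" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string"/>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def declared_types(schema_root, element_name):
    """Return the type attribute of every xsd:element declaring element_name.

    This is the ElementTree equivalent of //xsd:element[@name='...']/@type.
    Declarations with an inline (anonymous) type have no type attribute and
    are reported as None.
    """
    path = f".//{XSD}element[@name='{element_name}']"
    return [decl.get("type") for decl in schema_root.findall(path)]


root = ET.fromstring(SCHEMA)
print(declared_types(root, "wakeupNote"))  # ['zwv:description', 'zwv:otherDescription']
print(declared_types(root, "lang"))        # [None]
```

Getting two results back for wakeupNote is exactly the ambiguous case: the name alone does not determine the type, so the path from the root has to be taken into account.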
If the question is "How do I find all XML nodes of a given type?" and the schema changes frequently, you could test the type of every node with the method above as you parse it.
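If the schema is well known and fixed, and the nodes you are looking for can be found with XPath, you could test each node directly. Select the English descriptions with
//lang[@xml:lang='en']
(adding a namespace prefix to lang if it is in the schema's target namespace), then use Python to check the length of each one's text.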
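A minimal sketch of that check with the standard-library ElementTree, run against the question's sample XML plus two hypothetical failing entries wrapped in a hypothetical devices root. It assumes the elements are unqualified, as shown in the question; if they are in a namespace, pass the tag names in ElementTree's {uri}local form. The function name missing_english and its parameters are hypothetical. Instead of relying on ElementTree's limited XPath support for the prefixed xml:lang attribute, it filters on that attribute in Python, and it flags both failure modes from the question: an en entry that is empty and one that is missing altogether.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def missing_english(root, container_tag="wakeupNote", lang_tag="lang"):
    """Return every container element that lacks a non-empty English <lang>.

    A container fails if it has no <lang xml:lang="en"> child at all, or if
    every such child has empty or whitespace-only text.
    """
    offenders = []
    for container in root.iter(container_tag):  # includes root if it matches
        english_texts = [
            (child.text or "").strip()
            for child in container.findall(lang_tag)  # direct children only
            if child.get(XML_LANG) == "en"
        ]
        if not any(english_texts):
            offenders.append(container)
    return offenders


# The question's sample, plus two hypothetical entries that should fail.
SAMPLE = """<devices>
  <deviceDescription>
    <wakeupNote>
      <lang xml:lang="ru">Русский</lang>
      <lang xml:lang="en">English</lang>
    </wakeupNote>
  </deviceDescription>
  <deviceDescription>
    <wakeupNote><lang xml:lang="ru">Русский</lang></wakeupNote>
  </deviceDescription>
  <deviceDescription>
    <wakeupNote><lang xml:lang="en">   </lang></wakeupNote>
  </deviceDescription>
</devices>"""

doc = ET.fromstring(SAMPLE)
print(len(missing_english(doc)))  # 2: the second and third notes fail
```

To cover every element that can carry descriptions, the function can be called once per element name that the XSD declares with type zwv:description.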
In the stable-schema case, you could write a second XSD that enforces the criteria you are looking for.
Answer 2:
You're right that the validator must know the type associations of all the elements and attributes it validates, and that the validator is thus in a position to provide access to that information.
For better or worse, however, both the API between caller and validator and the selection of validation-related information available to the caller are completely implementation-defined. Some validators (Xerces-J is a notable example) make a very full range of validation information available; others don't.
Without knowing what validator you are using, no one can tell you with certainty whether the type information you're seeking is available. Since you're calling the validator, there must be an API; if type associations are available through the API, presumably the documentation will say so. If the API doesn't provide access to it, it may be because the underlying schema validator doesn't provide access to the information, or it may be because the creator of the API didn't see the point; your job (if you want to pursue this further) will be to find out which of those is the case and then try to persuade the relevant parties that it would be useful to make the information available.
If you have no luck with getting access to the information through the API, you can help yourself with a more sophisticated version of the approach mentioned in another answer by David W. It is a property of XSD schemas that the governing type of any element is strictly a function of the path to that element from the validation root, so it is straightforward in principle (if more than a bit tedious in practice) to identify, for any element in a document instance, what its governing type will be if the document instance is validated against a particular schema. For the case you mention, for example, it is straightforward to tell whether a given wakeupNote has deviceDescription or otherElement as an ancestor, or which is the nearer ancestor if the wakeupNote has both, and to infer the appropriate governing type definition based on that knowledge.
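A minimal sketch of this path-based approach, again using the standard-library ElementTree. The function names governing_type, type_definition, child_decls and local_name are hypothetical, as are the zwv namespace URI and the otherElement declaration in the sample schema (deviceDescription is also declared globally here so it can serve as a validation root). It deliberately ignores much of XSD: element ref=, named model groups, complexContent extension and restriction, substitution groups, wildcards, xsi:type in the instance, and imported or included schemas; it also matches type names by their local part only. Treat it as an illustration of the idea, not as a schema processor.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: deviceDescription is global here, and otherElement
# declares a second wakeupNote with a different type.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:zwv="urn:example:zwv" targetNamespace="urn:example:zwv">
  <xsd:element name="deviceDescription" type="zwv:deviceDescription"/>
  <xsd:element name="otherElement">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="wakeupNote" type="zwv:otherDescription"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="deviceDescription">
    <xsd:sequence>
      <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="description">
    <xsd:sequence>
      <xsd:element name="lang" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent><xsd:extension base="xsd:string"/></xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def local_name(qname):
    """Drop a prefix such as 'zwv:' (this sketch ignores which namespace it means)."""
    return qname.split(":", 1)[-1]


def child_decls(type_def):
    """Yield the local xsd:element declarations in a complexType's content
    model, descending through sequence/choice/all but not into the
    declarations' own inline types."""
    pending = list(type_def)
    while pending:
        node = pending.pop()
        if node.tag == XSD + "element":
            yield node
        elif node.tag in (XSD + "sequence", XSD + "choice", XSD + "all"):
            pending.extend(node)


def type_definition(decl, named_types):
    """Return the complexType governing an element declaration, or None."""
    type_name = decl.get("type")
    if type_name is not None:
        # Named type; built-in or simple types are not in the dict -> None.
        return named_types.get(local_name(type_name))
    return decl.find(XSD + "complexType")  # inline (anonymous) type, if any


def governing_type(schema_root, path):
    """Resolve the declared type of the last element in path.

    path lists element names from the validation root down. Returns the
    type QName, "(anonymous)" for an inline type, or None if undeclared.
    """
    named_types = {ct.get("name"): ct
                   for ct in schema_root.findall(XSD + "complexType")}
    # The validation root must match a global (top-level) declaration.
    decl = schema_root.find(f"{XSD}element[@name='{path[0]}']")
    for step in path[1:]:
        if decl is None:
            return None
        type_def = type_definition(decl, named_types)
        if type_def is None:
            return None
        decl = next((d for d in child_decls(type_def)
                     if d.get("name") == step), None)
    if decl is None:
        return None
    return decl.get("type", "(anonymous)")


root = ET.fromstring(SCHEMA)
print(governing_type(root, ["deviceDescription", "wakeupNote"]))  # zwv:description
print(governing_type(root, ["otherElement", "wakeupNote"]))       # zwv:otherDescription
```

In a real document the path would come from the element's ancestors (for example, by tracking tag names while iterating with ElementTree.iterparse), which is the "walk from the root" that makes the name-only lookup unambiguous.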
Helping yourself in this way is likely to require a non-trivial amount of work. It would help if there were general-purpose tools to calculate this information and make it accessible in various forms, but if there are any such, I don't know about them. (I do know people who could build such a tool for a fee.) So if I were you I'd try to get the information through the API first.
Source: https://stackoverflow.com/questions/4799838/is-it-possible-to-get-the-type-of-an-xml-node-as-it-was-defined-in-xsd