Should .NET XML Schema Validation stop upon reaching the first invalid element?

Submitted by 六月ゝ 毕业季﹏ on 2019-12-11 01:12:23

Question


I have an XML string and a schema loaded and passed into a function. It validates the XML against the schema correctly; however, it always stops validating within the scope of the first invalid element. With invalid data, it keeps going; with invalid or missing attributes, it keeps going; but with an invalid element, it stops and will not validate anything further within that scope.

The schema is as follows:

<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="items">
          <xs:complexType>
            <xs:sequence maxOccurs="unbounded">
              <xs:element name="foo">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="bar" type="xs:integer" />
                    <xs:element name="bat" type="xs:integer" />
                  </xs:sequence>
                  <xs:attribute name="attr1" type="xs:integer" use="required" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Xml is as follows:

<root>
  <items>
    <foo attr1='1'>
      <invalid0>1</invalid0>
      <invalid1>b</invalid1>
    </foo>
    <foo attr1='1'>
      <invalid2>b</invalid2>
      <bat>b</bat>
    </foo>
    <foo attr1='1'>
      <bar>3</bar>
    </foo>
    <invalidFoo attr1='1'>
      <bar>d</bar>
      <bat>2</bat>
    </invalidFoo>
    <foo>
      <bar>3</bar>
      <bat>q</bat>
    </foo>
  </items>
</root>
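For anyone who wants to check this behaviour on their own runtime, here is a minimal, self-contained reproduction. It is a sketch, not the asker's code: the names `ValidationRepro` and `CollectEvents` are made up for illustration, the schema and instance above are inlined as strings (the unused BizTalk namespace and the XML declaration are dropped, and the instance is compacted to one `<foo>` per line, which changes line numbers but not content), and every validation event is collected rather than stopping at the first one. Which messages appear, and their exact wording, depend on the .NET implementation, so no output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationRepro
{
    // The question's schema (the unused BizTalk namespace declaration is dropped).
    public const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='root'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='items'>
          <xs:complexType>
            <xs:sequence maxOccurs='unbounded'>
              <xs:element name='foo'>
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name='bar' type='xs:integer' />
                    <xs:element name='bat' type='xs:integer' />
                  </xs:sequence>
                  <xs:attribute name='attr1' type='xs:integer' use='required' />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // The question's first instance, one <foo> per line.
    public const string Xml = @"<root>
  <items>
    <foo attr1='1'><invalid0>1</invalid0><invalid1>b</invalid1></foo>
    <foo attr1='1'><invalid2>b</invalid2><bat>b</bat></foo>
    <foo attr1='1'><bar>3</bar></foo>
    <invalidFoo attr1='1'><bar>d</bar><bat>2</bat></invalidFoo>
    <foo><bar>3</bar><bat>q</bat></foo>
  </items>
</root>";

    // Reads the whole instance with schema validation on and returns every
    // event the validator raised, instead of stopping at the first one.
    public static List<string> CollectEvents(string xsd, string xml)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, xsdReader); // null: use the schema's own (empty) targetNamespace
        }
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;

        var events = new List<string>();
        // With a handler attached, the reader reports problems here instead of throwing.
        settings.ValidationEventHandler += (sender, e) => events.Add($"{e.Severity}: {e.Message}");

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading to the end is what drives validation
        }
        return events;
    }

    public static void Main()
    {
        foreach (string message in CollectEvents(Xsd, Xml))
            Console.WriteLine(message);
    }
}
```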

In this example, the validator reaches the first <foo>, sees <invalid0>, and doesn't validate anything further within that <foo>, so it misses <invalid1>. The validator then moves on to the next <foo>.

In the next <foo>, it sees an <invalid2>, which doesn't belong there, and doesn't bother to catch the invalid data in the <bat> element (a string instead of an integer). It goes directly to the next <foo>.

It makes it to the next <foo> element, reports an error about a missing <bat>, and moves on to the next <foo>. Cool.

Now it gets to the <invalidFoo> and, rightly so, doesn't do any validation inside the <invalidFoo> because, of course, what's an <invalidFoo>?

The sticking point for me is that, at this point, the validator stops validating all the following <foo> sibling elements, so the invalid data in the last <bat> is not caught. I'm asking because I use validation to try to catch all errors (or at least as many as possible) and pass them back to the user. The first test I did in my actual code was the equivalent of this:

<root>
  <items>
    <invalidFoo attr1='1'>
      <invalid0>1</invalid0>
      <invalid1>b</invalid1>
    </invalidFoo>
    <foo attr1='1'>
      <invalid2>b</invalid2>
      <bat>b</bat>
    </foo>
    <foo attr1='1'>
      <bar>3</bar>
    </foo>
    <invalidFoo attr1='1'>
      <bar>d</bar>
      <bat>2</bat>
    </invalidFoo>
    <foo>
      <bar>3</bar>
      <bat>q</bat>
    </foo>
  </items>
</root>

So the validator saw that first <invalidFoo> and stopped dead. For the longest time, I assumed that, for some reason, validation was always stopping at the first error. It wasn't until I added a valid <foo> back that it started catching and accumulating the other invalid-data errors in succession. But as soon as it hits an invalid element name, all validation at the sibling/child level is skipped. It only happens with invalid elements, not with attributes or data.

Now, I'm not saying this is right or wrong; I'm asking whether it is. Should the validator keep going, especially in the case of sibling elements? Or should it stop and effectively declare an entire list of elements invalid because a previous one was invalid? What is the expected behaviour of the XML Schema validator in this situation?

This is all being done using the following C# code (which works as I expect it to):

using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static void ValidateAgainstSchema(string XMLSourceDocument, XmlSchemaSet validatingSchemas)
{
    if (validatingSchemas == null)
    {
        // The first ArgumentNullException argument is the parameter name; the message comes second.
        throw new ArgumentNullException(nameof(validatingSchemas), "In ValidateAgainstSchema: No schema loaded.");
    }

    // ValidationHandler (not shown) records each validation event in an IList<string>.
    ValidationHandler handler = new ValidationHandler();

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.CloseInput = true;
    settings.ValidationType = ValidationType.Schema;
    settings.ValidationEventHandler += new ValidationEventHandler(handler.HandleValidationError);
    settings.Schemas.Add(validatingSchemas);
    settings.ValidationFlags =
        XmlSchemaValidationFlags.ReportValidationWarnings |
        XmlSchemaValidationFlags.ProcessIdentityConstraints |
        XmlSchemaValidationFlags.ProcessInlineSchema |
        XmlSchemaValidationFlags.ProcessSchemaLocation;

    // With a handler attached, the reader reports problems through it instead of
    // throwing, so reading to the end collects every event the validator raises.
    using (XmlReader validatingReader = XmlReader.Create(new StringReader(XMLSourceDocument), settings))
    {
        while (validatingReader.Read()) { }
    }

    if (handler.MyValidationErrors.Count > 0)
    {
        // One message per line instead of running them together.
        throw new XmlSchemaValidationException(string.Join(Environment.NewLine, handler.MyValidationErrors));
    }
}

The validation event handler just catches the errors and adds them to an IList<string> so they can be displayed together later.
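The handler class itself isn't shown. A minimal sketch of what it might look like, assuming only what `ValidateAgainstSchema` uses (a `HandleValidationError` method matching the `ValidationEventHandler` delegate and an `IList<string>` named `MyValidationErrors`); the line/position message format and the `HandlerDemo` helper are illustrative choices, not the asker's code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical reconstruction: only the member names come from the question.
public class ValidationHandler
{
    public IList<string> MyValidationErrors { get; } = new List<string>();

    // Matches the ValidationEventHandler delegate: (object sender, ValidationEventArgs e).
    public void HandleValidationError(object sender, ValidationEventArgs e)
    {
        // Record the event and return; throwing here would end the read loop
        // at the first problem.
        XmlSchemaException ex = e.Exception;
        MyValidationErrors.Add(string.Format("{0} at line {1}, position {2}: {3}",
            e.Severity, ex.LineNumber, ex.LinePosition, e.Message));
    }
}

// Hypothetical driver that wires the handler up the same way ValidateAgainstSchema does.
public static class HandlerDemo
{
    public static IList<string> Run(string xsd, string xml)
    {
        var handler = new ValidationHandler();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, xsdReader);
        }
        settings.ValidationEventHandler += handler.HandleValidationError;

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return handler.MyValidationErrors;
    }

    public static void Main()
    {
        const string xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
                           "<xs:element name='a' type='xs:integer'/></xs:schema>";
        foreach (string message in Run(xsd, "<a>x</a>"))
            Console.WriteLine(message);
    }
}
```

Because `ValidateAgainstSchema` sets `ReportValidationWarnings`, warnings arrive through the same handler as errors; checking `e.Severity == XmlSeverityType.Error` would keep the two apart.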


Answer 1:


It processes elements by walking a tree, so as soon as it gets a node that doesn't fit the content model, it's lost. Attributes, however, are not hierarchical; they are a flat list, so each one is a straight go/no-go check and it can continue. Type checking of data is similarly simple.

You can look at your example and think, "Well, it could deal with that." But what about this?

<root>
  <items>
    <invalidFoo attr1='1'>
      <invalid0>1</invalid0>
      <invalid1>b</invalid1>
      <foo attr1='1'>
        <bar>b</bar>
        <bat>b</bat>
      </foo>
    </invalidFoo>
  </items>
</root>

Should <foo> be treated as a child of <items> or not? Is that foo really a foo?

If you want a real head-bender, imagine having an xsd:choice or two in there and a selection of valid nodes that don't meet the schema. It's one of those situations where it's "dangerous" to try to continue, so the validator bails out and effectively says: fix this first, so that I can sensibly validate what comes after.
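Rather than inferring from missing error messages which elements were actually checked, you can ask for each element's post-validation info. The sketch below is an illustration, not part of the original answer: the `ValidityReport` class and the small `list`/`item` schema are hypothetical, and it uses the LINQ to XML `Validate` extension with `addSchemaInfo` set to true so that each element carries an `XmlSchemaValidity` value (`Valid`, `Invalid`, or `NotKnown`). Elements the validator did not assess against a declaration should report `NotKnown`; the actual output is not reproduced here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class ValidityReport
{
    // Validates xml against xsd and returns "name: validity" for every element
    // in document order.
    public static string[] Report(string xsd, string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader);
        }

        XDocument doc = XDocument.Parse(xml);
        // Supplying a handler makes Validate report events instead of throwing;
        // the last argument (addSchemaInfo = true) annotates each node with its
        // post-schema-validation info.
        doc.Validate(schemas, (sender, e) => Console.WriteLine($"{e.Severity}: {e.Message}"), true);

        return doc.Descendants()
                  .Select(el => $"{el.Name.LocalName}: {el.GetSchemaInfo()?.Validity}")
                  .ToArray();
    }

    public static void Main()
    {
        // Hypothetical schema: <list> holds any number of integer <item> children.
        const string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='list'>
    <xs:complexType>
      <xs:sequence maxOccurs='unbounded'>
        <xs:element name='item' type='xs:integer'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

        // <bogus> breaks the content model; the second <item> also has bad data.
        const string xml = "<list><item>1</item><bogus/><item>x</item></list>";

        foreach (string line in Report(xsd, xml))
            Console.WriteLine(line);
    }
}
```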



Source: https://stackoverflow.com/questions/13314799/should-net-xml-schema-validation-stop-upon-reaching-first-invalid-element
