What's the best way to validate an XML file against an XSD file?

滥情空心 2020-11-22 07:37

I'm generating some XML files that need to conform to an XSD file that was given to me. What's the best way to verify that they conform?

13 answers
  • 2020-11-22 08:21

    Since this is a popular question, I will point out that Java can also validate against "referred to" XSDs, for instance if the .xml file itself specifies XSDs in its header using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (xsi:schemaLocation being for particular namespaces), e.g.:

    <document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
      ...
    </document>

    or xsi:schemaLocation (always a list of namespace-to-XSD mappings, written as "namespace location" pairs):

    <document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
      ...
    </document>

    The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file: each XSD declares a target namespace, and if it matches up with the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...

    From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"

    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    Schema schema = factory.newSchema();
    

    This works for multiple namespaces, etc. The problem with this approach is that the schema location in the xsi attributes is probably a network location, so by default it will go out and hit the network on each and every validation, which is not always optimal.
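    A minimal, self-contained sketch of the no-argument newSchema() approach, assuming Java 11+ (for Files.writeString). To stay off the network it writes a hypothetical document.xsd next to the XML file and points at it through a relative xsi:schemaLocation "namespace location" pair; the urn:example:doc namespace, class and method names are made up for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaHintDemo {

    // Validates the file against whatever XSDs its own xsi:schemaLocation /
    // xsi:noNamespaceSchemaLocation attributes name.
    static boolean isValid(File xmlFile) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(); // no argument: schemas are taken from the document
        Validator validator = schema.newValidator();
        try {
            // StreamSource(File) sets the system ID, so relative hints resolve next to the file
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException e) { // the default error handler throws on the first error
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    // Writes a tiny namespaced schema plus a document whose root element has the given content.
    static File writeDemo(Path dir, String content) throws Exception {
        Files.writeString(dir.resolve("document.xsd"),
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:doc'>"
            + "<xs:element name='document' type='xs:string'/></xs:schema>");
        Path xml = dir.resolve("document.xml");
        Files.writeString(xml,
            "<document xmlns='urn:example:doc'"
            + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            // schemaLocation holds "namespace location" pairs
            + " xsi:schemaLocation='urn:example:doc document.xsd'>"
            + content + "</document>");
        return xml.toFile();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xsd-hint-demo");
        System.out.println(isValid(writeDemo(dir, "plain text")));
        System.out.println(isValid(writeDemo(dir, "<child/>"))); // xs:string allows no child elements
    }
}
```

    Swapping the relative location for an http:// URL gives exactly the network-fetching behaviour described above.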

    Here's an example that validates an XML file against any XSDs it references (even if it has to pull them from the network):

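    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    
      public static void verifyValidatesInternalXsd(String filename) throws Exception {
        // try-with-resources closes the stream even when validation fails
        try (InputStream xmlStream = new FileInputStream(filename)) {
          DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
          factory.setValidating(true);
          factory.setNamespaceAware(true);
          // validate against the XSDs named in the document (W3C XML Schema), not a DTD
          factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                       "http://www.w3.org/2001/XMLSchema");
          DocumentBuilder builder = factory.newDocumentBuilder();
          builder.setErrorHandler(new RaiseOnErrorHandler());
          InputSource source = new InputSource(xmlStream);
          // a system ID lets relative schema locations resolve against the file's directory
          source.setSystemId(new File(filename).toURI().toString());
          builder.parse(source);
        }
      }
    
      // Turns every warning and error into an exception so invalid documents fail loudly
      public static class RaiseOnErrorHandler implements ErrorHandler {
        public void warning(SAXParseException e) throws SAXException {
          throw new RuntimeException(e);
        }
        public void error(SAXParseException e) throws SAXException {
          throw new RuntimeException(e);
        }
        public void fatalError(SAXParseException e) throws SAXException {
          throw new RuntimeException(e);
        }
      }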
    

    You can avoid pulling referenced XSDs from the network, even though the XML files reference URLs, by specifying the XSD manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently can also intercept the URL requests to serve local files for validation. Or you can set your own resolver via setResourceResolver, e.g.:

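    import java.io.InputStream;
    import javax.xml.XMLConstants;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSInput;
    import org.w3c.dom.ls.LSResourceResolver;
    
    Source xmlFile = new StreamSource(xmlFileLocation);
    SchemaFactory schemaFactory = SchemaFactory
                                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = schemaFactory.newSchema(); // schemas come from the document's xsi hints
    Validator validator = schema.newValidator();
    // DOM Level 3 Load & Save can create empty LSInput objects for the resolver to fill in
    DOMImplementationLS domLs = (DOMImplementationLS) DOMImplementationRegistry
                                    .newInstance().getDOMImplementation("LS");
    validator.setResourceResolver(new LSResourceResolver() {
      @Override
      public LSInput resolveResource(String type, String namespaceURI,
                                     String publicId, String systemId, String baseURI) {
        InputStream xsd = getClass().getResourceAsStream(
                              "some_local_file_in_the_jar.xsd"); // or lookup by URI, etc...
        LSInput input = domLs.createLSInput();
        input.setByteStream(xsd);
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        return input; // alternatively, a hand-written LSInput such as the Input class
                      // in https://stackoverflow.com/a/2342859/32453
      }
    });
    validator.validate(xmlFile);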
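    The "XML catalog" option mentioned above can be sketched with the javax.xml.catalog API that ships with Java 9+ (the sketch also uses Files.writeString, so it assumes Java 11+). A CatalogResolver is itself an LSResourceResolver, so it plugs into the same setResourceResolver call. The catalog maps the example URL http://www.example.com/document.xsd to a local copy; the file names and class name are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CatalogValidationDemo {

    // Validates against the XSDs the document references, letting an OASIS XML catalog
    // redirect their (remote) URLs to local files instead of fetching them.
    static boolean isValid(Path xmlFile, Path catalogFile) throws Exception {
        CatalogResolver resolver = CatalogManager.catalogResolver(
                CatalogFeatures.defaults(), catalogFile.toUri());
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema()                      // schemas come from the document's xsi hints
                .newValidator();
        validator.setResourceResolver(resolver);  // CatalogResolver implements LSResourceResolver
        try {
            validator.validate(new StreamSource(xmlFile.toFile()));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Writes a local XSD, a catalog mapping the remote URL to it, and a document using that URL.
    static boolean demo() throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");
        Path xsd = Files.writeString(dir.resolve("document.xsd"),
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='document' type='xs:string'/></xs:schema>");
        Path catalog = Files.writeString(dir.resolve("catalog.xml"),
            "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'>"
            + "<system systemId='http://www.example.com/document.xsd' uri='"
            + xsd.toUri() + "'/></catalog>");
        Path xml = Files.writeString(dir.resolve("document.xml"),
            "<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:noNamespaceSchemaLocation='http://www.example.com/document.xsd'>"
            + "hello</document>");
        return isValid(xml, catalog);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

    With CatalogFeatures.defaults() the resolve mode is "strict", so a URL missing from the catalog raises an error instead of silently going to the network, which is arguably what you want for repeatable validation.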
    

    See also here for another tutorial.

    I believe the default is to use DOM parsing; you can do something similar with a validating SAX parser as well, e.g. saxReader.setEntityResolver(your_resolver_here);
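    For the SAX route, here is a minimal sketch with plain JAXP; it is an assumption that this is the kind of setup meant (saxReader above may well be dom4j's SAXReader). SAXParserFactory.setSchema attaches a compiled Schema so validation happens while parsing; the inline XSD and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSchemaValidation {

    // Hypothetical schema for illustration: a single <count> element holding an int.
    static final String XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='count' type='xs:int'/></xs:schema>";

    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(schema); // validate against the XSD while parsing (no setValidating needed)
        SAXParser parser = spf.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler would otherwise silently ignore validation errors
                }
            });
            return true;
        } catch (SAXException e) { // validation errors and well-formedness errors
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<count>7</count>"));
        System.out.println(isValid("<count>seven</count>"));
    }
}
```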

  • 2020-11-22 08:27

    If you are generating XML files programmatically, you may want to look at the XMLBeans library. Using a command-line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.

    It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.

    Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.
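    If you build the XML yourself without a binding library, the same "check what you generate" idea can be sketched with plain JAXP: build the document as a DOM and validate it through a DOMSource before writing it out. This is a swapped-in technique, not how XMLBeans, Castor or JAXB work internally; the order/qty schema and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class ValidateGeneratedDom {

    // Hypothetical schema for illustration: <order> must contain one positive <qty>.
    static final String XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='qty' type='xs:positiveInteger'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Builds the document in memory, the way a generator would before serializing it.
    static Document buildOrder(String qty) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        // createElementNS(null, ...) yields no-namespace elements with a proper local name
        Element order = doc.createElementNS(null, "order");
        Element qtyElement = doc.createElementNS(null, "qty");
        qtyElement.setTextContent(qty);
        order.appendChild(qtyElement);
        doc.appendChild(order);
        return doc;
    }

    // Validates the in-memory DOM before it is ever written to a file.
    static boolean conforms(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(conforms(buildOrder("3")));
        System.out.println(conforms(buildOrder("0"))); // 0 is not a positiveInteger
    }
}
```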

  • 2020-11-22 08:33

    Using Woodstox, configure the StAX parser to validate against your schema and parse the XML.

    If an exception is thrown, the XML is not valid; otherwise it is valid:

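    // Requires Woodstox and its stax2-api dependency on the classpath
    import com.ctc.wstx.stax.WstxInputFactory;
    import javax.xml.stream.XMLStreamException;
    import org.codehaus.stax2.XMLStreamReader2;
    import org.codehaus.stax2.validation.XMLValidationSchema;
    import org.codehaus.stax2.validation.XMLValidationSchemaFactory;
    
    // create the XSD schema from your schema file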
    XMLValidationSchemaFactory schemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
    XMLValidationSchema validationSchema = schemaFactory.createSchema(schemaInputStream);
    
    // create the XML reader for your XML file
    WstxInputFactory inputFactory = new WstxInputFactory();
    XMLStreamReader2 xmlReader = (XMLStreamReader2) inputFactory.createXMLStreamReader(xmlInputStream);
    
    try {
        // configure the reader to validate against the schema
        xmlReader.validateAgainst(validationSchema);
    
        // parse the XML
        while (xmlReader.hasNext()) {
            xmlReader.next();
        }
    
        // no exceptions, the XML is valid
    
    } catch (XMLStreamException e) {
    
        // exceptions, the XML is not valid
    
    } finally {
        xmlReader.close();
    }
    

    Note: if you need to validate multiple files, you should reuse your XMLInputFactory and XMLValidationSchema to maximize performance.

  • 2020-11-22 08:37

    Are you looking for a tool or a library?

    As far as libraries go, pretty much the de facto standard is Xerces2, which has both C++ and Java versions.

    Be forewarned though, it is a heavyweight solution. But then again, validating XML against XSD files is a rather heavyweight problem.

    As for a tool to do this for you, XMLFox seems to be a decent freeware solution, but not having used it personally I can't say for sure.
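    For what it's worth, in OpenJDK the javax.xml.validation API is implemented by a repackaged copy of Xerces, so on Java you may not need a separate dependency. A minimal sketch that validates against an explicitly supplied XSD and collects every problem instead of stopping at the first; the memory/size schema and the class name are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CollectValidationErrors {

    // Hypothetical schema for illustration: <memory> holds exactly one integer <size>.
    static final String XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='memory'><xs:complexType><xs:sequence>"
        + "<xs:element name='size' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates xml against the given XSD and returns all problems found (empty list = valid).
    static List<String> validate(Source xsd, Source xml) throws IOException, SAXException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(xsd);
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) { // schema violation: record it and keep going
                problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // the XML is not even well-formed
            }
        });
        validator.validate(xml);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // For real files: new StreamSource(new java.io.File("memory.xsd")), and so on
        List<String> problems = validate(new StreamSource(new StringReader(XSD)),
                new StreamSource(new StringReader("<memory><size>lots</size></memory>")));
        problems.forEach(System.out::println);
    }
}
```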

  • 2020-11-22 08:38

    Here's how to do it using Xerces2. There is a tutorial for this here (requires signup).

    Original attribution: blatantly copied from here:

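    import org.apache.xerces.parsers.DOMParser;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    
    public class SchemaTest {
      public static void main(String[] args) {
          try {
            DOMParser parser = new DOMParser();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            // Xerces only validates against XML Schema (not just DTDs) when this feature is on
            parser.setFeature("http://apache.org/xml/features/validation/schema", true);
            parser.setProperty(
                 "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", 
                 "memory.xsd");
            ErrorChecker errors = new ErrorChecker();
            parser.setErrorHandler(errors);
            parser.parse("memory.xml");
         } catch (Exception e) {
            System.out.println("Problem parsing the file: " + e.getMessage());
         }
      }
    
      // ErrorChecker comes from the original tutorial and was not shown there;
      // this stand-in reports warnings and turns errors into exceptions.
      static class ErrorChecker implements ErrorHandler {
        public void warning(SAXParseException e) {
          System.out.println("Warning: " + e.getMessage());
        }
        public void error(SAXParseException e) throws SAXException {
          throw e;
        }
        public void fatalError(SAXParseException e) throws SAXException {
          throw e;
        }
      }
    }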
    
  • 2020-11-22 08:39

    With JAXB, you could use the code below:

    import static org.hamcrest.MatcherAssert.assertThat; // assumes JUnit 4 + Hamcrest
    import static org.hamcrest.Matchers.is;
    
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import javax.xml.XMLConstants;
    import javax.xml.bind.JAXBContext;            // jakarta.xml.bind.* on Jakarta EE 9+
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.ValidationEvent;
    import javax.xml.bind.ValidationEventHandler;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.SchemaFactory;
    import org.junit.Test;
    
    // logger, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass and
    // expectedValidationResult are fields of the surrounding test class.
    @Test
    public void testCheckXmlIsValidAgainstSchema() {
        logger.info("Validating an XML file against the latest schema...");
    
        MyValidationEventCollector vec = new MyValidationEventCollector();
    
        validateXmlAgainstSchema(vec, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass);
    
        assertThat(vec.getValidationErrors().isEmpty(), is(expectedValidationResult));
    }
    
    private void validateXmlAgainstSchema(final MyValidationEventCollector vec, final String xmlFileName, final String xsdSchemaName, final Class<?> rootClass) {
        final ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Both streams (XML and schema) are closed automatically
        try (InputStream xmlFileIs = loader.getResourceAsStream(xmlFileName);
             InputStream schemaAsStream = loader.getResourceAsStream(xsdSchemaName)) {
            final JAXBContext jContext = JAXBContext.newInstance(rootClass);
            final Unmarshaller unmarshaller = jContext.createUnmarshaller();
    
            // With a schema set, unmarshalling also validates the document against it
            final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            unmarshaller.setSchema(sf.newSchema(new StreamSource(schemaAsStream)));
            unmarshaller.setEventHandler(vec);
    
            // rootClass is the class bound to the root element of the XML file you want to validate
            unmarshaller.unmarshal(new StreamSource(xmlFileIs), rootClass);
    
            for (String validationError : vec.getValidationErrors()) {
                logger.trace(validationError);
            }
        } catch (final Exception e) {
            logger.error("The validation of the XML file " + xmlFileName + " failed: ", e);
        }
    }
    
    class MyValidationEventCollector implements ValidationEventHandler {
        private final List<String> validationErrors = new ArrayList<>();
    
        public List<String> getValidationErrors() {
            return Collections.unmodifiableList(validationErrors);
        }
    
        @Override
        public boolean handleEvent(final ValidationEvent event) {
            String errorMessage = String.format("line %d, column %d, error message %s",
                    event.getLocator().getLineNumber(), event.getLocator().getColumnNumber(),
                    event.getMessage());
            // Schema violations are reported as ERROR, so keep ERROR as well as FATAL_ERROR
            if (event.getSeverity() != ValidationEvent.WARNING) {
                validationErrors.add(errorMessage);
            }
            return true; // keep going: the errors are collected in a List and handled later
        }
    }
    