Is there any difference between 'valid xml' and 'well formed xml'?

Posted by 邮差的信 on 2019-11-25 21:41:43

Question


I wasn't aware of a difference, but a coworker says there is, although he can't back it up. What's the difference, if any?


Answer 1:


There is a difference, yes.

XML that adheres to the XML standard is considered well formed, while XML that adheres to a DTD is considered valid.




Answer 2:


Valid XML is XML that succeeds validation against a DTD.

Well formed XML is XML that has all tags closed in the proper order and, if it has a declaration, it has it first thing in the file with the proper attributes.

In other words, validity refers to semantics, while well-formedness refers to syntax.

So you can have invalid well formed XML.
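
A minimal sketch of that last point, assuming Python's standard xml.etree.ElementTree module as the parser (it checks well-formedness but does not validate). The helper name is_well_formed and the sample document are illustrative only. The internal DTD requires a <to> child, the body supplies <from> instead, and a non-validating parser should still accept the document:

```python
import xml.etree.ElementTree as ET

# The internal DTD says <note> must contain exactly one <to> child,
# but the document body contains <from> instead: well-formed, not valid.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Ann</from></note>"""


def is_well_formed(doc: str) -> bool:
    """Return True if the (non-validating) parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(DOC))                              # DTD is read, not enforced
print(is_well_formed("<note><from>Ann</note></from>"))  # tags closed in the wrong order
```

Checking the document against the DTD itself would need a validating parser, which the Python standard library does not include.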




Answer 3:


Well-formed vs Valid XML

Well-formed means that a textual object meets the W3C requirements for being XML.

Valid means that well-formed XML meets additional requirements given by a specified schema.


Official Definitions

Per the W3C Recommendation for XML:

[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints.]


Observations:

  • A document that is not well-formed is not XML. (Well-formed XML is commonly used but technically redundant.)
  • Being valid implies being well-formed.
  • Being well-formed does not imply being valid.
  • Although the W3C Recommendation for XML defines validity to be against a DTD, conventional use allows the term to be applied for conformance to XML schemas specified via XSD, RELAX NG, Schematron, or other methods.

Examples of what causes a document to be...

Not well-formed:

  • An element lacks a closing tag (and is not self-closing).
  • Elements overlap without proper nesting: <a><b></a></b>
  • An attribute value is missing a closing quote that matches the opening quote.
  • < or & are used in content rather than &lt; or &amp;.
  • Multiple root elements exist.
  • Multiple XML declarations exist, or an XML declaration appears other than at the top of the document.
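
The cases above can be tried against a parser. This is a rough sketch assuming Python's standard xml.etree.ElementTree; the CASES table and the is_well_formed helper are made-up names for illustration, and each sample is just one possible instance of the rule it is labeled with:

```python
import xml.etree.ElementTree as ET

# One small sample per rule listed above; every one should be rejected.
CASES = {
    "missing closing tag": "<a><b></a>",
    "overlapping elements": "<a><b></a></b>",
    "unterminated attribute quote": '<a x="1></a>',
    "bare ampersand in content": "<a>fish & chips</a>",
    "bare less-than in content": "<a>1 < 2</a>",
    "multiple root elements": "<a/><b/>",
    "XML declaration not at the top": "\n<?xml version='1.0'?><a/>",
}


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


for label, doc in CASES.items():
    print(f"{label}: well-formed = {is_well_formed(doc)}")
```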

Invalid:

  • An element or attribute is missing but required by the XML schema.
  • An element or attribute is used but undefined by the XML schema.
  • The content of an element does not match the content specified by the XML schema.
  • The value of an attribute does not match the type specified by the XML schema.
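
To make the first two kinds of invalidity concrete, here is a toy check in Python. It is not a real schema language: the RULES dictionary and the validity_errors function are hypothetical stand-ins for what a DTD or XSD validator would enforce, and they only look at the root element's attributes and direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <book> element; a stand-in for a real DTD or XSD.
RULES = {
    "required_attrs": {"isbn"},
    "allowed_attrs": {"isbn", "lang"},
    "required_children": {"title"},
    "allowed_children": {"title", "author"},
}


def validity_errors(doc: str, rules: dict) -> list:
    """Return a list of rule violations; an empty list means 'valid' here."""
    root = ET.fromstring(doc)  # raises ParseError if doc is not well-formed
    attrs = set(root.attrib)
    children = {child.tag for child in root}
    errors = []
    for name in sorted(rules["required_attrs"] - attrs):
        errors.append(f"missing required attribute: {name}")
    for name in sorted(attrs - rules["allowed_attrs"]):
        errors.append(f"undefined attribute: {name}")
    for name in sorted(rules["required_children"] - children):
        errors.append(f"missing required element: {name}")
    for name in sorted(children - rules["allowed_children"]):
        errors.append(f"undefined element: {name}")
    return errors


print(validity_errors('<book isbn="1"><title>T</title></book>', RULES))
print(validity_errors('<book price="9"><author>A</author><page/></book>', RULES))
```

Both sample documents are well-formed; only the second one breaks the rules.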

Namespace-Well-Formed

Technically, colon characters are permitted in component names in XML. However, colons should only be used in names for namespace purposes:

Note:

The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

Therefore, another term, namespace-well-formed, is defined in the Namespaces in XML 1.0 W3C Recommendation that implies all of the XML rules for well-formedness plus those governing namespaces and namespace prefixes.

Colloquially, the term well-formed is often used where namespace-well-formed would be more precise. However, this is a minor technical matter of less practical consequence than the distinction between well-formed vs valid XML described in this answer.
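
As a rough illustration of the difference, the sketch below uses Python's standard xml.parsers.expat module, whose ParserCreate function takes a namespace_separator argument that switches namespace processing on. The parses helper is a made-up name. A name with an undeclared prefix should pass the plain XML check but fail once namespace rules are applied:

```python
import xml.parsers.expat


def parses(doc: str, namespaces: bool) -> bool:
    """Parse doc with namespace processing switched on or off."""
    parser = xml.parsers.expat.ParserCreate(
        namespace_separator=" " if namespaces else None
    )
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


undeclared = "<p:item/>"                       # prefix p is never declared
declared = '<p:item xmlns:p="urn:example"/>'   # prefix p is bound to a namespace

print(parses(undeclared, namespaces=False))  # plain XML 1.0 rules only
print(parses(undeclared, namespaces=True))   # namespace rules applied as well
print(parses(declared, namespaces=True))
```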




Answer 4:


As others have said, well-formed XML conforms to the XML spec, and valid XML conforms to a given schema.

Another way to put it is that well-formed XML is lexically correct (it can be parsed), while valid XML is grammatically correct (it can be matched to a known vocabulary and grammar).

An XML document cannot be valid until it is well-formed. All XML documents are held to the same standard for well-formedness (the XML Recommendation published by the W3C). One XML document can be valid against some schemas, and invalid against others. There are a number of schema languages, many of which are themselves XML-based.




Answer 5:


Well-Formed XML is XML that meets the syntactic requirements of the language. That means not missing any closing tags, having all your singleton tags use <whatever /> instead of just <whatever>, and having your closing tags in the right order.

Valid XML is XML that uses a DTD and complies with all its requirements. So if you use an attribute improperly, you violate the DTD and the document isn't valid.

All valid XML is well-formed, but not all well-formed XML is valid.




Answer 6:


XML is well-formed if it meets the requirements for all XML documents set out by the standards: things like having a single root node, having nodes correctly nested, all nodes having a closing tag (or using the empty node shorthand of a slash before the closing angle bracket), attributes being quoted, etc. Being well-formed just means it adheres to the rules of XML and can therefore be parsed properly.

XML is valid if it will validate against a DTD or schema. This obviously differs from case to case - XML that is valid against one schema won't be valid against another schema, even though it is still well-formed.
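
A small sketch of that idea in Python. The two "schemas" here are hypothetical and reduced to nothing more than a set of required child elements, and conforms is an illustrative helper rather than a real validator:

```python
import xml.etree.ElementTree as ET

DOC = "<person><name>Ann</name></person>"

# Two hypothetical "schemas", each reduced to a set of required child elements.
SCHEMA_A = {"name"}
SCHEMA_B = {"name", "email"}


def conforms(doc: str, required_children: set) -> bool:
    """Check that every required child element is present under the root."""
    root = ET.fromstring(doc)  # well-formedness is checked first, by the parser
    present = {child.tag for child in root}
    return required_children <= present


print(conforms(DOC, SCHEMA_A))  # same document, first rule set
print(conforms(DOC, SCHEMA_B))  # same document, stricter rule set
```

The document never changes and stays well-formed throughout; only the rule set it is measured against differs.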

If XML isn't well-formed it can't be properly parsed: parsers will simply throw an exception or report an error. This is generic and it doesn't matter what your XML contains. Only once it is parsed can it be checked for validity. This is domain- or context-dependent and requires a DTD or schema to validate against. For simple XML documents, you may not have a DTD or schema, in which case you can't know if the XML is valid; the concept of validity simply doesn't apply in this case. Of course, this doesn't mean you can't use it, it just means you can't tell whether or not it's valid.




Answer 7:


W3C, in the XML specification, has defined certain rules that need to be followed while creating XML documents. Examples of such rules include having exactly one root element, having an end-tag for each start-tag, using single or double quotes for attribute values, and so on. If an XML document follows all these rules, it is said to be a well-formed document, and XML parsers can be used to parse and process such documents.

Document Type Definitions (DTDs) or XML Schemas can be used to define the structure and content of a specific class of XML documents. This includes the parent-child relationship details, attribute lists, data type information, value restrictions, etc. In addition to the well-formedness rules, if an XML document also follows the rules specified in the associated DTD/Schema, it is said to be a valid XML document.

All valid XML documents are well-formed, but the reverse is not always true. Well-formed XML documents do not necessarily have to be valid.




Answer 8:


I'll add that valid XML also implies that it's well-formed, but well-formed XML is not necessarily valid.




Answer 9:


In addition to the aforementioned DTDs, there are two other ways of describing and validating XML documents: XML Schema and RELAX NG, both of which may be easier to use and support more features than DTDs.




Answer 10:


If XML conforms to DTD rules, then it's valid XML. If an XML document conforms to the XML rules (all opened tags are closed, there is a root element, etc.), then it's well-formed XML.




Answer 11:


Taken from Extensible Markup Language (XML) 1.0 (Fifth Edition) - W3C Recommendation 26 November 2008 :

[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints.]


For those who prefer pseudo-code to paragraphs upon paragraphs of text... :)

IF is_well_formed(<XML_doc>) THEN
    # It is well-formed, and can be parsed
    IF is_valid(<XML_doc>) THEN
        # Well-formed and ALSO valid. Hurray! 
        # **A valid XML doc, is a well-formed doc!**
    ELSE
        # Only well-formed, NOT valid
    END IF
ELSE
    # Neither well-formed nor valid!
END IF

FUNCTION is_well_formed
    IF <does_not_contain_syntax,_spelling,_punctuation,_grammar_errors,_etc._errors> THEN
        RETURN TRUE
    ELSE 
        RETURN FALSE
    END IF
END FUNCTION 

FUNCTION is_valid
    IF <markup_of_the_XML_document_matches_"some"_defined_standard> THEN
        # Standards used to validate XML could be DTDs or XML Schemas, referenced within the XML document
        RETURN TRUE
    ELSE 
        RETURN FALSE
    END IF
END FUNCTION
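
For comparison, the same decision flow as a runnable Python sketch. It assumes the standard xml.etree.ElementTree parser for the well-formedness step; classify and note_rule are made-up names, and note_rule is only a hypothetical stand-in for a real DTD or schema check:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def classify(doc: str,
             is_valid: Optional[Callable[[ET.Element], bool]] = None) -> str:
    """Check well-formedness first, then validity if a rule is supplied."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return "not well-formed"
    if is_valid is None:
        # Nothing to validate against, so validity cannot be judged.
        return "well-formed (no schema, validity unknown)"
    return "well-formed and valid" if is_valid(root) else "well-formed but invalid"


def note_rule(root: ET.Element) -> bool:
    """Hypothetical rule: the root must be <note> and contain a <to> child."""
    return root.tag == "note" and root.find("to") is not None


print(classify("<note><to>Ann</to></note>", note_rule))
print(classify("<note/>", note_rule))
print(classify("<note>", note_rule))
print(classify("<note/>"))
```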

Based on the theory: "Well Formed" vs. Valid




Answer 12:


DTD is the acronym for Document Type Definition. This is a description of the content for a family of XML files. This is part of the XML 1.0 specification, and allows one to describe and verify that a given document instance conforms to the set of rules detailing its structure and content.

Validation is the process of checking a document against a DTD (more generally against a set of construction rules).

The validation process and building DTDs are the two most difficult parts of the XML life cycle. Briefly, a DTD defines all the possible elements to be found within your document and the formal shape of your document tree (by defining the allowed content of an element: either text, a regular expression for the allowed list of children, or mixed content, i.e. both text and children). The DTD also defines the valid attributes for all elements and the types of those attributes.




Answer 13:


Well, XML that isn't well formed, sort of by definition, isn't XML. People usually refer to valid XML as XML that adheres to a certain schema (XSD or DTD).




Answer 14:


See XML DTD on W3Schools:

An XML document with correct syntax is called "Well Formed".

An XML document validated against a DTD is both "Well Formed" and "Valid".



Source: https://stackoverflow.com/questions/134494/is-there-any-difference-between-valid-xml-and-well-formed-xml
