Note: this is supposed to be the canonical post for this question. A number of answers exist already, but descriptions of the various differences are scattered all over the place.
What is the difference between HTML and XHTML?
There are many differences. The main one is that XHTML is HTML in an XML document, and XML has different syntax rules:
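- You must specify the namespace `xmlns="http://www.w3.org/1999/xhtml"` explicitly in an XHTML document.
- You must use a lowercase `x` in hexadecimal character references (`&#xE9;`, not `&#XE9;`).
- XHTML can use CDATA sections (`<![CDATA[ ... ]]>`); HTML cannot.
- You must escape `<` signs everywhere (except in CDATA sections).

Then there are a couple of differences that are not XML-related: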
- `<meta http-equiv="content-type" ...` is flagged as an error in XHTML5 files, but not in HTML5 files.
- … `name` attribute on `<img>` and `<form>`. This was an error though, fixed in XHTML 1.1.

Note that XHTML documents should be served with the correct file type, i.e. a `.xhtml` file extension or an `application/xhtml+xml` MIME type. You can't really have XHTML in an HTML document, because browsers don't distinguish between the two syntaxes by looking at the content, only by file type.
In other words, if you have an HTML file, its contents are HTML, no matter if it has valid XML in it or not.
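To make the XML syntax rules from the first list a bit more concrete, here is a minimal JavaScript sketch of two of them: escaping text before it goes into an XHTML document, and writing hexadecimal character references with a lowercase `x`. The names `escapeXmlText` and `hexCharRef` are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helpers illustrating two XML syntax rules that XHTML inherits.

// Escape text content for an XML document. '&' has to be replaced first,
// otherwise the '&' in the entities added afterwards would be escaped again.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    // '>' is optional in most places, but escaping it guarantees that the
    // sequence "]]>" never appears in text content, which XML forbids.
    .replace(/>/g, "&gt;");
}

// XML only accepts a lowercase "x" in "&#x...;"; HTML also accepts "&#X...;".
function hexCharRef(codePoint) {
  return "&#x" + codePoint.toString(16).toUpperCase() + ";";
}

console.log(escapeXmlText("if (a < b && c) {}")); // if (a &lt; b &amp;&amp; c) {}
console.log(hexCharRef(0xe9)); // &#xE9;
```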
One point about the syntax rules worth mentioning is the casing of tag names. Although HTML tag names are case-insensitive, the DOM actually exposes them in uppercase. That means that in an HTML document, a JavaScript command like `console.log(document.body.tagName);` would output "BODY", whereas the same command in an XHTML document would output "body".
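If a script has to work in both kinds of document, one way to sidestep the casing difference is to normalize before comparing. This is a minimal sketch: `hasTagName` is a hypothetical helper, and plain objects stand in for DOM elements so it also runs outside a browser. Reading `element.localName`, which is not uppercased in HTML documents, is another option.

```javascript
// Hypothetical helper: compare an element's tag name case-insensitively,
// since tagName is uppercase in HTML documents but not in XHTML documents.
function hasTagName(element, name) {
  return element.tagName.toLowerCase() === name.toLowerCase();
}

// In a browser you would pass a real element, e.g. hasTagName(document.body, "body").
const fromHtml = { tagName: "BODY" };  // what an HTML document reports
const fromXhtml = { tagName: "body" }; // what an XHTML document reports
console.log(hasTagName(fromHtml, "body"), hasTagName(fromXhtml, "body")); // true true
```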
Isn't XHTML merely a stricter version of HTML?
No; XML has different rules than HTML, but it's not necessarily stricter. If anything, XML has fewer rules!
In HTML, many features are optional. You can choose to put quotes around attribute values or not; in XML you don't have that choice. And in HTML, you have to remember when you have the choice and when you don't: are quotes optional in `<a href=http://my-website.com/?login=true>`? In XML, you don't have to think about that. XML is easier.
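For the record, the WHATWG HTML specification answers that question with a fixed rule: an unquoted attribute value must be non-empty and must not contain ASCII whitespace or any of the characters `"`, `'`, `=`, `<`, `>` and `` ` ``. The `=` in `?login=true` therefore means the quotes are required in that example (browsers still parse it, but it is a validation error). A minimal sketch of the rule, with `canBeUnquoted` as a hypothetical name:

```javascript
// Hypothetical helper encoding the WHATWG rule for unquoted attribute values:
// non-empty, and no ASCII whitespace (tab, LF, FF, CR, space) or " ' = < > `.
function canBeUnquoted(value) {
  return value.length > 0 && !/[\t\n\f\r "'=<>`]/.test(value);
}

console.log(canBeUnquoted("http://my-website.com/about"));      // true
console.log(canBeUnquoted("http://my-website.com/?login=true")); // false, because of "="
```

In XHTML the check is unnecessary: every attribute value is quoted, full stop.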
In HTML, some elements are defined as raw text elements, that is, elements that contain plain text rather than markup. And some other elements are escapable raw text elements, in which references like `&eacute;` will be parsed, but things like `<b>bold</b>` and `<!-- comment -->` will be treated as plain text. If you can remember which elements those are, you don't have to escape `<` signs inside them (you optionally can, though). XML doesn't have that, so there's nothing to remember and all elements have the same content type.
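In the WHATWG HTML specification, the raw text elements are `script` and `style`, and the escapable raw text elements are `textarea` and `title`. Here is a minimal sketch of the lookup an HTML parser has to make and an XML parser does not; `contentKind` is a hypothetical name, not a real API.

```javascript
// Element content categories in HTML (WHATWG spec). XML has no such list.
const RAW_TEXT = new Set(["script", "style"]);             // no markup, no character references
const ESCAPABLE_RAW_TEXT = new Set(["textarea", "title"]); // character references only

// Hypothetical helper: how the content of an element is parsed.
function contentKind(tagName, isXhtml) {
  if (isXhtml) return "markup"; // XML parses every element's content the same way
  const name = tagName.toLowerCase();
  if (RAW_TEXT.has(name)) return "raw text";
  if (ESCAPABLE_RAW_TEXT.has(name)) return "escapable raw text";
  return "markup";
}

console.log(contentKind("title", false)); // escapable raw text
console.log(contentKind("title", true));  // markup
```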
XML has processing instructions, and the most well-known `<?...?>` construct is the XML declaration in the prolog, `<?xml version="1.0" encoding="windows-1252"?>`. This tells the browser which version of XML is used (1.0 is the only version that works, by the way) and which character set.
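As a rough illustration of what the declaration carries, here is a minimal sketch that pulls the version and encoding out of it. `readXmlDeclaration` is a hypothetical helper; the regular expression only handles the common forms and ignores a leading byte order mark and the optional `standalone` pseudo-attribute. In practice the browser's XML parser does this for you.

```javascript
// Hypothetical helper: extract version and encoding from an XML declaration.
// Returns null when the document does not start with a declaration.
function readXmlDeclaration(source) {
  const match = /^<\?xml\s+version\s*=\s*(["'])([^"']*)\1(?:\s+encoding\s*=\s*(["'])([^"']*)\3)?/.exec(source);
  if (match === null) return null;
  // encoding is optional in the declaration; null means "not declared".
  return { version: match[2], encoding: match[4] ?? null };
}

console.log(readXmlDeclaration('<?xml version="1.0" encoding="windows-1252"?>\n<html/>'));
// { version: '1.0', encoding: 'windows-1252' }
```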
And XML parses comments in a different way. For example, HTML comments can't start with `<!-->` (with a `>` as the first character inside); XHTML comments can.
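The difference is easiest to see by comparing the two grammars for the text between `<!--` and `-->`. This is a minimal sketch based on the XML 1.0 `Comment` production and the comment syntax rules in the current WHATWG HTML specification; both function names are hypothetical.

```javascript
// XML 1.0: the comment text may not contain "--" and may not end with "-".
function isValidXmlCommentText(text) {
  return !text.includes("--") && !text.endsWith("-");
}

// WHATWG HTML: the comment text may not start with ">" or "->", may not
// contain "<!--", "-->" or "--!>", and may not end with "<!-".
function isValidHtmlCommentText(text) {
  return (
    !text.startsWith(">") &&
    !text.startsWith("->") &&
    !text.includes("<!--") &&
    !text.includes("-->") &&
    !text.includes("--!>") &&
    !text.endsWith("<!-")
  );
}

console.log(isValidXmlCommentText("> note "));  // true: "<!--> note -->" is a valid XHTML comment
console.log(isValidHtmlCommentText("> note ")); // false: in HTML, "<!-->" closes the comment at once
```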
Speaking of comments, with XHTML you can comment out blocks of code inside `<script>` and `<style>` elements using `<!-- comment -->`. Don't try that in HTML. (It's not recommended in XHTML either, because of compatibility issues, but you can.)
Why are there different versions of XHTML if they all act the same?
They don't! For instance, in XHTML 1.1 you can refer to character entities like `&eacute;` and `&nbsp;`, because those entities are defined in the DTD. The current version of XHTML (formerly known as XHTML5) does not have a DTD, so you will have to use numeric references, in this case `&#233;` and `&#160;` (or define those entities yourself in the DOCTYPE declaration; the X stands for eXtensible, after all).
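If you have existing markup full of named entities, converting them is mechanical. A minimal sketch, assuming a hypothetical two-entry table (a real converter would need the full list of HTML named character references). The five entities XML predefines (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) work without a DTD, so they are left alone.

```javascript
// Hypothetical, deliberately tiny table of HTML named entities to code points.
const NAMED_ENTITIES = new Map([
  ["eacute", 0xe9], // é
  ["nbsp", 0xa0],   // no-break space
]);
// XML predefines these five, so they are valid even without a DTD.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Replace known named references with decimal numeric references.
function toNumericReferences(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (ref, name) => {
    if (XML_PREDEFINED.has(name)) return ref;
    const codePoint = NAMED_ENTITIES.get(name);
    return codePoint === undefined ? ref : `&#${codePoint};`; // unknown names stay as-is
  });
}

console.log(toNumericReferences("caf&eacute;&nbsp;&amp; more")); // caf&#233;&#160;&amp; more
```

Using a `Map` rather than a plain object avoids accidentally matching inherited property names such as `&constructor;`.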