The fundamental difference, which no other answer seems to have mentioned, is that XML is a markup language (as it actually says in its name), whereas JSON is a way of representing objects (as also noted in its name).
A markup language is a way of adding extra information to free-flowing plain text, e.g.:
Here is some text.
With XML (using a certain element vocabulary) you can put:
<Document>
  <Paragraph Align="Center">
    Here <Bold>is</Bold> some text.
  </Paragraph>
</Document>
This is what makes markup languages so useful for representing documents.
An object notation like JSON is not as flexible. But this is usually a good thing. When you're representing objects, you simply don't need the extra flexibility. To represent the above example in JSON, you'd actually have to solve some problems manually that XML solves for you.
{
  "Paragraphs": [
    {
      "align": "center",
      "content": [
        "Here ", {
          "style": "bold",
          "content": [ "is" ]
        },
        " some text."
      ]
    }
  ]
}
It's not as nice as the XML, and the reason is that we're trying to do markup with an object notation. So we have to invent a way to scatter snippets of plain text around our objects, using "content" arrays that can hold a mixture of strings and nested objects.
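To make that cost concrete, here is a minimal Python sketch of what a consumer has to do to get the plain text back out. It assumes the ad-hoc "content"/"style" convention from the JSON above (nothing in JSON itself defines those keys), and the helper name `plain_text` is made up for illustration. The XML version is compressed onto one line so that whitespace doesn't muddy the comparison.

```python
import json
import xml.etree.ElementTree as ET

# Same shape as the JSON above. The "content" and "style" keys are this
# answer's invented convention, not something JSON gives any meaning to.
doc = json.loads("""
{
  "Paragraphs": [
    {
      "align": "center",
      "content": [
        "Here ",
        {"style": "bold", "content": ["is"]},
        " some text."
      ]
    }
  ]
}
""")

def plain_text(node):
    """Recursively join the text held in a string or a {"content": [...]} object."""
    if isinstance(node, str):
        return node
    return "".join(plain_text(child) for child in node["content"])

paragraph_texts = [plain_text(p) for p in doc["Paragraphs"]]
print(paragraph_texts)

# With XML the mixed text/element model is part of the format, so a generic
# parser can already walk it without knowing the vocabulary.
xml_doc = ET.fromstring(
    '<Document><Paragraph Align="Center">'
    'Here <Bold>is</Bold> some text.'
    '</Paragraph></Document>'
)
xml_text = "".join(xml_doc.itertext())
print(xml_text)
```

The JSON side only works because the reader and writer agree on my invented convention; the XML side works for any element vocabulary.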
On the other hand, if you have a typical hierarchy of objects and you want to represent them in a stream, JSON is better suited to this task than XML.
{
  "firstName": "Homer",
  "lastName": "Simpson",
  "relatives": [ "Grandpa", "Marge", "The Boy", "Lisa", "I think that's all of them" ]
}
Here's the logically equivalent XML:
<Person>
  <FirstName>Homer</FirstName>
  <LastName>Simpson</LastName>
  <Relatives>
    <Relative>Grandpa</Relative>
    <Relative>Marge</Relative>
    <Relative>The Boy</Relative>
    <Relative>Lisa</Relative>
    <Relative>I think that's all of them</Relative>
  </Relatives>
</Person>
JSON looks more like the data structures we declare in programming languages. It also repeats names far less.
But most importantly of all, it has a defined way of distinguishing between a "record" (items unordered, identified by names) and a "list" (items ordered, identified by position). An object notation is practically useless without such a distinction. And XML has no such distinction! In my XML example <Person> is a record and <Relatives> is a list, but they are not identified as such by the syntax.
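You can see the difference as soon as you hand both examples to a generic parser. This is only a sketch using Python's standard `json` and `xml.etree.ElementTree` modules; other parsers will differ in detail, but the point should carry over.

```python
import json
import xml.etree.ElementTree as ET

person_json = json.loads("""
{
  "firstName": "Homer",
  "lastName": "Simpson",
  "relatives": ["Grandpa", "Marge", "The Boy", "Lisa", "I think that's all of them"]
}
""")

person_xml = ET.fromstring("""
<Person>
  <FirstName>Homer</FirstName>
  <LastName>Simpson</LastName>
  <Relatives>
    <Relative>Grandpa</Relative>
    <Relative>Marge</Relative>
    <Relative>The Boy</Relative>
    <Relative>Lisa</Relative>
    <Relative>I think that's all of them</Relative>
  </Relatives>
</Person>
""".strip())

# JSON: the syntax alone tells the parser which is the record and which is the list.
json_types = (type(person_json).__name__, type(person_json["relatives"]).__name__)

# XML: both come back as the same generic node type; only a human (or a schema)
# knows that one holds named fields and the other an ordered sequence.
xml_types = (type(person_xml).__name__, type(person_xml.find("Relatives")).__name__)

print(json_types)
print(xml_types)
```

The JSON parser hands back a dict and a list, while the XML parser hands back two elements of the same type, so any record-versus-list interpretation has to be layered on top by the application.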
Instead, XML has "elements" versus "attributes". This looks like the same kind of distinction, but it's not, because attributes can only have string values. They cannot be nested objects. So I couldn't have applied this idea to <Person>, because I shouldn't have to turn <Relatives> into a single string.
By using an external schema, or extra user-defined attributes, you can formalise a distinction between lists and records in XML. The advantage of JSON is that the low-level syntax has that distinction built into it, so it's very succinct and universal. This means that JSON is more "self describing" by default, which is an important goal of both formats.
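As a rough illustration of the "extra user-defined attributes" route, here is a sketch in Python. The `kind` attribute and the `to_object` helper are entirely made up for this example; XML assigns them no meaning, so every producer and consumer would have to agree on them separately.

```python
import xml.etree.ElementTree as ET

# "kind" is a hypothetical, user-defined attribute invented for this sketch.
source = """
<Person kind="record">
  <FirstName>Homer</FirstName>
  <LastName>Simpson</LastName>
  <Relatives kind="list">
    <Relative>Grandpa</Relative>
    <Relative>Marge</Relative>
  </Relatives>
</Person>
""".strip()

def to_object(element):
    """Convert an element to a dict, list, or string based on its 'kind' marker."""
    kind = element.get("kind")
    if kind == "list":
        # Ordered items, identified by position; child tag names are ignored.
        return [to_object(child) for child in element]
    if kind == "record":
        # Named fields, identified by child tag name.
        return {child.tag: to_object(child) for child in element}
    # Unmarked elements are treated as plain string values.
    return element.text

person = to_object(ET.fromstring(source))
print(person)
```

It works, but the distinction now lives in a private convention rather than in the syntax, which is exactly the part JSON gives you for free.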
So JSON should be the first choice for object notation, where XML's sweet spot is document markup.
Unfortunately for XML, we already have HTML as the world's number one rich text markup language. An attempt was made to reformulate HTML in terms of XML, but there isn't much advantage in this.
So XML should (in my opinion) have been a pretty limited niche technology, best suited only for inventing your own rich text markup languages if you don't want to use HTML for some reason. The problem was that in 1998 there was still a lot of hype about the Web, and XML became popular due to its superficial resemblance to HTML. It was a strange design choice to try to apply to hierarchical data a syntax actually designed for convenient markup.