At work we are being asked to create XML files to pass data to another offline application that will then create a second XML file to pass back in order to update some of ou
Just a couple of corrections to some bad info:
@John Ballinger: Attributes can contain any character data. < > & " ' need to be escaped to &lt; &gt; &amp; &quot; and &apos;, respectively. If you use an XML library, it will take care of that for you.
Hell, an attribute can contain binary data such as an image, if you really want, just by base64-encoding it and making it a data: URL.
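To make the escaping and base64 points concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree and base64 modules. The element names, attribute names, and payload bytes are made up for illustration; the point is only that the library escapes on write and unescapes on read, so the round trip is lossless.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, used only for illustration.
raw = "a < b & \"c\" > 'd'"
elem = ET.Element("note")
elem.set("text", raw)  # no manual escaping: the serializer handles it

serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the serialized form gives back the original string.
assert ET.fromstring(serialized).get("text") == raw

# Binary data survives too once it is base64-encoded into a data: URL.
payload = bytes(range(256))  # stand-in for real image bytes
img = ET.Element("image")
img.set("src", "data:image/gif;base64," + base64.b64encode(payload).decode("ascii"))

src = ET.fromstring(ET.tostring(img, encoding="unicode")).get("src")
decoded = base64.b64decode(src.split(",", 1)[1])
assert decoded == payload
```

Which characters a given serializer chooses to escape can vary slightly between libraries, so compare parsed values rather than serialized strings.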
@feenster: Attributes can contain multiple space-separated items when they are declared as list types such as IDREFS or NMTOKENS, and NMTOKENS values may be numbers. Nitpicky, but this can end up saving space.
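As a rough illustration of the space-separated case, the sketch below reads the same three numbers from a single attribute and from one element per item. The order/items/item names are hypothetical, and without a DTD the parser hands back the attribute as a plain string, so the consumer does the splitting.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one attribute carrying a space-separated list.
compact_xml = '<order items="101 102 103"/>'
compact = ET.fromstring(compact_xml)
items_from_attr = [int(tok) for tok in compact.get("items").split()]

# The same data with one element per item.
verbose_xml = "<order><item>101</item><item>102</item><item>103</item></order>"
verbose = ET.fromstring(verbose_xml)
items_from_elems = [int(e.text) for e in verbose.findall("item")]

assert items_from_attr == items_from_elems
print(len(compact_xml), "bytes vs", len(verbose_xml), "bytes")
```

The attribute form is shorter, at the cost of a small parsing step in every consumer.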
Using attributes can keep XML competitive with JSON. See Fat Markup: Trimming the Fat Markup Myth one calorie at a time.
The million-dollar question!
First off, don't worry too much about performance now. You will be amazed at how quickly an optimized XML parser will rip through your XML. More importantly, what is your design for the future: as the XML evolves, how will you maintain loose coupling and interoperability?
More concretely, you can make the content model of an element more complex over time, but it is much harder to extend an attribute.
Attributes can easily become difficult to manage over time, trust me. I always stay away from them personally. Elements are far more explicit and readable/usable by both parsers and users.
The only time I've ever used them was to define the file extension of an asset URL:
<image type="gif">wank.jpg</image> ...etc etc
I guess if you know 100% that the attribute will never need to be expanded you could use them, but how often do you know that? The element-only equivalent:
<image>
<url>wank.jpg</url>
<fileType>gif</fileType>
</image>
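A small sketch of why the element form leaves room to grow, again with Python's xml.etree.ElementTree. The read_image helper and the later size/width/height children are hypothetical additions for illustration. A reader written against the first layout keeps working when a nested block is added later, and that block carries structure that a single attribute value could only hold as a string needing its own parsing.

```python
import xml.etree.ElementTree as ET

def read_image(xml_text):
    """Return (url, file_type) from the element-based <image> layout."""
    image = ET.fromstring(xml_text)
    return image.findtext("url"), image.findtext("fileType")

# The layout from the example above.
v1 = "<image><url>wank.jpg</url><fileType>gif</fileType></image>"

# Hypothetical later revision: a nested <size> block is added.
v2 = (
    "<image><url>wank.jpg</url><fileType>gif</fileType>"
    "<size><width>640</width><height>480</height></size></image>"
)

# The old reader ignores what it does not know about.
assert read_image(v1) == read_image(v2) == ("wank.jpg", "gif")

# A newer reader can pick up the extra structure directly.
width = int(ET.fromstring(v2).findtext("size/width"))
print(width)
```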
It is arguable either way, but your colleagues are right in the sense that the XML should be used for "markup" or meta-data around the actual data. For your part, you are right in that it's sometimes hard to decide where the line between meta-data and data is when modeling your domain in XML. In practice, what I do is pretend that anything in the markup is hidden, and only the data outside the markup is readable. Does the document make some sense in that way?
XML is notoriously bulky. For transport and storage, compression is highly recommended if you can afford the processing power. XML compresses well, sometimes phenomenally well, because of its repetitiveness. I've had large files compress to less than 5% of their original size.
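If you want to check the compression claim against your own data, a quick measurement with Python's standard gzip module looks something like this. The record layout is synthetic and deliberately repetitive, so the ratio it prints is not a prediction for real files.

```python
import gzip

# Synthetic, highly repetitive XML (hypothetical record layout).
records = "".join(
    f'<record id="{i}"><name>item {i}</name><value>{i * 7}</value></record>'
    for i in range(5000)
)
document = f"<records>{records}</records>".encode("utf-8")

compressed = gzip.compress(document)
ratio = len(compressed) / len(document)
print(f"{len(document)} bytes -> {len(compressed)} bytes ({ratio:.1%})")

# Compression is lossless: the original bytes come back unchanged.
assert gzip.decompress(compressed) == document
```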
Another point to bolster your position is that while the other team is arguing about style (in that most XML tools will handle an all-attribute document just as easily as an all-#PCDATA document) you are arguing practicalities. While style can't be totally ignored, technical merits should carry more weight.
XML is all about agreement. First defer to any existing XML schemas or established conventions within your community or industry.
If you are truly in a situation to define your schema from the ground up, here are some general considerations that should inform the element vs attribute decision:
<versus>
<element attribute="Meta content">
Content
</element>
<element attribute="Flat">
<parent>
<child>Hierarchical</child>
</parent>
</element>
<element attribute="Unordered">
<ol>
<li>Has</li>
<li>order</li>
</ol>
</element>
<element attribute="Must copy to reuse">
Can reference to re-use
</element>
<element attribute="For software">
For humans
</element>
<element attribute="Extreme use leads to micro-parsing">
Extreme use leads to document bloat
</element>
<element attribute="Unique names">
Unique or non-unique names
</element>
<element attribute="SAX parse: read first">
SAX parse: read later
</element>
<element attribute="DTD: default value">
DTD: no default value
</element>
</versus>
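The "SAX parse: read first / read later" row can be seen with a minimal sketch built on Python's standard xml.sax module. The OrderRecorder class is a hypothetical name for illustration. Attributes arrive together with the start-tag event, while element content only shows up in later events.

```python
import xml.sax

class OrderRecorder(xml.sax.ContentHandler):
    """Records SAX events in the order the parser delivers them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # All attributes are already available at this point.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

handler = OrderRecorder()
xml.sax.parseString(
    b'<element attribute="Meta content">Content</element>', handler
)
for event in handler.events:
    print(event)
```

A streaming consumer can therefore decide how to treat an element from its attributes before any of its content has been read.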
Others have covered how to differentiate attributes from elements, but from a more general perspective, putting everything in attributes because it makes the resulting XML smaller is wrong.
XML is not designed to be compact but to be portable and human-readable. If you want to decrease the size of the data in transit, use something else (such as Google's Protocol Buffers).