What's the accepted way of storing quoted data in XML?
For example, for a node, which is correct?
The XML specification itself doesn't talk about nodes (other than when comparing DTD syntax to finite automaton regex). A DOM node can be an attribute, an element, text, or any of the other node types.
Inside a text node, you only need to escape characters which the parser would interpret as starting a different node, so you escape & and < as &amp; and &lt;.
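As a minimal sketch of the text-node rule, Python's standard-library `xml.sax.saxutils.escape` does this kind of escaping (the sample string is made up for illustration). It also escapes `>`, which is harmless, and leaves plain quotes alone:

```python
from xml.sax.saxutils import escape

# Hypothetical text content containing special characters and plain quotes.
raw = 'Tom & Jerry say "1 < 2"'

# escape() replaces &, < and > only; quotes are left untouched.
escaped = escape(raw)
print(escaped)  # Tom &amp; Jerry say "1 &lt; 2"
```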
For portability, it's often a good idea to escape curly quotes, but there is no reason to escape plain quotes in XML text.
Inside an attribute node, you have to escape less-than and ampersand as before, and also whichever quote you used to delimit the attribute.
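One way to see the attribute rule in action is Python's standard-library `xml.sax.saxutils.quoteattr`, which picks a delimiter and escapes only what that delimiter forces. This is just a sketch with invented sample values:

```python
from xml.sax.saxutils import quoteattr

# Only double quotes inside: single quotes are chosen as the delimiter.
only_double = quoteattr('say "hi"')

# Only a single quote inside: double quotes are chosen as the delimiter.
only_single = quoteattr("it's")

# Both kinds inside: double quotes delimit, so inner " becomes &quot;.
# The ampersand is escaped in every case.
both = quoteattr('"a" & \'b\'')

print(only_double)  # 'say "hi"'
print(only_single)  # "it's"
print(both)         # "&quot;a&quot; &amp; 'b'"
```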
It's usually easier to get in the habit of only using one type and always escaping it. I write quite a bit of XSLT and favour using " outside and ' inside:
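A sketch of that convention, using a hypothetical `xsl:if` test (the attribute name and value are invented) and Python's standard-library parser to confirm the value that comes out:

```python
import xml.etree.ElementTree as ET

# Double quotes delimit the attribute; single quotes delimit the XPath string.
fragment = """<xsl:if xmlns:xsl="http://www.w3.org/1999/XSL/Transform" test="@type = 'book'"/>"""

# The parser hands back the XPath expression with its inner quotes intact.
test_expr = ET.fromstring(fragment).get("test")
print(test_expr)  # @type = 'book'
```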
If you get paranoid with the escaping, the XPath becomes less readable:
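For illustration, here is a hypothetical test attribute written both ways. To the parser the two are identical, so the extra escaping buys nothing except noise (a sketch using Python's standard library; the element and values are invented):

```python
import xml.etree.ElementTree as ET

# Inner single quotes left as they are.
plain = """<if test="@type = 'book'"/>"""

# The same expression with every inner quote escaped as &apos;.
paranoid = """<if test="@type = &apos;book&apos;"/>"""

# Both forms yield the same attribute value after parsing.
same = ET.fromstring(plain).get("test") == ET.fromstring(paranoid).get("test")
print(same)  # True
```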
If (c), is it really appropriate to mix HTML & XML?
XML defines the named entities amp, gt, lt, apos, and quot.
HTML defines many more entities.
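The difference shows up as soon as an HTML-only entity reaches an XML parser. A small sketch with Python's standard library, assuming a document with no DTD that declares the entity:

```python
import html
import xml.etree.ElementTree as ET

# &nbsp; is an HTML entity, so the HTML helper resolves it.
nbsp = html.unescape("&nbsp;")

# A plain XML parser has no definition for it and rejects the document.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    xml_accepts_nbsp = True
except ET.ParseError:
    xml_accepts_nbsp = False

print(repr(nbsp), xml_accepts_nbsp)  # '\xa0' False
```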
You can and should use the XML named entities in XML in preference to numeric character references.
The lt entity escapes < and should be used in text and attribute values. The amp entity escapes & and should be used in text and attribute values. The apos and quot entities escape ' and " and should be used in attribute values. The gt entity is a bit useless - there is almost never a syntactic requirement to escape > in XML. Maybe > only agreed to work with < if it got equal billing.
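A quick check of all five predefined entities, sketched with Python's standard-library parser (the element and attribute names are arbitrary):

```python
import xml.etree.ElementTree as ET

# quot and apos appear in the attribute value; lt, amp and gt in the text.
doc = """<t a="&quot;&apos;">&lt;&amp;&gt;</t>"""
elem = ET.fromstring(doc)

text_value = elem.text      # the three characters < & >
attr_value = elem.get("a")  # a double quote followed by a single quote
print(text_value)  # <&>
```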
The other one I use a lot in XSLT that generates source code is &#10;, which inserts a new line. &nl; would have been more use than &gt;.
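As a sketch, assuming the reference meant here is the numeric character reference for a line feed (`&#10;`, or `&#xA;` in hex), a parser turns it into a real newline in the text:

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to U+000A both become a newline.
decimal = ET.fromstring("<line>first&#10;second</line>").text
hexadecimal = ET.fromstring("<line>first&#xA;second</line>").text

print(decimal == "first\nsecond", hexadecimal == decimal)  # True True
```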
Similarly, how do you handle single and curly quotes?
XML is designed to mark up Unicode text, and the curly quotes have no special meaning in it. However, it's not uncommon for the encoding used for an XML document to be misinterpreted in the wild. So if you're in a closed environment and can guarantee correct Unicode encoding at producer and consumer, then I'd just put them in the XML. Otherwise use numeric character references. That's true of any character with a code point above 127; there's nothing special about curly quotes.
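One way to get the numeric form without escaping by hand is to serialize to ASCII and let the library substitute references. A minimal sketch with Python's `xml.etree.ElementTree` (the element name and text are invented):

```python
import xml.etree.ElementTree as ET

quote = ET.Element("q")
quote.text = "\u201chi\u201d"  # curly double quotes around "hi"

# Serializing as ASCII replaces every non-ASCII character with a
# numeric character reference, so the bytes survive a wrong encoding guess.
ascii_bytes = ET.tostring(quote, encoding="us-ascii")
print(ascii_bytes)  # b'<q>&#8220;hi&#8221;</q>'

# Parsing the ASCII form restores the original characters.
round_trip = ET.fromstring(ascii_bytes).text
```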