I'm generating an XML document from a PHP script and I need to escape the XML special characters. I know the list of characters that should be escaped, but what is the correct way to do that?
Proper escaping is the way to get correct XML output, but you need to handle escaping differently for attributes and elements. (That is, Tomas' answer is incorrect.)
I wrote/stole some Java code a while back that differentiates between attribute and element escaping. The reason is that an XML parser treats whitespace specially, particularly in attributes, where tabs and line breaks are normalized to spaces unless they are written as character references.
It should be trivial to port that over to PHP (you can use Tomas Jancik's approach with the appropriate escaping above). You don't have to worry about escaping extended entities if you're using UTF-8.
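As a rough illustration of the attribute/element distinction, here is a minimal PHP sketch. It is not a port of the Java code mentioned above (which is not shown here); the function names `xml_escape_text` and `xml_escape_attr` are hypothetical, and the input is assumed to be valid UTF-8 that contains no characters forbidden in XML.

```php
<?php
// Hypothetical helper: escape character data placed between tags.
function xml_escape_text(string $s): string
{
    // strtr() with an array makes a single pass, so replaced text is never re-escaped.
    return strtr($s, ['&' => '&amp;', '<' => '&lt;', '>' => '&gt;']);
}

// Hypothetical helper: escape a value placed inside a quoted attribute.
function xml_escape_attr(string $s): string
{
    return strtr($s, [
        '&'  => '&amp;',
        '<'  => '&lt;',
        '>'  => '&gt;',
        '"'  => '&quot;',
        "'"  => '&apos;',
        // Whitespace is written as character references so that the parser's
        // attribute-value normalization does not turn it into plain spaces.
        "\t" => '&#9;',
        "\n" => '&#10;',
        "\r" => '&#13;',
    ]);
}

echo '<note title="', xml_escape_attr("a \"b\"\nc"), '">',
     xml_escape_text('1 < 2 & 3'), "</note>\n";
```

The attribute version escapes both kinds of quotes so the result is safe whichever quote character delimits the attribute.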
If you don't want to port my Java code you can look at XMLWriter which is stream based and uses libxml so it should be very efficient.
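A minimal sketch of the XMLWriter route, assuming the xmlwriter extension is loaded. The element and attribute names are made up for illustration; the point is that `writeAttribute()` and `text()` do the escaping for you.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();                       // build the document in memory
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('note');               // hypothetical element name
$writer->writeAttribute('title', 'Tom & "Jerry"');  // attribute value is escaped
$writer->text('1 < 2 & 3');                  // text content is escaped
$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory();              // fetch the generated XML as a string
echo $xml;
```

For large outputs, `openURI()` can be used instead of `openMemory()` so the document is streamed rather than held in memory.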
Use the DOM classes to generate your whole XML document. It will handle encodings and decodings that we don't even want to care about.
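For example, a small sketch of building a whole document with the DOM classes (the element and attribute names are hypothetical). Note that the text goes in through `createTextNode()`: the optional value argument of `createElement()` is not escaped, so it is avoided here.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');

$root = $doc->createElement('note');         // hypothetical element name
$doc->appendChild($root);

$root->setAttribute('title', 'Tom & "Jerry"');            // escaped on output
$root->appendChild($doc->createTextNode('1 < 2 & 3'));    // escaped on output

$xml = $doc->saveXML();
echo $xml;
```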
Edit: This was criticized by @Tchalvak:
The DOM object creates a full XML document, it doesn't easily lend itself to just encoding a string on its own.
That is wrong: DOMDocument can properly output just a fragment instead of the whole document:
$doc->saveXML($fragment);
which gives:
Test &amp; <b> and encode </b> :)
Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)
as in:
$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
// adding XML verbatim:
$xml = "Test &amp; <b> and encode </b> :)\n";
$fragment->appendXML($xml);
// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));
// output the result
echo $doc->saveXML($fragment);
What about the htmlspecialchars() function?
htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);
Note: the ENT_XML1 flag is only available in PHP 5.4.0 or higher.
htmlspecialchars() with these parameters replaces the following characters:
& (ampersand) becomes &amp;
" (double quote) becomes &quot;
' (single quote) becomes &apos;
< (less than) becomes &lt;
> (greater than) becomes &gt;
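A quick usage sketch of the call above, assuming PHP 5.4 or later and UTF-8 input (the sample string is made up):

```php
<?php
$input = 'Tom & "Jerry" <it\'s>';

// ENT_QUOTES escapes both quote characters; ENT_XML1 selects XML 1.0 entities.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $escaped, "\n";
// Tom &amp; &quot;Jerry&quot; &lt;it&apos;s&gt;
```

Because both quote characters are escaped, the result can be used in element content as well as in quoted attribute values, although it does not protect tabs or line breaks in attributes.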
You can get the translation table by using the get_html_translation_table() function.
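For instance, this small sketch prints the table that corresponds to the flags used above (assuming PHP 5.4 or later):

```php
<?php
// Same table and flags as the htmlspecialchars() call above.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_XML1, 'UTF-8');

// Each key is a raw character, each value its escaped form.
foreach ($table as $char => $entity) {
    echo $char, ' => ', $entity, "\n";
}
```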
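function replace_char($str)
{
    // Plain string replacement is enough here; no regular expressions needed.
    // '&' must be replaced first, otherwise the ampersands of the entities
    // added for the other characters would be escaped a second time.
    return str_replace(
        array('&', '<', '>', '"', "'"),
        array('&amp;', '&lt;', '&gt;', '&quot;', '&apos;'),
        $str
    );
}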