Question
I've been happily saving things to my XML files via a web form that is parsed by PHP and SimpleDOM.php.
I need to save items that have English pricing in them, so I need the English pound sign. However, when I do this, two things happen:
It returns the saved price as Â£.
If I then save it again without any other changes, the SimpleDOM parser barfs and removes any other content in the XML file beyond the pound sign.
The top line in my XML file looks like
<?xml version="1.0" encoding="ISO-8859-1"?>
Inside the XML file the £ is being saved as
&Acirc;&pound;
As far as I can tell, ISO-8859-1 should have the £ sign in it, so I'm very confused about why this Acirc is coming into it.
I saw on another thread that someone suggested trying ISO-8859-15, but that didn't make any difference.
Any ideas folks?
Cheers Jas (complete nube to all this encoding stuff)
Answer 1:
The Unicode code point for £ is U+00A3. In the UTF-8 encoding it is the two bytes 0xC2 0xA3. Now, in ISO-8859-1, 0xC2 is Â and 0xA3 is £. So, somewhere in the flow, what you enter becomes UTF-8, which is then interpreted as ISO-8859-1. Have you looked at how the "form" encodes the data before it reaches your PHP code?
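The byte-level mix-up described above can be reproduced in a few lines of PHP. This is only a sketch: the variable names are made up for illustration, and it assumes the form submits UTF-8 while the script treats the bytes as ISO-8859-1, which is the hypothesis of this answer rather than something confirmed from the asker's code. It also assumes the iconv extension is available.

```php
<?php
// "£" as a UTF-8 form would submit it: two bytes, 0xC2 0xA3.
$submitted = "\xC2\xA3";

// Treating those bytes as ISO-8859-1 turns each byte into its own character,
// which reproduces the entities seen in the XML file.
$asLatin1 = htmlentities($submitted, ENT_QUOTES, 'ISO-8859-1');
echo $asLatin1, "\n"; // &Acirc;&pound;

// Declaring the actual encoding yields a single pound sign.
$asUtf8 = htmlentities($submitted, ENT_QUOTES, 'UTF-8');
echo $asUtf8, "\n"; // &pound;

// One possible repair if the file must stay ISO-8859-1: convert on input.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $submitted);
echo bin2hex($latin1), "\n"; // a3
```

The other obvious option is to make everything agree on UTF-8 instead (HTML page, PHP calls, and the XML declaration), so that no conversion is needed at all.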
And, besides, what is this SimpleDOM doing with respect to entities? &Acirc; and &pound; are not valid XML entities without a declaration. Does SimpleDOM add the declarations?
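To illustrate the point about undeclared entities, here is a small sketch using PHP's built-in DOMDocument (not SimpleDOM, whose behaviour is not shown in the question). The sample `<price>` documents are invented for the example, and the DOM extension is assumed to be installed. XML predefines only `lt`, `gt`, `amp`, `apos` and `quot`, so a named HTML entity should be rejected, while a numeric character reference should load.

```php
<?php
// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// &pound; is an HTML entity and is not declared in this document.
$withNamedEntity = '<?xml version="1.0" encoding="ISO-8859-1"?><price>&pound;5</price>';
$doc = new DOMDocument();
$namedOk = $doc->loadXML($withNamedEntity);
var_dump($namedOk); // bool(false)

// A numeric character reference needs no declaration.
$withNumericRef = '<?xml version="1.0" encoding="ISO-8859-1"?><price>&#163;5</price>';
$doc = new DOMDocument();
$numericOk = $doc->loadXML($withNumericRef);
var_dump($numericOk); // bool(true)

// DOM hands text back as UTF-8, so the pound sign is the bytes C2 A3 here.
$text = $doc->documentElement->textContent;
echo bin2hex($text), "\n"; // c2a335
```

This may explain why the second save truncates the file: once `&Acirc;&pound;` has been written, the next parse hits an entity it cannot resolve.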
Answer 2:
Forty-two's response definitely fixed one of the problems: I was putting encoding=iso-8859-1 in the XML doc but using UTF-8 in the HTML meta content-type tag.
One other thing to note if anyone comes across this answer: I was also having brutal problems with curved quotes from a Windows document (copying text from Word 2007 into an HTML form field on my site). There is a BIG difference between a curved quote and an apostrophe. On English keyboards, Word converts the typed apostrophe into a single curved quote. ISO-8859-1 does not have such a character (it is encoded in the Windows-1252 "standard"). This was killing my XML documents when the form field was parsed by PHP. The solution was simple:
$var = htmlentities($var, ENT_QUOTES, "Windows-1252");
Other people have alluded to htmlentities and strip_tags, but it took me 4 half days to pull all this together. Hopefully this saves someone some time.
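For anyone wanting to see what that call does to a Word-style quote, here is a minimal sketch. The sample string is invented, and it assumes the form really delivers Windows-1252 bytes (if the page is served as UTF-8, the browser would send UTF-8 instead) and that the iconv extension is available.

```php
<?php
// Word's curved apostrophe as a single Windows-1252 byte (0x92).
$curly = "It\x92s";

// With the matching charset, htmlentities() recognises the character.
$entity = htmlentities($curly, ENT_QUOTES, 'Windows-1252');
echo $entity, "\n"; // It&rsquo;s

// Alternative: convert to UTF-8 so the XML can store the character itself.
$utf8 = iconv('CP1252', 'UTF-8', $curly);
echo bin2hex($utf8), "\n"; // 4974e2809973
```

Note that `&rsquo;` is an HTML entity, so whether it is safe inside the XML file depends on how the parser treats undeclared entities; converting to UTF-8 and declaring the XML as UTF-8 sidesteps that question.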
Source: https://stackoverflow.com/questions/7349176/%c2%a3-becomes-%c3%82%c2%a3-why-xml-iso-encoding-issue