Reading in Malformed XML (unencoded XML entities) with PHP

有些话、适合烂在心里 submitted on 2019-11-28 05:27:18

Question


I'm having some trouble parsing malformed XML in PHP. In particular, I'm querying a third-party web service that returns data as XML without encoding the XML entities in the actual data. For example, one of the elements contains an ASCII heart, '<3' (without the quotes), which the XML parser sees as the start of a tag. It should be '&lt;3'.

Right now I'm simply passing the XML string into a SimpleXMLElement, which, predictably, fails on these instances. I've done some looking around, and it seems like the PHP Tidy package might be able to help me, but the amount of configuration you can do is overwhelming :(
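
To make the failure concrete, here is a minimal sketch of how the parse error can be detected and inspected rather than surfacing as PHP warnings. The payload is a hypothetical stand-in for the web service response, not data from the question.

```php
<?php
// Hypothetical payload standing in for the web service response.
$xml = '<foo>I <3 Philadelphia</foo>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// simplexml_load_string() returns false when the document cannot be parsed.
$el = simplexml_load_string($xml);
$errors = libxml_get_errors();

// Restore the previous error-handling state.
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($el === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError carrying line, column and message.
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
}
```

Checking for false this way makes it possible to attempt a repair only on the responses that actually fail to parse.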

Thus, I'm just wondering if anyone else has had a problem like this and, if so, how they were able to solve it.

Thanks!


Answer 1:


Try tidy::repairString with the input-xml option, which makes Tidy parse the input as XML rather than HTML:

php > $tidy = new tidy();
php > $repaired = $tidy->repairString("<foo>I <3 Philadelphia</foo>", array("input-xml"=>1));
php > print($repaired);
<foo>I &lt;3 Philadelphia</foo>
php > $el = new SimpleXMLElement($repaired);



Answer 2:


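  1. Read the content as a string.
  2. Strip the control characters that XML does not allow and escape the special characters: htmlspecialchars(preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/','',$string)). Apply htmlspecialchars to the text content only; run over the whole document, it also escapes the tags themselves.
  3. Load the transformed string into a SimpleXMLElement.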

It worked for me so far.
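
The steps above can be sketched as follows. This is not the answerer's exact code: instead of calling htmlspecialchars on the whole string, it uses a regex heuristic (an assumption of this sketch) that escapes only a '<' that cannot begin a tag and an '&' that does not begin an entity. The helper name escapeStrayMarkup and the sample payload are hypothetical.

```php
<?php
// Hypothetical helper; the name and the escaping heuristic are assumptions.
function escapeStrayMarkup(string $xml): string
{
    // Drop control characters that XML 1.0 does not allow
    // (everything below 0x20 except tab, line feed and carriage return).
    $xml = preg_replace('/[\x00-\x08\x0b\x0c\x0e-\x1f]/', '', $xml);

    // Escape "&" that does not start an entity or character reference.
    $xml = preg_replace('/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $xml);

    // Escape "<" that cannot start a tag, comment, CDATA section or
    // processing instruction.
    return preg_replace('/<(?![A-Za-z_\/!?])/', '&lt;', $xml);
}

// Sample payload with a stray "<", a bare "&" and a BEL control character.
$raw = "<foo>I <3 Philadelphia & \x07more</foo>";

$el = new SimpleXMLElement(escapeStrayMarkup($raw));
echo (string) $el, "\n"; // prints: I <3 Philadelphia & more
```

The heuristic is deliberately narrow: text such as '<b' inside the data would still look like a tag to it, so payloads like that need a real repair tool such as Tidy.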



Source: https://stackoverflow.com/questions/1045837/reading-in-malformed-xml-unencoded-xml-entities-with-php
