QueryPath and Malformed HTML

Submitted by 一世执手 on 2019-12-13 16:11:51

Question


I'm using QueryPath to manipulate a page's DOM. The page I'm manipulating contains some tags that QueryPath doesn't know how to interpret.

I've tried passing the following options, but I still get errors:

ignore_parser_warnings
use_parser (html)

With these enabled, I still get the following warnings:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity

Any help would be greatly appreciated.


Answer 1:


Try the libxml error-handling functions:

libxml_use_internal_errors(TRUE);
$dom->load('whatever'); // or whatever you use for loading the DOM
libxml_clear_errors();

Instead of just clearing the errors, you can opt to handle them (for example, by reading them with libxml_get_errors() before clearing), though the above should be sufficient for most cases.
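If you do want to inspect the errors rather than discard them, here is a minimal sketch. It assumes you load the markup with a plain DOMDocument, and the $html string is a made-up sample rather than the asker's page. Which messages you get depends on the installed libxml2 version.

```php
<?php
// Hypothetical malformed input: a non-standard <nobr> tag and a bare '&'.
$html = '<html><body><nobr>Fish & chips</nobr></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Read what the parser complained about, then empty the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// The document is still usable despite any reported problems.
$nobr = $dom->getElementsByTagName('nobr');
echo $nobr->length, " nobr element(s) found\n";
```

Restoring the previous setting keeps the change local to this one load, so other code that relies on the default warning behaviour is unaffected.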




Answer 2:


Use htmlqp() instead of qp(). The htmlqp() function does a substantial amount of fixing for yucky HTML.




Answer 3:


Just put an @ in front of your QueryPath function calls to suppress the warnings. While invalid HTML may generate warnings, QueryPath can generally handle it just fine.
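As a rough illustration of the @ operator, the sketch below applies it to DOMDocument::loadHTML(), which is the call named in the warnings from the question. Using DOMDocument directly instead of a QueryPath function is an assumption made to keep the example self-contained, and the $html string is a made-up sample.

```php
<?php
// Hypothetical malformed input, similar to what the question describes.
$html = '<p><nobr>Tom & Jerry</nobr></p>';

$dom = new DOMDocument();

// '@' silences the warnings for this one call only; parsing still happens.
$loaded = @$dom->loadHTML($html);

// Because the messages were suppressed, check the result explicitly.
if ($loaded === false) {
    throw new RuntimeException('HTML could not be parsed at all');
}

$count = $dom->getElementsByTagName('nobr')->length;
echo $count, " nobr element(s) found\n";
```

Keep in mind that @ hides every diagnostic from the call, including ones that point to a real problem, so the libxml_use_internal_errors() approach from the first answer is usually easier to debug.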



Source: https://stackoverflow.com/questions/3987239/querypath-and-malformed-html
