Question
I'm using QueryPath to manipulate a page's DOM. The page I'm manipulating has some tags that QueryPath doesn't know how to interpret.
I've tried passing the following as options but I still get errors:
ignore_parser_warnings
use_parser (html)
I get the following errors with these enabled:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity
Any help would be greatly appreciated.
Answer 1:
Try the libxml functions:
libxml_use_internal_errors(TRUE);
$dom->load('whatever'); // or whatever you use for loading the DOM
libxml_clear_errors();
Instead of just clearing the errors, you can opt to handle them, though the above should be sufficient for most cases.
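If you do want to handle the errors rather than discard them, something along these lines might work. This is only a sketch: it uses a plain DOMDocument instead of QueryPath, and the $html string is a made-up sample, not the asker's page.

```php
<?php
// Hypothetical malformed input, loosely modelled on the warnings in the question.
$html = '<html><body><nobr>Tom & Jerry</nobr></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each entry is a LibXMLError object exposing level, code, line and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("libxml [%d] line %d: %s\n", $error->code, $error->line, trim($error->message));
}

// Empty the error buffer and restore the earlier error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Which errors get reported, if any, depends on the libxml2 version PHP was built against, so treat the printed list as diagnostic output rather than something to rely on.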
Answer 2:
Use htmlqp() instead of qp(). The htmlqp() function does a substantial amount of fixing for yucky HTML.
Answer 3:
Just use an @ in front of your QueryPath functions to suppress the warnings. Invalid HTML may generate warnings, but the parser can generally handle it just fine.
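A rough illustration of the @ approach, assuming a plain DOMDocument call stands in for the QueryPath function and using a made-up $html sample:

```php
<?php
// Hypothetical malformed input; DOMDocument stands in for the QueryPath call.
$html = '<html><body><nobr>Tom & Jerry</nobr></body></html>';

$dom = new DOMDocument();

// The @ operator silences warnings raised by this one expression only.
$loaded = @$dom->loadHTML($html);

if ($loaded === false) {
    // @ hides the diagnostics, so check the return value explicitly.
    echo "Document could not be parsed\n";
} else {
    echo $dom->getElementsByTagName('nobr')->length, " nobr element(s) found\n";
}
```

Since @ hides every warning from that expression, checking the return value is about the only signal left that something went wrong.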
Source: https://stackoverflow.com/questions/3987239/querypath-and-malformed-html