Parsing HTML in CakePHP

Question

I have started building a web crawler in CakePHP 2.2. The pages the script crawls are HTML pages, and I need to parse them to extract my values.

I have tried a few different solutions and looked at some open source projects as well, but I am not sure what the best way to do this is.

DOMDocument::loadHTML() - This looks like the solution, but I am not 100% sure.
Regular expressions - A bit hard to maintain.
Simple HTML DOM - http://electrokami.com/coding/simple-html-dom-baked-cakephp-component (made for Cake 1.3; I don't like the code itself, and it has serious memory leaks)

I need your help to figure out which method I should use.

Answer 1:

DOMDocument is your best choice. The php.net documentation for the DOM extension has some decent examples. If you can use another language such as Ruby, I have had very good experience with Hpricot, a jQuery-like library for parsing HTML.
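As a rough illustration of the DOMDocument approach, here is a minimal sketch that loads an HTML string and pulls out link targets with DOMXPath. The function name `extractLinks`, the sample markup, and the choice to extract links are assumptions made for the example; they are not from the question. It sticks to PHP 5.3 syntax and silences libxml warnings, since crawled pages are rarely valid HTML.

```php
<?php
// Hypothetical helper (not part of CakePHP): returns href/text pairs
// for every <a> element that has an href attribute.
function extractLinks($html)
{
    $doc = new DOMDocument();

    // Collect parse warnings internally instead of emitting them;
    // real-world markup almost always triggers some.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = array();

    // XPath selects only anchors that actually carry an href.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = array(
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        );
    }

    return $links;
}

// Sample markup, assumed for illustration only.
$html = '<html><body><p>Intro</p>'
      . '<a href="/page/1">First</a> <a href="/page/2"> Second </a>'
      . '<a>No href</a></body></html>';

$links = extractLinks($html);
```

In a CakePHP 2.x application, a function like this could be wrapped in a component or a class under `Lib`, but how you organize it is a matter of preference.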

This question is related to "Robust and Mature HTML Parser for PHP".

Source: https://stackoverflow.com/questions/11623377/parsing-html-in-cakephp

Tags

html

parsing

web-crawler

php-5.3

cakephp-2.2
