How to extract blocks of text from an HTML page?

Submitted by 爱⌒轻易说出口 on 2019-12-25 01:49:22

Question


I would like to extract blocks of text with more than 100 words from a large HTML page using PHP. Whether the text is contained in <p>...</p> doesn't matter. I only care about the number of words that make up a coherent text block, so text outside of HTML paragraphs should also be taken into consideration.

How can this be done?


Answer 1:


I use phpQuery. Are you familiar with jQuery? They share the same syntax. You might be concerned about installing a new library, but it is well worth the extra overhead.

phpQuery

You can then access it like this:

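// $html holds the page markup as a string
$doc = phpQuery::newDocumentHTML($html);
foreach ($doc->find('p') as $element) {
    // find() iterates raw DOM nodes; pq() wraps each one in a phpQuery object
    $element = pq($element);
    // Print the word count of the paragraph's text
    echo str_word_count($element->text()), PHP_EOL;
}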



Answer 2:


Use the PHP Simple DOM Parser.

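// $markup holds the page markup as a string
$html = str_get_html($markup);
foreach ($html->find('p') as $element) {
    // plaintext is the element's text without tags; src would only read a src attribute
    echo str_word_count($element->plaintext), PHP_EOL;
}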
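
Both snippets in this thread only visit <p> elements and print a count. The question also asks about text outside paragraphs and a 100-word threshold, so here is a rough sketch of one way to cover that with PHP's built-in DOM extension instead of either library. The function names (`extractTextBlocks`, `collectBlocks`, `flushBlock`, `countWords`) and the two tag lists are hypothetical choices for illustration, not part of any library. Treating block-level tags as the boundaries of a "coherent text block" is an assumption, and so is counting words by splitting on whitespace rather than with `str_word_count`.

```php
<?php
// Assumption: these tags start and end a text block; adjust to taste.
const BLOCK_TAGS = ['p', 'div', 'td', 'th', 'li', 'blockquote', 'pre', 'section',
                    'article', 'table', 'ul', 'ol', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
// Assumption: content of these tags is never visible text.
const SKIP_TAGS = ['head', 'script', 'style', 'noscript'];

// Count words by splitting on runs of whitespace.
function countWords(string $text): int
{
    return count(preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY));
}

// Store the buffered text as one block (if non-empty) and reset the buffer.
function flushBlock(array &$blocks, string &$buffer): void
{
    $text = trim(preg_replace('/\s+/u', ' ', $buffer));
    if ($text !== '') {
        $blocks[] = $text;
    }
    $buffer = '';
}

// Walk the tree in document order, closing a block at every block-level tag.
function collectBlocks(DOMNode $node, array &$blocks, string &$buffer): void
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $buffer .= $child->nodeValue;
            continue;
        }
        if (!($child instanceof DOMElement)) {
            continue; // comments, doctype, processing instructions
        }
        $tag = strtolower($child->nodeName);
        if (in_array($tag, SKIP_TAGS, true)) {
            continue;
        }
        if ($tag === 'br') {
            $buffer .= ' '; // keep words on either side of <br> apart
            continue;
        }
        $isBlock = in_array($tag, BLOCK_TAGS, true);
        if ($isBlock) {
            flushBlock($blocks, $buffer);
        }
        collectBlocks($child, $blocks, $buffer); // inline tags just add to the buffer
        if ($isBlock) {
            flushBlock($blocks, $buffer);
        }
    }
}

// Return every text block that has more than $minWords words.
function extractTextBlocks(string $html, int $minWords = 100): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // Common workaround, assuming the input is UTF-8: hint the encoding to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $blocks = [];
    $buffer = '';
    collectBlocks($doc, $blocks, $buffer);
    flushBlock($blocks, $buffer); // text left over after the last block tag

    return array_values(array_filter($blocks, function (string $block) use ($minWords): bool {
        return countWords($block) > $minWords;
    }));
}
```

Calling `extractTextBlocks($markup)` on a string of page markup returns an array of plain-text blocks. Text sitting directly inside a <div> or <td> is picked up the same way as text inside <p>, and inline tags such as <b> or <a> do not split a block.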


Source: https://stackoverflow.com/questions/5239539/how-to-extract-blocks-of-text-from-a-html-page
