How to extract HTML comments and all HTML contained by a node?

猫巷女王i 2020-12-01 22:57

I'm creating a little web app to help me manage and analyze the content of my websites, and cURL is my favorite new toy. I've figured out how to extract info about all so

4 Answers
  • 2020-12-01 23:25

    For HTML comments, a quick method is a regular expression:

     function getComments($html) {

         $matches = array();

         // PREG_SET_ORDER groups the results per match, so $m[1] is the
         // text between <!-- and --> for each comment found.
         if (preg_match_all('#<!--(.*?)-->#s', $html, $matches, PREG_SET_ORDER)) {

             $comments = array();

             foreach ($matches as $m) {
                 $comments[] = $m[1];
             }

             return $comments;

         }

         // No comments matched
         return null;

     }
    
  • 2020-12-01 23:26

    For comments, you're looking for a recursive regex. For instance, to get rid of HTML comments:

    $yourHTML = preg_replace('/<!--(?(?=<!--)(?R)|.)*?-->/s', '', $yourHTML);
    

    To find them:

    preg_match_all('/(<!--(?(?=<!--)(?R)|.)*?-->)/s',$yourHTML,$comments);
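
    As a rough illustration of the two calls above, here is a minimal sketch run against a made-up input string. The sample markup and the $pattern, $found and $stripped names are assumptions for the example, not part of the original answer. Note that preg_replace() takes the replacement string as its second argument and returns the result rather than modifying its input.

```php
<?php
// Hypothetical sample input; $yourHTML is the variable name used in the answer.
$yourHTML = "<p>one</p><!-- first -->\n<p>two</p><!-- second\nspans lines -->";

// Same recursive pattern as in the answer (without the outer capture group).
$pattern = '/<!--(?(?=<!--)(?R)|.)*?-->/s';

// Find: with the default PREG_PATTERN_ORDER, $comments[0] lists the full matches.
preg_match_all($pattern, $yourHTML, $comments);
$found = $comments[0];

// Strip: replace every matched comment with an empty string.
$stripped = preg_replace($pattern, '', $yourHTML);

print_r($found);
echo $stripped, "\n";
```

    The /s modifier lets the dot match newlines, which is why the second comment is picked up even though it spans two lines.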
    
  • 2020-12-01 23:30

    Comment nodes should be easy to find in XPath with the comment() test, analogous to the text() test:

    $comments = $xpath->query('//comment()'); // or another path, as you prefer
    

    They are standard nodes; see the PHP manual entry for the DOMComment class.
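
    Putting the XPath query in context, a minimal sketch might look like the following. The sample markup is invented for illustration, and the setup around $xpath (DOMDocument, loadHTML, libxml error suppression) is one common arrangement, not necessarily what the asker's code uses.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<div id="main"><!-- header start --><h1>Title</h1><!-- header end --></div>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$comments = array();
foreach ($xpath->query('//comment()') as $node) {
    // nodeValue holds the text between <!-- and -->.
    $comments[] = trim($node->nodeValue);
}

print_r($comments);
```

    Unlike the regex approaches, this only sees comments the parser recognizes as comments, so text that merely looks like a comment inside a script or attribute value is not picked up.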


    To your other question, it's a bit trickier. The simplest way is to use saveXML() with its optional $node argument:

    $html = $dom->saveXML($el);  // $el should be the element you want to get 
                                 // the HTML for
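
    If HTML-style rather than XML-style serialization is wanted, saveHTML() also accepts an optional node argument (PHP 5.3.6 and later). The sketch below, using invented sample markup and the assumed variable names $outer and $inner, shows one way to get both the element's own markup and only the markup of its children.

```php
<?php
// Hypothetical sample markup, for illustration only.
$dom = new DOMDocument();
$dom->loadHTML('<div id="box"><p>Hello <b>world</b></p><!-- note --></div>');

// $el is the element whose HTML we want; here, the first <div>.
$el = $dom->getElementsByTagName('div')->item(0);

// Outer markup: the element itself plus everything inside it.
$outer = $dom->saveHTML($el);

// Inner markup: serialize each child node (elements, text, comments)
// and concatenate the pieces.
$inner = '';
foreach ($el->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $outer, "\n", $inner, "\n";
```

    The serializer may insert line breaks between block-level elements, so the output is equivalent to the source markup but not necessarily byte-for-byte identical.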
    
  • 2020-12-01 23:36

    This regex should help: \s*<!--[\s\S]+?-->

    You can try it out in a regex tester.
