Find multiple patterns with a single preg_match_all in PHP

独厮守ぢ 2021-01-01 04:49

Using PHP and preg_match_all, I'm trying to get all the HTML content between the following tags (including the tags themselves):

<p>paragraph text</p>

don't
4 answers
  • 2021-01-01 05:14

    If you are to use a DOM parser, and you should, here's how. A contributor posted a useful function for obtaining a DOMNode's innerHTML, which I will use in the following example:

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    
    $p = $dom->getElementsByTagName('p')->item(0); // first <p> node
    $ul = $dom->getElementsByTagName('ul')->item(0); // first <ul> node
    $table = $dom->getElementsByTagName('table')->item(0); // first <table> node
    
    echo DOMinnerHTML($p);
    echo DOMinnerHTML($ul);
    echo DOMinnerHTML($table);
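
The snippet above calls DOMinnerHTML() without defining it. Below is a minimal sketch of such a helper, assuming it simply serializes each child node with DOMDocument::saveHTML(). The contributor's original version may differ, and the sample $html is made up for illustration.

```php
<?php
// Hypothetical reimplementation of the DOMinnerHTML() helper used above;
// the contributor's original may be written differently.
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes only that node and its subtree.
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

// Illustrative input, not taken from the question.
$html = '<p>paragraph <b>text</b></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$p = $dom->getElementsByTagName('p')->item(0);

$inner = DOMinnerHTML($p);          // markup inside <p>, without the <p> tags
$outer = trim($dom->saveHTML($p));  // the <p> element including its own tags

echo $inner, "\n", $outer, "\n";
```

Since the question asks for the tags as well, passing the node straight to saveHTML() (the $outer line) may be closer to what is wanted than the inner-HTML helper.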
    
  • 2021-01-01 05:17

    While doable with regular expressions, you could simplify the task by using one of the simpler HTML parser toolkits. For example with phpQuery or QueryPath it's as simple as:

    qp($html)->find("p, ul, table")->text();   // or loop over them
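
If installing phpQuery or QueryPath is not an option, the "loop over them" idea can be sketched with PHP's bundled DOM extension instead. This swaps in DOMXPath for the CSS selector and is not the QueryPath API; the sample $html is invented for illustration.

```php
<?php
// Sketch using only the bundled DOM extension (not phpQuery/QueryPath).
// Illustrative input, not taken from the question.
$html = '<p>one</p><div>skip</div><ul><li>two</li></ul>'
      . '<table><tr><td>three</td></tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$fragments = [];
// The XPath union selects every <p>, <ul> and <table> in document order.
foreach ($xpath->query('//p | //ul | //table') as $node) {
    // saveHTML($node) returns the element including its own tags;
    // trim() drops any trailing newline the serializer may add.
    $fragments[] = trim($dom->saveHTML($node));
}

print_r($fragments);
```

Note that the query also returns nested matches (for example a p inside a table) as separate entries, so the results may overlap.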
    
  • 2021-01-01 05:24

    Use | to match one of a group of strings: p|ul|table

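    Use a backreference to match the appropriate closing tag: \2 (written \\2 inside a double-quoted PHP string), because (p|ul|table) is the group opened by the second opening parenthesis.

    Putting that all together, with explicit # delimiters so that the outer parentheses count as capture group 1 instead of being consumed as the pattern delimiters:

    preg_match_all("#(<(p|ul|table)>(.*)</\\2>)#siU", $content, $matches, PREG_SET_ORDER);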
    

    This only works if the input HTML follows a very strict structure: the tags cannot contain spaces or attributes, and nested elements of the same kind break the match. Consider using an HTML parser to do the job properly.
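
To make these limitations concrete, here is a small sketch that runs a variant of the pattern above (written with explicit # delimiters and single quotes) against a made-up $content string containing an attribute and a nested list.

```php
<?php
// Group 1 = whole element, group 2 = tag name, group 3 = body.
// U makes .* lazy, s lets . match newlines, i ignores case.
$pattern = '#(<(p|ul|table)>(.*)</\2>)#siU';

// Illustrative input, not taken from the question.
$content = '<p>first</p>'
         . '<ul><li>item</li></ul>'
         . '<p class="x">attr</p>'                      // has an attribute
         . '<ul><li><ul><li>deep</li></ul></li></ul>';  // nested <ul>

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[2], ' => ', $m[1], "\n";
}
```

With this input the paragraph carrying a class attribute is skipped entirely, and the nested list is cut off at the first closing ul tag, which is the kind of failure a DOM parser avoids.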

  • 2021-01-01 05:31

    This one works for me:

    // [^>]* allows attributes; \\1 forces the closing tag to match the opening one
    preg_match_all("#<(p|ul|table)\b[^>]*>(.*?)</\\1>#si", $content, $matches);
    