Using PHP and preg_match_all, I'm trying to get all the HTML content between the following tags (and the tags also):
<p>paragraph text</p>
don't
If you are to use a DOM parser, and you should, here's how. A contributor posted a useful function, DOMinnerHTML(), for obtaining a DOMNode's innerHTML. It is not built into PHP, so it has to be defined before the following example will run:
$dom = new DOMDocument;
$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0); // first <p> node
$ul = $dom->getElementsByTagName('ul')->item(0); // first <ul> node
$table = $dom->getElementsByTagName('table')->item(0); // first <table> node
echo DOMinnerHTML($p);
echo DOMinnerHTML($ul);
echo DOMinnerHTML($table);
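The example above calls DOMinnerHTML() without showing it. Here is a minimal sketch of what such a helper could look like; it is my own reconstruction, not the contributor's original code, and the sample $html is made up for illustration. Since the question also wants the tags themselves, the last line shows DOMDocument::saveHTML() with a node argument, which serializes the element including its own tags.

```php
<?php
// Hypothetical reconstruction of the DOMinnerHTML() helper used above.
// It concatenates the serialized form of every child of the given node.
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes only that node (PHP >= 5.3.6)
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

// Made-up sample input, only for demonstration
$html = '<div><p>paragraph <b>text</b></p><ul><li>item</li></ul></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0); // first <p> node

$inner = DOMinnerHTML($p);        // markup inside the <p>, without the <p> tags
$outer = trim($dom->saveHTML($p)); // the <p> element including its own tags

echo $inner, "\n";
echo $outer, "\n";
```

If you only ever need the element with its tags, saveHTML($node) alone is probably enough and the helper can be dropped.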
While doable with regular expressions, you could simplify the task by using one of the simpler HTML parser toolkits. For example with phpQuery or QueryPath it's as simple as:
qp($html)->find("p, ul, table")->text(); // or loop over them
Use | to match one of a group of strings: p|ul|table
Use a backreference to match the appropriate closing tag: \\2, because the group (p|ul|table) starts at the second opening parenthesis.
Putting that all together:
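// Explicit ~ delimiters: a leading "(" would be taken as a bracket-style delimiter and stripped, not counted as group 1.
preg_match_all("~(<(p|ul|table)>(.*)</\\2>)~siU", $content, $matches, PREG_SET_ORDER);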
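To make the group numbering concrete, here is a small sketch of what PREG_SET_ORDER returns for this kind of pattern. It assumes the pattern is wrapped in ~ delimiters, so the outer parentheses really are capture group 1 and \2 is the tag name; the $content string is invented for the demo.

```php
<?php
// Made-up input that follows the strict structure the pattern expects
$content = '<p>paragraph text</p><ul><li>one</li></ul><table><tr><td>x</td></tr></table>';

// Single quotes keep \2 as a regex backreference.
// s = dot matches newlines, i = case-insensitive, U = quantifiers are lazy by default
preg_match_all('~(<(p|ul|table)>(.*)</\2>)~siU', $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = whole element with its tags, $m[2] = tag name, $m[3] = inner content
    echo $m[2], ' => ', $m[1], "\n";
}
```

With PREG_SET_ORDER each entry of $matches is one complete match, which makes it easy to loop over elements instead of over groups.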
This is only going to work if your input HTML follows a very strict structure: the tags cannot contain spaces or attributes, and the pattern also fails when elements are nested. Consider using an HTML parser to do the job properly.
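A quick way to see those limits is to run the strict pattern against a few inputs. This is just a sketch with made-up test strings, again assuming ~ delimiters around the pattern so that \2 is the tag name.

```php
<?php
$pattern = '~(<(p|ul|table)>(.*)</\2>)~siU';

// Hypothetical inputs, one per limitation mentioned above
$cases = [
    'plain'     => '<p>text</p>',
    'attribute' => '<p class="x">text</p>',
    'space'     => '<p >text</p>',
    'nested'    => '<ul><li><ul><li>inner</li></ul></li></ul>',
];

$counts = [];
$captured = [];
foreach ($cases as $label => $html) {
    $counts[$label] = preg_match_all($pattern, $html, $m, PREG_SET_ORDER);
    // Remember the first captured element, if there was one
    $captured[$label] = $counts[$label] > 0 ? $m[0][1] : null;
    echo $label, ': ', $counts[$label], " match(es)\n";
}

// The nested case does match, but the lazy .* stops at the first </ul>,
// so the captured element is cut off before the outer list is closed.
echo $captured['nested'], "\n";
```

The attribute and space cases find nothing at all, and the nested case silently returns a truncated element, which is arguably the worse failure.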
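This one works for me. It allows attributes in the opening tag, and the \1 backreference makes sure the closing tag is the same as the opening one:
preg_match_all('#<(p|ul|table)\b[^>]*>(.*?)</\1>#si', $content, $matches);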